Keith Townsend, Principal, CTO Advisor

Network as Code Advanced Topic, Part 3 of 3

Network Automation, Data Center

Advanced concepts of network as code

Are you looking for a master class on network as code? Watch the third video in this three-part series, which covers advanced concepts in network automation and is packed with information you can use to effectively scale your operations and redirect staff time to solve harder problems.

You’ll learn

  • Best practices for network testing and validation

  • How to utilize automated advanced monitoring

  • Key security policy and compliance considerations

Who is this for?

Network Professionals, Security Professionals

Host

Keith Townsend
Principal, CTO Advisor

Guest speakers

Ned Bellavance
Founder, Ned in the Cloud

Transcript

Introduction

0:08 you folks are Troopers you've stayed

0:11 through the first two videos in a series

0:14 where we introduced the general topic

0:17 network as code. in video two we talked

0:20 through implementing network as code,

0:22 some of the considerations for

0:26 when you're starting a project now we're

0:29 going to talk about some of the advanced

0:31 stuff, some of the gotchas, some of the

0:33 things that you really need to consider

0:36 we spent quite a bit of time on this in

0:38 the last video Ned and by the way thank

0:41 you Juniper for sponsoring the series

0:44 but now we spent quite a bit of time

0:46 talking about this in the last video and

0:48 it's a very important topic: network

0:50 testing and validation. what is that

0:54 and what are we looking to achieve

0:57 right well I think there's multiple

0:59 levels of testing you can do when it

1:01 comes to deploying your network as code

1:03 and it all starts with the initial

1:05 check-in of that code to whatever your

1:07 source control process is. typically

1:09 that's going to kick off some sort of

1:11 integration pipeline and it's the job of

1:14 that pipeline to do some very basic

1:16 checking of what you've submitted is it

1:19 formatted properly is the code

1:21 syntactically valid and makes sense and

1:24 maybe you're even testing for some best

1:27 practices by using static code analysis

1:30 tools and we'll touch on specifics later

1:32 but the general idea behind a static

1:34 code analysis is it doesn't try to run

1:36 the code it's simply looking at the code

1:38 itself and it has some rules it's

1:41 basically a rules engine that says oh

1:43 you're opening up port 22

1:46 to the entire world through this

1:47 firewall rule and we don't do

1:49 that, so it might flag that as a

1:53 not good configuration so you're going

1:55 to have some sort of static analysis

1:57 tool in there and that's all about

1:59 vetting the code just to make sure

2:01 before it even runs that it looks good
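The static analysis idea described above can be sketched as a tiny rules engine that inspects firewall rules as data without applying them. This is a minimal illustration, not any particular tool: the rule format (`action`, `port`, `source` keys) and the function name `find_open_ssh` are assumptions made for this example.

```python
from ipaddress import ip_network

# Hypothetical rule format: each firewall rule is a dict with
# "name", "action", "port", and "source" keys (assumed for this sketch).
def find_open_ssh(rules):
    """Return the rules that allow port 22 from every address."""
    flagged = []
    for rule in rules:
        source = ip_network(rule["source"], strict=False)
        open_to_world = source.prefixlen == 0  # 0.0.0.0/0 or ::/0
        if rule["action"] == "allow" and rule["port"] == 22 and open_to_world:
            flagged.append(rule)
    return flagged

rules = [
    {"name": "ssh-anywhere", "action": "allow", "port": 22, "source": "0.0.0.0/0"},
    {"name": "ssh-mgmt", "action": "allow", "port": 22, "source": "10.0.0.0/24"},
    {"name": "https", "action": "allow", "port": 443, "source": "0.0.0.0/0"},
]

# The code is only read, never run against a device.
for rule in find_open_ssh(rules):
    print(f"FLAG: {rule['name']} opens port 22 to the entire world")
```

Only the first rule is flagged here; the management-subnet SSH rule and the public HTTPS rule pass.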

2:04 so Ned as you're talking through this

2:07 I can't think of a single product that

2:11 kind of does all of this and as I'm

2:14 stringing together the test process I

2:18 can't help but think: am I

2:23 introducing another point of failure

2:26 because now there's another human

2:31 using their meat hands, stringing

2:34 together these tests. isn't that another

2:37 potential breaking point?

2:39 of course you're having more complexity

2:42 that's that's what we do as Engineers

2:44 right we add complexity I would say the

2:47 benefit here and and the good news is

2:50 that when you're building your CI/CD

2:52 pipelines you can do that declaratively

2:54 with code so it's not necessarily

2:58 somebody sitting there and stitching all

3:01 these pieces together by dragging boxes

3:03 around the screen or anything like that

3:05 you can develop workflows

3:08 and standardize those workflows and

3:11 publish them for other folks to consume

3:13 and use and actually a lot of the vendor

3:16 platforms have that process baked

3:19 into them so they already have these

3:22 predefined templates or these predefined

3:24 workflows that you can take advantage of

3:26 and then kind of snap in your own

3:29 tooling and tool sets so you still are

3:32 going to have humans involved because

3:33 you're always going to have humans

3:35 involved at least I hope otherwise we're

3:36 out of a job but those humans are going

3:39 to be doing things that forward your

3:43 automation goal and they'll be doing it

3:46 in code not by manually clicking around

3:48 in a UI. so what is this called, this

3:51 testing of this code? what's the

3:54 name? is this all just part

3:56 of the subset of infrastructure as code?

3:58 I think so so we're borrowing a little

4:01 bit from the concepts that are behind

4:03 testing software and so software has a

4:06 whole bunch of different test types you

4:08 have things like unit testing which is

4:10 just testing that say a function

4:12 returns the values you expect and errors

4:15 in the way that you want it to so that's

4:17 you know at the very essential I've got

4:18 a unit and I'm testing it and then you

4:21 get into integration testing okay how

4:23 does that function work with the rest of

4:24 my program as a whole

4:27 not all of these Concepts apply one to

4:30 one or map one to one to infrastructure

4:32 and to networking because they are

4:34 different in the way that they work so

4:37 we can try to apply some of these

4:39 Concepts while recognizing that not

4:41 everything is a perfect mapping from

4:44 software development to managing

4:46 infrastructure
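A unit test in the sense described above checks that one function returns the values you expect and errors the way you want. The sketch below applies that to a small config-rendering helper; the function `vlan_stanza` and its vendor-neutral output format are hypothetical, invented only for this example.

```python
def vlan_stanza(vlan_id, name):
    """Render a generic VLAN stanza (illustrative format, not a real vendor syntax)."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID {vlan_id} is outside 1-4094")
    return f"vlan {vlan_id}\n name {name}"

def test_returns_expected_value():
    # The unit returns the value we expect for valid input.
    assert vlan_stanza(10, "users") == "vlan 10\n name users"

def test_errors_the_way_we_want():
    # The unit fails loudly, with the error type we chose, for invalid input.
    try:
        vlan_stanza(5000, "bad")
    except ValueError:
        return
    raise AssertionError("expected ValueError for an out-of-range VLAN ID")

test_returns_expected_value()
test_errors_the_way_we_want()
```

As the discussion notes, this maps cleanly only to the code-generation layer; whether the rendered config behaves correctly on a device is an integration question.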

4:47 so talk to me about integration

4:50 testing like a continuous motion. when

4:54 I was building it wasn't quite a green

4:58 field but it was greenish we had a

5:00 chance to kind of reset a massive

5:03 network for a Fortune 100. this

5:05 opportunity doesn't happen often

5:10 and we were bringing in the application,

5:13 it was a mission critical application, it

5:15 was my job to manage that mission

5:17 critical application and they built all

5:20 of this network redundancy and I wanted

5:23 to say hey

5:25 let's take the opportunity to turn off

5:28 that switch like it's in theory like we

5:30 don't get to do this in the real

5:32 world in production right doing tests I

5:34 can say hey let's actually test the

5:36 redundancy turn off the switch and let's

5:38 see if we can still do a transaction

5:42 that's Nirvana but how do we gradually

5:45 get there how do we kind of integrate

5:48 merge these pipelines sure yeah so I

5:51 mean the testing begins with when code

5:54 is checked in, right? that's the

5:55 beginning and we're going to do some

5:56 basic tests to make sure that your code

5:58 is good

5:59 but then the next step is how does that

6:01 integrate with the existing system and

6:03 that can be really difficult to test

6:05 because you don't necessarily want to

6:07 apply changes live to your network so

6:10 you could potentially have a development

6:12 instance of some of your network maybe

6:15 it's a virtualized version of your

6:17 network where you can deploy those

6:19 changes and then have a series of checks

6:22 that just validate hey the configuration

6:24 loaded properly on the switch, it

6:26 didn't barf on any of the commands

6:28 or the instructions of the configuration

6:30 that's in there it's a valid config that

6:32 will actually load on that switch even

6:35 though it's a virtualized version of

6:36 that switch and then there's another

6:38 aspect of integration which is how does

6:41 the network function as a whole once

6:44 you've deployed your updates and that

6:46 can be very, very difficult to test in

6:49 a non-production environment so there

6:53 are certainly tools out there that will

6:56 attempt to make a digital twin of sorts

7:00 of your existing environments apply the

7:02 changes there and review the results but

7:05 ultimately you're always testing in

7:07 production right eventually it has to

7:09 hit

7:10 your production Network and you're

7:12 essentially testing there so what's

7:14 really really critical in this whole

7:16 process is having a complete feedback

7:18 loop to capture what's happening in the

7:21 production environment and have that

7:23 inform your development process for the

7:26 next iteration of your code

7:29 so let's move on to another topic

7:32 monitoring how does Network as code

7:35 impact monitoring and analytics?

7:39 seems like there's opportunities there

7:41 but what are some of the advanced things

7:43 we can start thinking about once we move

7:45 to network as code

7:47 well certainly, assuming that your

7:49 monitoring and analytics

7:51 software packages and devices support it

7:53 you can deploy them with network as code

7:56 so it seems like there's a

7:59 theme developing here that I'm hearing.

8:01 a friend of mine likes to call it

8:03 everything as code, right? if it has

8:05 an API endpoint and you can program

8:08 against it you should be defining its

8:11 configuration using Code as much as

8:13 possible and I think if there are any

8:14 SREs watching you know exactly what I'm

8:17 talking about. that's the nirvana of

8:19 the SRE is to automate all the things

8:21 that can be automated so you can move on

8:24 and do something else

8:25 so that's a portion of it,

8:27 just setting up that initial monitoring

8:29 and analytics it's something that some

8:31 people forget to do when they set up the

8:33 switch or they set up the router they

8:34 forget to turn on proper monitoring or

8:36 get it integrated with that monitoring

8:38 package that you have somewhere else oh

8:40 I set up a new switch but I forgot to

8:42 send the ticket to the monitoring team

8:43 to add it to their list of network

8:45 devices that sort of thing when you have

8:48 that monitoring portion defined using

8:51 network as code you don't have to

8:53 remember because now you've created an

8:55 integration where a new network device

8:58 is added it automatically gets

9:00 integrated into your existing analytics

9:02 and monitoring packages so it's that

9:04 dynamic discovery and integration. so

9:06 that's certainly a portion of it

9:08 but I think another important portion of

9:11 it is the ability to capture the impact

9:14 of your changes

9:15 and that kind of gets back to what we

9:17 were just talking about like I deployed

9:19 my code did I break anything

9:22 that's certainly important, but

9:24 another and possibly equally important

9:26 part is

9:28 what was the actual impact of my

9:31 changes and did I achieve the goal of

9:34 those changes to begin with because we

9:36 don't just change the network for

9:38 funsies right it's not you know Friday

9:40 we're going to deploy some new network

9:42 as code and then go out and have happy

9:43 hour. you have a business reason

9:46 or a technology reason to deploy changes

9:49 to the network

9:50 and so defining what the

9:54 point of the change is and then figuring

9:57 out how to measure the impact of that

9:59 change to make sure that the change you

10:01 made actually is reflected in the

10:04 performance, that's the job of monitoring

10:06 and analytics. oh hey you made this update to

10:09 the network and now customer requests

10:11 are coming in 50% faster than they were

10:14 before because you streamlined something

10:16 in the network that's awesome you get to

10:18 report that back to your boss: I improved

10:20 the network performance so that you're

10:23 getting more customer orders per second

10:25 fantastic
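Measuring whether a change achieved its goal, as described above, comes down to comparing a metric before and after the deployment. The sketch below does that for request latency; the sample numbers are made up for illustration, and "50% faster" is interpreted here as a 50% reduction in median latency, which is one assumption among several possible readings.

```python
from statistics import median

def percent_improvement(before_ms, after_ms):
    """Percent reduction in median latency; positive means faster after the change."""
    baseline = median(before_ms)
    return (baseline - median(after_ms)) / baseline * 100

# Illustrative samples only, not measurements from the video.
before = [120, 118, 125, 122, 119]  # request latency in ms before the change
after = [60, 61, 59, 62, 60]        # request latency in ms after the change

improvement = percent_improvement(before, after)
print(f"median latency improved by {improvement:.0f}%")
```

The median is used rather than the mean so that a few slow outliers right after a deployment do not dominate the comparison.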

10:27 so as I think of the CI/CD

10:30 process and things that we wish we could

10:32 do but we didn't have the people or

10:34 processes to do it, it was too

10:37 expensive to do or too burdensome to do

10:40 it every time you know we could create

10:43 CI/CD processes or pipelines that

10:48 would kick off specific monitoring for a

10:51 specific amount of time on a set of

10:55 ports. let's say port mirroring

10:58 generally speaking

11:01 is expensive from a resource

11:04 perspective but after a certain change

11:07 we want to always mirror a port for

11:09 let's say two hours so we collect that

11:12 data and there's another

11:14 trigger from the monitoring tool that

11:16 says hey if we reach this threshold

11:20 take this action and this action may not

11:22 be disruptive like making configuration

11:24 changes it could be monitor this other

11:27 thing that is kind of this

11:30 limited resource that we can now put on

11:33 to collect more data and make better

11:35 informed decisions

11:38 yeah absolutely and at this point I

11:40 won't say like CPU time is cheap but

11:42 it's a lot cheaper than it used to be

11:43 right storage isn't cheap but it's a lot

11:45 cheaper than it used to be so the

11:48 ability to capture

11:50 all of this information is certainly

11:52 there the other big challenge is then

11:55 okay I got all this additional info how

11:58 do I analyze it how do I munge useful

12:01 information out of it and so that's

12:03 not really a network as code

12:05 challenge but it's something that's

12:07 going to feed back into the loop of your

12:09 development of network as code is having

12:11 some sort of data analysis tool that can

12:15 give you useful insights into the

12:17 information that you're gathering

12:20 so let's talk about our last

12:22 topic in this series. I think

12:25 if you're a networking person

12:27 you've dealt with both sides of this

12:30 implementing your security policy via

12:33 the network and then

12:37 proving that to some internal or

12:40 external audience so let's talk about

12:43 implementing security policies through

12:46 code I talked to a bunch of folks about

12:49 security as code, that's a thing

12:53 where do we start with our

12:56 security policies through code

12:59 sure so there's a whole bunch of

13:00 different policy engines out there that

13:03 will analyze code compare it to some set

13:06 of policies and then give you the

13:08 results one of the most popular ones

13:10 that I've been working with for a little

13:12 while is called Open Policy Agent, or

13:14 OPA, and that has the capability to

13:17 analyze anything that is expressed

13:20 in JSON

13:21 and compare it to some rule sets that

13:24 you've defined and then give you results

13:26 based off those rule sets and what can

13:29 you express with JSON? well,

13:32 just about anything, so whether

13:34 that's doing static code

13:37 analysis, so just what does the code look

13:39 like, or it could be I have a planned

13:43 set of changes that I want to apply to

13:46 my network and I can look through the

13:48 planned set of changes and make

13:49 determinations of whether or not I find

13:52 that it's secure all the way up to

13:54 analyzing the actual running

13:56 configuration on network devices or

13:58 servers as long as it can be expressed

14:01 through JSON, OPA can take a look at

14:04 that and make some policy decisions say

14:07 oh well someone went into this switch

14:09 after the fact and altered something and

14:12 it's no longer in compliance and that

14:14 compliance can be defined usually

14:16 through the security and compliance

14:18 teams in your organization they set the

14:20 policies and then they allow you to test

14:22 for whether or not you're in compliance

14:24 with those policies
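The pattern described above (express the thing as JSON, compare it to a rule set, return the results) can be sketched in a few lines. This is a Python stand-in for the idea only: OPA policies are written in its own Rego language, not Python, and the policy names and JSON keys below are hypothetical.

```python
import json

# Hypothetical rule set: each policy is a name plus a predicate over the
# parsed JSON document. Keys such as "ssh" and "telnet" are assumed.
POLICIES = {
    "ssh_version_2": lambda cfg: cfg.get("ssh", {}).get("version") == 2,
    "telnet_disabled": lambda cfg: not cfg.get("telnet", {}).get("enabled", False),
}

def evaluate(document):
    """Return the sorted names of the policies the JSON document violates."""
    cfg = json.loads(document)
    return sorted(name for name, check in POLICIES.items() if not check(cfg))

# The same check works for code, a planned change, or a running config,
# as long as it can be expressed as JSON.
running_config = '{"hostname": "sw1", "ssh": {"version": 2}, "telnet": {"enabled": true}}'
print(evaluate(running_config))
```

Keeping the rule set separate from the evaluator mirrors the split in the discussion: security and compliance teams own the policies, and network teams run them.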

14:27 and one of the things that frustrated me

14:30 to no end when I did network

14:32 administration and operations day to day

14:35 in large organizations is when the

14:38 dreaded auditor comes in

14:41 and I think I wanted to talk about two

14:44 topics within this sure one

14:48 how do I answer

14:50 the requests from auditors when I'm

14:53 living in an infrastructure as code and

14:56 network as code environment? because I'm

14:58 not, in my mind, going back to the

15:00 individual switches, pulling configs,

15:03 going to backups, etc. to answer the

15:08 requests from the auditors. and then the

15:12 second one is how do I make how do I

15:16 help the auditors trust

15:20 those artifacts I'm giving to them as

15:23 proof so let's do the first one first

15:25 how am I pulling the

15:28 request?

15:30 a sample request is: show me that

15:35 authentication is configured on every

15:36 network device

15:38 sure, yeah, and I mean that's a

15:40 pretty common request that comes in. now

15:43 let's assume that you've defined in your

15:45 network as code authentication policies

15:48 for every single network device

15:50 all you need to do is run a drift

15:55 detection essentially against all your

15:58 existing network devices and that gets

16:00 back to the get, set, and test that we

16:02 talked about in the previous video

16:03 you're just basically running the get

16:06 and test portions of that get me the

16:09 configuration from every network switch

16:10 test it against my defined configuration

16:13 is there a difference no there's not

16:16 awesome and hopefully the answer is no

16:18 there is not and so you can go to the

16:20 auditor and say here's the

16:23 run that I did against all my network

16:25 devices it found no differences and

16:27 here's the configuration that I've

16:28 defined in code that clearly has the

16:31 authentication policy enabled there you

16:33 go I'm done I don't have to tap every

16:35 single switch myself and pull the config

16:38 and dump it out into this giant you know

16:40 document that I deliver to them it's

16:43 here's the run that I went through

16:45 that tested it against all the switches

16:47 and then here's the actual configuration

16:49 it was testing against you're good to go
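The "get" and "test" steps described above can be sketched as a small drift check: fetch each device's running configuration and compare it with the configuration defined in code. Everything named here is an assumption for illustration: the `detect_drift` function, the `aaa_authentication` key, and the in-memory dictionary that stands in for real devices.

```python
def detect_drift(desired, devices, get_config):
    """Return {device: {key: (desired, running)}} for every setting that differs."""
    report = {}
    for device in devices:
        running = get_config(device)  # "get": fetch the running configuration
        diffs = {
            key: (want, running.get(key))
            for key, want in desired.items()
            if running.get(key) != want  # "test": compare with the defined config
        }
        if diffs:
            report[device] = diffs
    return report

# Desired state as defined in code (hypothetical key name).
DESIRED = {"aaa_authentication": "enabled"}

# Stand-in for real devices; a real get_config would query each switch.
FAKE_DEVICES = {
    "sw1": {"aaa_authentication": "enabled", "hostname": "sw1"},
    "sw2": {"aaa_authentication": "disabled", "hostname": "sw2"},
}

report = detect_drift(DESIRED, FAKE_DEVICES, FAKE_DEVICES.get)
print(report or "no drift found")
```

An empty report plus the desired configuration from the repository is the pair of artifacts the answer above suggests handing to the auditor.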

16:53 so the smart auditor will come and say

16:56 well

16:58 there's a whole other control plane.

17:00 let's authenticate the folks making

17:05 the changes because we're no longer

17:07 making switch level changes we're not

17:09 going into the switch to configure

17:11 changes

17:12 this whole other team is doing this

17:14 platform team how do we ensure who has

17:17 rights to make changes if this

17:19 quote-unquote system is making changes

17:22 right I mean

17:24 it depends on how you've secured

17:27 the workflow

17:28 so a fairly typical process is

17:31 everything goes through code you're

17:34 following sort of a GitOps process so

17:36 the way that I make changes in a system

17:38 is that I submit my changes via code to

17:42 the repository and that kicks off via a

17:45 webhook some CI/CD pipeline, and in that

17:48 pipeline will be an approvals process

17:50 for the changes and so someone whether

17:54 it's an automated process or a manual

17:55 process needs to vet those changes

17:57 determine whether or not the changes

17:59 should be allowed and then approve those

18:01 changes and what you have in the

18:04 repository is a record of exactly who

18:06 committed the code and when they

18:07 committed it and in your pipeline you

18:09 have a record of exactly who approved

18:11 that code and when they approved it and

18:13 so you can trace the full change of your

18:16 environment through that entire process

18:18 now that doesn't mean that sometimes you

18:21 don't need to break glass in the case of

18:23 a hard down situation where you need

18:26 to make immediate changes, but that is

18:29 hopefully an infrequent event and that

18:32 you have a well-defined process for

18:34 getting approval to break glass and make

18:36 changes
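The traceability described above (who committed and when, who approved and when) can be sketched as a join between two record sets. The record shapes below are hypothetical; real repositories and CI/CD platforms each expose this data in their own formats.

```python
def audit_trail(commits, approvals):
    """Pair each commit with its approval record, if one exists."""
    by_commit = {a["commit"]: a for a in approvals}
    trail = []
    for c in commits:
        a = by_commit.get(c["id"])
        trail.append({
            "commit": c["id"],
            "author": c["author"],
            "committed_at": c["at"],
            "approver": a["approver"] if a else None,
            "approved_at": a["at"] if a else None,
        })
    return trail

def unapproved(trail):
    """Commits with no approval on record, e.g. break-glass changes to review."""
    return [entry["commit"] for entry in trail if entry["approver"] is None]

# Illustrative records only.
commits = [
    {"id": "a1b2", "author": "alice", "at": "2024-01-10T09:00Z"},
    {"id": "c3d4", "author": "bob", "at": "2024-01-11T14:30Z"},
]
approvals = [{"commit": "a1b2", "approver": "carol", "at": "2024-01-10T10:15Z"}]

trail = audit_trail(commits, approvals)
print(unapproved(trail))
```

Listing the unapproved commits separately gives the break-glass cases a visible place in the audit record instead of leaving them as gaps.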

18:37 so this all starts with you can't

18:39 automate you can't code processes that

18:43 don't exist

18:45 well, at the end of the day the

18:48 system is there all these systems that

18:51 we've talked about in this series those

18:55 systems are there to automate or codify

19:00 the things we've already written down on

19:03 paper the processes that we've already

19:06 talked about the operational

19:10 issues we've controlled for

19:12 I've seen CI/CD processes

19:18 break entire systems because people

19:21 didn't sit down and write down their

19:25 existing processes and then build a CI/CD

19:28 pipeline that supported their

19:31 existing pipelines. they

19:34 tried to recreate the wheel and break

19:38 literally 30 years of integration test

19:42 processes Etc without really thinking

19:45 through. and I think that summarizes

19:48 the whole series what we're trying to do

19:51 is scale our operations in a way that

19:56 meets the needs of today. the CTO

19:58 advisor's premise is that hybrid

20:01 infrastructure is here to stay we cannot

20:05 afford to have a bespoke approach to any

20:09 infrastructure whether that's network

20:11 storage compute or public Cloud we have

20:14 to have processes that scale take humans

20:18 out so we can put our people on smarter

20:20 and harder problems such as network to

20:25 cloud networking, cloud to cloud

20:29 networking, cloud to cloud security. these

20:32 are problems we need to rededicate our

20:34 staffs to solving. Ned, any last comments

20:38 for our audience I think you hit the

20:41 nail on the head there it really is a

20:43 matter of automating existing processes

20:46 but most importantly you don't have to

20:50 twist the tool you don't have to twist

20:52 yourself out of shape to fit the tool

20:54 all these different tools that exist are

20:57 extensible and customizable and so you

21:00 should customize the workflow and select

21:02 the tool that meets the existing shape

21:04 and workflow of your organization

21:08 all right with that said you want to

21:10 find out more about the CTO advisor you

21:12 can follow us on the web at

21:13 thectoadvisor.com. visit our friends at Juniper

21:16 Networks. Ned, where can folks find you? if you're

21:19 looking for me the easiest way is to go

21:21 to my website nedinthecloud.com. all

21:24 of my links and other content are all

21:26 hosted there all right until then we'll

21:29 talk to you next video series
