Julian Lucek, Senior Distinguished Systems Engineer, Juniper Networks

Juniper Networks Demonstrates Congestion Avoidance with Paragon Automation

Demo Drop, Network Automation, WAN

Manual and semi-manual tasks take up a lot of time and bandwidth in a network engineer’s workday.

In this Tech Field Day Showcase, presenter Julian Lucek discusses Juniper Networks’ Paragon Automation platform. The platform provides AI-assisted closed-loop automation so that services are delivered correctly and on time.

You’ll learn

The different software products included in the suite
How the pieces interact with each other synchronously within an API-driven framework

Who is this for?

Network Professionals, Security Professionals

Host

Julian Lucek

Senior Distinguished Systems Engineer, Juniper Networks

Guest speakers

Peter Welcher

Architect and Tech Advisor, NetCraftsmen

Nick Buraglio

Network Architect, ESnet

David Penaloza Seijas

Principal Engineer, Verizon Business

Steve Puluka

Network & Security Engineer

Resources

Transcript

0:09 hi everyone I'm Julian Lucek I'm a

0:12 distinguished SE at Juniper Networks and

0:15 I'll talk in more detail about the

0:17 components of the Paragon automation

0:19 Suite so here are the different

0:22 components of the Paragon automation

0:24 Suite so looking from right to left

0:26 across this diagram first of all we have

0:29 Paragon Pathfinder which will play a

0:31 starring role in the demos that we're

0:34 going to show today and Paragon

0:36 Pathfinder has the ability to create

0:38 traffic engineered LSPs those can be SR-TE

0:41 or RSVP-TE LSPs and it can modify the

0:46 paths of those LSPs during their

0:48 lifetime according to observations made

0:51 from the live Network

0:54 and then we have Paragon insights that's

0:56 our health monitoring system it's

0:59 capable of identifying faults in network

1:01 elements and taking

1:03 um corresponding actions Paragon active

1:06 Assurance is the solution for active

1:10 probing across the network it can send

1:13 probes between pops or from Pops to

1:18 customer sites or from pop to cloud in

1:21 order to ascertain the performance

1:23 between different end points and that

1:26 can be in the form of parameters such as

1:30 delay delay variation and packet loss

1:34 ratios and then finally we have Paragon

1:36 planner which is an offline planning

1:39 tool and that's capable of taking

1:42 snapshots of the live network from

1:45 Paragon Pathfinder and then with that

1:49 snapshot you can have a network model

1:52 on which you can do exhaustive failure

1:54 simulations or capacity planning all of

1:57 these components are Cloud native

1:59 kubernetes based you can deploy them on

2:03 Prem you could deploy them in the public

2:05 cloud and also we recently announced

2:08 that we're going to have a SaaS-based

2:09 solution as well

2:12 what I'd like to do next is to show how

2:15 the different components can interact

2:17 with each other so at the bottom of this

2:19 slide we have the network itself and

2:22 Pathfinder can see the topology of the

2:26 network through the BGP-LS protocol so

2:28 that allows it to see the layouts of the

2:31 links and nodes and attributes of links

2:33 such as bandwidth SRLGs and different

2:36 types of metric also Pathfinder can

2:40 create traffic engineered LSPs via the

2:43 PCEP protocol and modify them according

2:46 to observed conditions or user input now

2:50 Paragon insights is receiving streaming

2:52 Telemetry from the network in order that

2:54 it can ascertain the health of network

2:57 elements and for automatic remediation

3:02 it can send requests to Paragon

3:04 Pathfinder to create a maintenance on a

3:08 faulty Network element

3:10 Pathfinder can expose meshes of traffic

3:14 engineered LSPs and so you can choose

3:16 to map a VPN to a particular flavor of

3:18 LSPs so for example if you have a VPN

3:21 that needs low latency service you can

3:23 map it onto the minimum latency mesh of

3:26 LSPs Paragon active Assurance is sending

3:30 probes through the network in order to

3:32 ascertain the performance between

3:34 different endpoints as we heard and if

3:38 the performance

3:39 levels are violated then active

3:42 Assurance can send an alert to Paragon

3:44 insights so that it can take actions

3:46 accordingly

3:48 finally Paragon Pathfinder can

3:52 create

3:53 snapshots of the live Network and those

3:57 can be passed on to

3:59 um Paragon planner and so that you can

4:01 perform capacity planning and simulation

4:04 in a network model that has been derived

4:08 from the live Network

4:11 for the te LSP creation mechanism

4:16 you've got PCEP down there is there

4:19 still a requirement for the netconf

4:22 configuration pieces for pushing LSPs at

4:27 one point when I looked at the previous

4:29 version of this for Juniper equipment it

4:33 didn't actually provision via PCEP it

4:35 still used netconf to do that

4:38 it's always

4:40 um it's always supported um both

4:42 actually so from the outset both have

4:43 been

4:44 um supported so netconf can be useful if

4:47 you've got Legacy devices that don't

4:49 support

4:50 um PCEP so it's another it's an

4:52 alternative method that can be used but

4:54 in the main um people tend to use

4:57 um PCEP you know I like that a lot that

5:00 it supported both because in the

5:02 transition mode when we were deploying

5:04 this in a in a network being able to use

5:07 the netconf to really add PCEP as an

5:11 alternative control on a pre-configured

5:13 running LSP

5:16 um was a great option to transitioning

5:19 the the network over to a complete path

5:22 control on a on an existing uh large

5:27 Network because we all know there's no

5:29 such thing as a greenfield at service

5:30 providers

5:32 yeah that's very true yes certainly the

5:34 other thing you can do is um if you have

5:36 pre-existing

5:38 um LSPs that have been created on the um

5:42 Ingress routers you know via CLI

5:45 um config

5:46 um you can actually delegate those to

5:49 um Pathfinder via PCEP and so

5:52 um you know if you turn on PCEP then

5:54 there's an extra line of config on an

5:56 LSP to delegate it and what delegation

5:59 means is that a PCEP message gets sent by

6:02 the Ingress router to the controller

6:05 Pathfinder you know saying that

6:08 um this LSP has been

6:11 um delegated and from that point on

6:13 um Pathfinder can alter the path of the

6:16 LSP as needed during its lifetime

6:19 so that's an alternative

6:21 method of having PCEP you know running

6:25 with pre-existing LSPs we found that was

6:28 a great feature to do that because you

6:31 when we first set this up you have a you

6:34 know you have thousands of of

6:36 uh of services and LSPs out there and

6:40 just to be able to add that one

6:42 delegation line to existing

6:44 configuration with no service

6:46 Interruption whatsoever and then all of

6:48 a sudden you have the you know the

6:50 control the central control and the

6:52 rerouting possible

6:54 oh another thing is actually you can do

6:56 the delegation from the Pathfinder side

6:59 so if you wish you can have

7:01 um Pathfinder you know add that extra

7:03 line of config you know for you in order

7:05 to trigger that delegation yeah that's

7:08 exactly what we did we just had the push

7:11 the line push the line

7:13 the the talk of the LSPs there is is

7:16 there was a whole lot of uh word soup

7:19 that or acronym soup that happened there

7:23 um and for anyone that isn't totally

7:25 familiar with LSP provisioning in SR

7:30 networks the the terminology and the

7:33 mechanism that we just described is

7:35 essentially

7:36 PCC-initiated PCE-controlled right

7:40 and I think it really shouldn't be

7:43 understated like Steve said that the

7:45 fact you can if you have an existing

7:47 mechanism for provisioning LSPs command

7:51 line or whatever it is right you already

7:53 have an existing system you can still

7:56 use that while you Transit transition

7:58 over to using a

8:02 um you know segment routing controller

8:04 PCE

8:05 and then just slowly transition each of

8:08 the LSPs over it's not a boil the ocean

8:11 uh type scenario for a large Network

8:14 that has you know maybe hundreds or

8:16 thousands of LSPs already that they

8:19 don't need to do a

8:20 you know a Flag Day essentially you can

8:23 just delegate those existing LSPs and

8:25 continue to use your old mechanism until

8:27 you move over to the new one absolutely

8:30 yes I mean one way of doing that is you

8:33 could do it

8:34 um Ingress router by Ingress router

8:36 you could say on day one

8:38 um the LSPs of which this router is the

8:40 Ingress those are the ones I'm going to

8:42 delegate now and then subsequently you

8:45 know another Edge router which is the

8:48 Ingress of some LSPs you know those can

8:50 be delegated and so on so that's one

8:52 method of doing it in a stage by stage

8:55 way
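The stage-by-stage approach described above can be sketched as a simple grouping step. This is only an illustration, not a Paragon API: the function name delegation_waves and the LSP names are hypothetical, and the sketch assumes one migration stage per ingress router.

```python
from collections import defaultdict


def delegation_waves(lsps):
    """Group LSP names by ingress router; each group is one migration stage.

    `lsps` is an iterable of (lsp_name, ingress_router) pairs (hypothetical data).
    """
    by_ingress = defaultdict(list)
    for name, ingress in lsps:
        by_ingress[ingress].append(name)
    # Sort by router name so the stage order is predictable.
    return sorted(by_ingress.items())


# Hypothetical inventory of pre-existing, CLI-configured LSPs.
inventory = [
    ("lsp-ams-prg", "amsterdam"),
    ("lsp-ams-ber", "amsterdam"),
    ("lsp-fra-prg", "frankfurt"),
]
waves = delegation_waves(inventory)
```

Each entry in waves names one ingress router and the LSPs to delegate on that day, so the old provisioning method keeps working for every router not yet reached.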

8:57 yeah that's that shouldn't be that

8:59 shouldn't be discounted that's a that's

9:01 a pretty

9:02 powerful feature set yeah well and you

9:06 even have the flexibility of using

9:08 Paragon as the

9:11 controller without actually provisioning

9:13 anything from it you know you can just

9:16 use your existing provisioning system in

9:19 in all its uh glory and well-knownness

9:23 uh to continue to push things out there

9:26 and and still have the LSPs controlled

9:29 by Paragon even though it created none

9:31 of them

9:32 yeah that's that's pretty much the

9:35 migration strategy that I've seen

9:37 every time

9:39 so next we're going to see a demo of

9:41 automated congestion avoidance so we're

9:44 looking at the same network as before

9:46 but this time I'm now showing the

9:48 percentage utilization on each link this

9:52 information is derived from the

9:54 streaming Telemetry that the routers are

9:56 sending to the Paragon automation system

10:00 and you'll note that the links have got

10:03 varying amounts of traffic on them and

10:06 in particular this one which I'd like to

10:08 highlight the link between Amsterdam and

10:10 Hamburg in the direction from Amsterdam

10:12 to Hamburg has

10:14 um about 76 percent

10:17 um traffic loading at um the moment so

10:21 far I haven't turned the automated

10:22 congestion avoidance on because what I

10:25 want to do is to turn it on in a few

10:28 minutes so that we can see the

10:29 difference before and after let's have a

10:32 look at the paths of some LSPs that are

10:35 passing through that busy link so the

10:38 LSP from Amsterdam to Prague is passing

10:41 through that link as you can see

10:43 as is the one from Amsterdam to Berlin

10:47 and so now what I'm going to do is to

10:49 actually turn on the congestion

10:51 avoidance so I'm going to go into one of

10:54 the menus

11:00 and so now I'm going into a settings

11:01 menu normally we'd have this turned

11:04 on permanently but we want to see the

11:07 difference before and after so I'm going

11:08 to set a threshold here and

11:13 we are going to then submit that

11:17 and so this is a threshold

11:19 um above which we wish

11:22 um the Pathfinder to move um some of the

11:25 LSPs in order to bring the link below

11:27 the threshold again we here have

11:31 applied it on a global basis but you can

11:33 also apply it in a more granular basis

11:35 with a different threshold on each link

11:37 if you wish so while we're waiting for

11:40 that to kick in we'll see with the aid

11:43 of some slides how the system

11:45 um actually deals with the congestion

11:48 so here's a diagram explaining how

11:52 the scheme works and of course if a link

11:56 gets congested to the extent that it's

11:58 actually dropping packets then

12:01 um clearly the customer's applications

12:03 are

12:04 um going to suffer so that's one of the

12:06 motivations for having this congestion

12:09 avoidance the other motivation is and

12:11 from the capex point of view if traffic

12:14 is spread efficiently around the network

12:16 in order to use the available nodes and

12:19 links then that delays somewhat the

12:23 point in time at which you need to

12:24 upgrade some of the links in the network

12:28 in the face of increasing traffic over

12:31 the course of the weeks and months

12:33 so let's see how it works and so

12:37 um Paragon

12:39 Pathfinder is receiving streaming

12:42 Telemetry relating to traffic and that

12:46 Telemetry is of um two different types

12:50 first of all the routers are reporting

12:52 how much traffic is traveling on each

12:55 physical link

12:58 and then the Ingress routers of traffic

13:00 engineered LSPs are reporting how much

13:04 traffic is entering each of the traffic

13:06 engineered LSPs of which it's the

13:09 Ingress because of course traffic can

13:11 only enter a traffic engineered LSP at

13:13 the Ingress router and those traffic

13:17 engineered LSPs could be RSVP LSPs or

13:20 they could be SR-TE LSPs and so in this

13:24 example Network R1 is reporting how much

13:27 traffic is entering the purple LSP and

13:30 the blue LSP and

13:32 um R4 is reporting how much traffic is

13:35 entering the green LSP

13:39 now let's suppose that um the link

13:42 between R2 and R3 has reached the

13:44 congestion threshold that we have um set

13:48 um Pathfinder can see that through the

13:50 streaming telemetry

13:52 also it knows which LSPs are passing

13:54 through that link that's reached the

13:57 congestion threshold that we set and

14:01 furthermore it knows through the

14:03 streaming Telemetry how much traffic is

14:04 traveling on each of those LSPs and so

14:07 Pathfinder has all of the information it

14:10 needs in order to work out which LSPs to

14:13 move away in order to ease the um

14:16 congestion

14:18 and of course in so doing it needs to

14:20 make sure not to cause congestion

14:22 elsewhere so it takes that into account

14:24 when making that um determination and

14:27 then having worked out which LSP to move

14:31 in this example it decides to move the

14:33 blue LSP it sends a PCEP message to R1

14:37 because R1's the Ingress router of that blue

14:40 LSP and that PCEP message contains the

14:43 new path of the LSP and so in this

14:45 example it's R1 R5 R6 R7 because of

14:50 course it's Pathfinder that's

14:52 determining the new path of the LSP

14:54 because it's best placed to know what

14:57 path to move it onto because it can see

14:59 the traffic levels around the network

15:02 and so then R1 responds by moving the

15:06 LSP

15:06 accordingly and so you can see that

15:09 without any human intervention the

15:11 system succeeded in avoiding excessive

15:14 congestion occurring on the link
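The decision described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Pathfinder's actual algorithm: the selection rule (smallest LSP whose removal clears the excess), the precomputed alt_paths, the traffic figures, and the original paths of the purple, blue and green LSPs are all invented for the example. Only the router names and the new blue path R1 R5 R6 R7 come from the talk.

```python
from dataclasses import dataclass


@dataclass
class Lsp:
    name: str
    path: list            # ordered router names, ingress first
    traffic_mbps: float   # as reported by the ingress router's telemetry


def links_of(path):
    """Directed links (a, b) along a path."""
    return list(zip(path, path[1:]))


def pick_reroute(lsps, link_load, capacity, threshold, congested, alt_paths):
    """Return (lsp_name, new_path) that relieves `congested`, or None.

    The new path must avoid the congested link and must not push any of its
    own links above the threshold.
    """
    limit = {link: threshold * cap for link, cap in capacity.items()}
    excess = link_load[congested] - limit[congested]
    on_link = [l for l in lsps if congested in links_of(l.path)]
    for lsp in sorted(on_link, key=lambda l: l.traffic_mbps):
        if lsp.traffic_mbps < excess:
            continue  # moving this LSP alone would not clear the congestion
        old_links = set(links_of(lsp.path))
        for path in alt_paths.get(lsp.name, []):
            new_links = links_of(path)
            if congested in new_links:
                continue
            # Links already carrying this LSP do not gain extra traffic.
            if all(
                link_load.get(link, 0.0)
                + (0.0 if link in old_links else lsp.traffic_mbps)
                <= limit[link]
                for link in new_links
            ):
                return lsp.name, path
    return None


all_links = [("R1", "R2"), ("R2", "R3"), ("R3", "R7"), ("R4", "R2"),
             ("R1", "R5"), ("R5", "R6"), ("R6", "R7")]
capacity = {link: 1000.0 for link in all_links}
lsps = [
    Lsp("purple", ["R1", "R2", "R3"], 300.0),
    Lsp("blue", ["R1", "R2", "R3", "R7"], 250.0),
    Lsp("green", ["R4", "R2", "R3"], 200.0),
]
# Per-link telemetry: R2->R3 carries all three LSPs (750 of 1000 Mbps).
link_load = {("R1", "R2"): 550.0, ("R2", "R3"): 750.0,
             ("R3", "R7"): 250.0, ("R4", "R2"): 200.0}
alt_paths = {"blue": [["R1", "R5", "R6", "R7"]]}

choice = pick_reroute(lsps, link_load, capacity, 0.7, ("R2", "R3"), alt_paths)
```

With a 70 percent threshold the R2 to R3 link is 50 Mbps over its limit. The green LSP has no alternative path in this toy data, so the sketch settles on the blue LSP and the path R1 R5 R6 R7, matching the slide.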

15:18 so let's now go back to our demo setup

15:22 and we will look at the network

15:26 topology again

15:29 and so now you can see a change that

15:31 link that we looked at before that had

15:33 quite a lot of traffic has now gone down

15:35 to about 43 percent

15:37 um traffic loading and we can look at

15:40 the paths of some of the LSPs that we

15:42 looked at

15:43 um before you can see that this one

15:45 which previously had been using that

15:47 quite loaded link it's now moved on to a

15:51 different path now it follows the path

15:52 Amsterdam Frankfurt Prague and so what

15:55 happened is that the controller

15:58 Pathfinder moved that LSP in order to

16:01 bring that link below the congestion

16:04 threshold completely without any human

16:07 intervention

16:09 when this system is is kicking in is it

16:12 able to take into account

16:14 um whether or not there's a desire to

16:16 keep things symmetrical uh with LSP

16:19 pairs

16:21 um yes it does take that into account so

16:23 um the symmetric LSPs would

16:26 um stay

16:27 um intact that's um true yes that's

16:30 right

16:31 okay so it does the evaluation in both

16:33 directions to make sure that the the

16:35 move is not going to cause a problem

16:40 uh and the second question is is this

16:42 system also available for trending data

16:46 so that you can see when you need to do

16:48 upgrades for or you know bandwidth

16:51 upgrades for particular links and you

16:54 know when you would anticipate uh

16:56 Crossing thresholds

16:58 um yes you can look at um traffic as a

17:01 function of time on a given link in

17:04 fact we could look at that

17:07 um now or indeed on a given LSP so for

17:10 this LSP we can look at how much traffic

17:12 has been traveling as a function of time

17:14 along that LSP I mean this one is fairly

17:17 um flat and as you can see as I hover

17:19 this cursor you can see what the traffic

17:22 was at that point in time but then you

17:24 can do similar things for actual um

17:28 physical links as well within the

17:31 um networks that's something else that

17:34 um you can do

17:37 um in fact so we could look at a link

17:39 here by way of example

17:42 so here you can see traffic on this link

17:44 as a function of time in each direction

17:47 as well you can see that it dropped um

17:50 in the last few minutes after the

17:52 congestion avoidance um kicked in

17:54 um in one of the directions that's quite

17:58 um handy now when it comes to

18:00 um capacity planning as I mentioned one

18:03 can take snapshots of the network and

18:05 import them into Paragon planner which

18:08 is the planning tool and that snapshot

18:10 includes um traffic levels and so um

18:14 then in planner if you want to

18:17 um look at anticipated traffic over the

18:20 next few months you could apply a

18:22 multiplier for example you could

18:23 multiply all of the traffic

18:25 um by a multiplication factor of say 1.5

18:28 to see you know what effect that has and

18:31 with that increased traffic one can

18:33 perform exhaustive failure simulation to

18:35 see that even in the face of the

18:37 increased um traffic

18:40 um Can the network for example survive

18:42 single link failure or double link

18:45 failures and node failures SRLG failures

18:49 and so on so the two work quite well

18:52 hand in hand when it comes to

18:55 um you know on the one hand live traffic

18:56 management within the live Network and

18:58 also capacity planning looking into

19:00 the future
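The planning workflow just described (scale the snapshot traffic, then test every single-link failure) can be sketched as follows. This is a toy model under stated assumptions, not Paragon Planner's method: demands are routed on hop-count shortest paths, and the three-node topology, the capacities and the 800 Mbps demand are invented for illustration.

```python
from collections import deque


def link(a, b):
    """Undirected link key."""
    return frozenset((a, b))


def shortest_path(capacity, src, dst, failed):
    """Hop-count shortest path (BFS) over the links that have not failed."""
    adj = {}
    for l in capacity:
        if l in failed:
            continue
        a, b = tuple(l)
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def survives(capacity, demands, scale, failed=frozenset()):
    """True if every scaled demand can be routed without overloading a link."""
    load = dict.fromkeys(capacity, 0.0)
    for (src, dst), mbps in demands.items():
        path = shortest_path(capacity, src, dst, failed)
        if path is None:
            return False
        for a, b in zip(path, path[1:]):
            load[link(a, b)] += mbps * scale
    return all(load[l] <= capacity[l] for l in capacity)


def failing_links(capacity, demands, scale):
    """Single-link failures the network does not survive at this traffic scale."""
    return [sorted(l) for l in capacity
            if not survives(capacity, demands, scale, failed={l})]


capacity = {link("A", "C"): 2000.0, link("A", "B"): 1000.0, link("B", "C"): 1000.0}
demands = {("A", "C"): 800.0}

today = failing_links(capacity, demands, 1.0)
grown = failing_links(capacity, demands, 1.5)
```

In this toy network every single-link failure is survivable at today's traffic, but after multiplying by 1.5 the loss of the A to C link overloads the backup path, which is the kind of result that would prompt a link upgrade.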

19:02 if I have SLA traffic that I'm carrying

19:04 on those LSPs for you know various mpls

19:06 VPN services for Enterprise customers or

19:08 other service providers how do you

19:11 orchestrate that how do you kind of plan

19:12 for that and how do you

19:14 um more specifically figure out when

19:16 you've when you've over committed your

19:18 your SLAs and say hey I have more

19:20 traffic than I can move I need to meet

19:22 this latency Target I need to meet this

19:24 whatever this target is

19:26 um and I can't move these because they

19:28 will violate SLA how does that arise yes

19:31 well when it comes to latency that yeah

19:33 it's a very good question actually

19:34 because

19:35 um if

19:37 um you know before the congestion

19:38 occurred

19:39 um you know presumably the low latency

19:42 traffic is following the lowest latency

19:43 path because as you saw in the previous

19:46 um demos

19:47 um you know Pathfinder can keep

19:50 um you know low latency LSPs on the ls

19:53 on the path that is currently the lowest

19:55 latency path and so you don't want the

19:57 congestion avoidance to divert such low

20:00 LSPs sorry such low latency LSPs away

20:04 and because presumably the New Path will

20:06 be somewhat longer and

20:08 um therefore have higher latency and we

20:11 can ensure that um in fact the way the

20:13 congestion avoidance works is that

20:16 um first of all it considers as

20:18 candidates for moving the LSPs that have

20:21 the worst um priority level because an

20:24 LSP can have one of eight priority

20:26 levels and um if there's no suitable

20:29 candidate within that worst priority

20:31 level moves to the next one and so on

20:33 and so if you wish low latency LSPs not

20:36 to be moved by the congestion avoidance

20:38 algorithm then you can give them the

20:41 best priority level so they'll be

20:43 the last to be considered

20:45 um as candidates for moving and so they

20:47 tend to stay on to the you know current

20:50 presumably lowest latency path so the

20:54 priority level is that is that an

20:55 arbitrary value that is assigned by the

20:57 controller is there a you know mapping

20:58 to experimental bits with that how does

21:00 that how does that work exactly

21:01 um that's the value that you can set

21:03 um on the LSP

21:05 um either at creation time or

21:07 um subsequently so it's

21:09 um literally a sort of value ranging

21:11 from zero to seven that expresses

21:13 the priority

21:16 um level
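The candidate ordering described above can be sketched like this. It is an illustration only: the talk says the priority ranges from zero to seven but not which end is best, so the sketch assumes the RSVP-TE convention in which 0 is the best priority and 7 the worst. The can_move predicate and the LSP names are hypothetical.

```python
def candidates_by_priority(lsps):
    """Yield (level, group) from the worst priority level to the best.

    Assumes 0 is the best priority and 7 the worst (RSVP-TE convention).
    """
    for level in range(7, -1, -1):
        group = [l for l in lsps if l["priority"] == level]
        if group:
            yield level, group


def first_movable(lsps, can_move):
    """Return the first LSP that can be moved, trying worst priorities first."""
    for _level, group in candidates_by_priority(lsps):
        for lsp in group:
            if can_move(lsp):
                return lsp
    return None


lsps = [
    {"name": "low-latency", "priority": 0},
    {"name": "bulk", "priority": 7},
    {"name": "standard", "priority": 4},
]

picked = first_movable(lsps, lambda l: True)
fallback = first_movable(lsps, lambda l: l["name"] != "bulk")
```

Giving the low-latency LSP the best level means it is only considered once every worse level has been exhausted, so it tends to stay on its current path.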

21:17 um are those levels mapped directly into

21:20 queues

21:23 not um necessarily

21:26 um you know you don't have to actually

21:29 um have a you know mapping between

21:31 priority levels and queues it's more

21:34 um you know related to

21:37 um you know preemption and hold

21:39 um priorities

21:41 um you know if at the end of the day

21:43 there's insufficient

21:45 um bandwidth to carry LSPs across the

21:48 network then the priority level

21:50 determines which ones you know get

21:53 um access to the network and capacity

21:56 and also you know has a bearing on the

21:59 congestion avoidance algorithm but um

22:01 it's not necessary to have a mapping

22:04 from that into um queues necessarily

22:07 okay yeah I'd like to think of that more

22:09 as a controller level priority rather

22:12 than a class of service type of thing

22:13 exactly it's more to do with the

22:16 behavior of the LSP as a whole that's

22:18 right okay so that that is specific to

22:20 the controller and not necessarily

22:23 there's no prerequisite to have

22:26 um you know specific class of service queues

22:29 that map directly into those

22:31 is that a correct description of what

22:33 you're saying correct thank you
