Juniper Networks Demonstrates Congestion Avoidance with Paragon Automation
![Julian Lucek Headshot](/content/dam/www/assets/mediaportal/speakers-hosts/2023/julian-lucek.jpg/jcr:content/renditions/cq5dam.web.1280.1280.jpeg)
![Still image divided vertically in half, with a headshot of Julian Lucek on the left and his title at the bottom of that half: ‘Senior Distinguished Systems Engineer, Juniper Networks’. The right half has the Tech Field Day logo at the top, then ‘SHOWCASE Presented by: Juniper Networks, Congestion Avoidance with Paragon Automation’ taking up the remainder of that half.](https://i.ytimg.com/vi/tBFfkxKjTSM/hqdefault.jpg)
Manual and semi-manual tasks take up a lot of time and bandwidth in a network engineer’s workday.
In this Tech Field Day Showcase, presenter Julian Lucek discusses Juniper Networks’ Paragon Automation platform. The platform provides closed-loop automation leveraging AI, making sure that services are delivered correctly and on time.
You’ll learn
The different software products included in the suite
How the pieces interact with each other synchronously within an API-driven framework
Who is this for?
Host
![Julian Lucek Headshot](/content/dam/www/assets/mediaportal/speakers-hosts/2023/julian-lucek.jpg/jcr:content/renditions/cq5dam.web.1280.1280.jpeg)
Guest speakers
![Peter Welcher Headshot](/content/dam/www/assets/mediaportal/speakers-hosts/2023/peter-welcher.jpg/jcr:content/renditions/cq5dam.web.1280.1280.jpeg)
![David Penaloza Seijas Headshot](/content/dam/www/assets/mediaportal/speakers-hosts/2023/david-seijas.jpg/jcr:content/renditions/cq5dam.web.1280.1280.jpeg)
![Steve Puluka Headshot](/content/dam/www/assets/mediaportal/speakers-hosts/2023/steve-puluka.jpg/jcr:content/renditions/cq5dam.web.1280.1280.jpeg)
Transcript
0:09 hi everyone I'm Julian Lucek I'm a
0:12 distinguished SE at Juniper Networks and
0:15 I'll talk in more detail about the
0:17 components of the Paragon automation
0:19 Suite so here are the different
0:22 components of the Paragon automation
0:24 Suite so looking from right to left
0:26 across this diagram first of all we have
0:29 Paragon Pathfinder which will play a
0:31 starring role in the demos that we're
0:34 going to show today and Paragon
0:36 Pathfinder has the ability to create
0:38 traffic engineered LSPs those can be SR-TE
0:41 or RSVP-TE LSPs and it can modify the
0:46 paths of those LSPs during their
0:48 lifetime according to observations made
0:51 from the live Network
0:54 and then we have Paragon insights that's
0:56 our health monitoring system it's
0:59 capable of identifying faults in network
1:01 elements and taking
1:03 um corresponding actions Paragon active
1:06 Assurance is the solution for active
1:10 probing across the network it can send
1:13 probes between pops or from Pops to
1:18 customer sites or from pop to cloud in
1:21 order to ascertain the performance
1:23 between different end points and that
1:26 can be in the form of parameters such as
1:30 delay delay variation and packet loss
1:34 ratios and then finally we have Paragon
1:36 planner which is an offline planning
1:39 tool and that's capable of taking
1:42 snapshots of the live network from
1:45 Paragon Pathfinder and then with that
1:49 snapshot you can have a network model
1:52 on which you can do exhaustive failure
1:54 simulations or capacity planning all of
1:57 these components are Cloud native
1:59 kubernetes based you can deploy them on
2:03 Prem you could deploy them in the public
2:05 cloud and also we recently announced
2:08 that we're going to have a SaaS-based
2:09 solution as well
2:12 what I'd like to do next is to show how
2:15 the different components can interact
2:17 with each other so at the bottom of this
2:19 slide we have the network itself and
2:22 Pathfinder can see the topology of the
2:26 network through the BGP-LS protocol so
2:28 that allows it to see the layouts of the
2:31 links and nodes and attributes of links
2:33 such as bandwidth SRLGs and different
2:36 types of metric also Pathfinder can
2:40 create traffic engineered LSPs via the
2:43 PCEP protocol and modify them according
2:46 to observed conditions or user input now
2:50 Paragon insights is receiving streaming
2:52 Telemetry from the network in order that
2:54 it can ascertain the health of network
2:57 elements and for automatic remediation
3:02 it can send requests to Paragon
3:04 Pathfinder to create a maintenance on a
3:08 faulty Network element
3:10 Pathfinder can expose meshes of traffic
3:14 engineered LSPs and so you can choose
3:16 to map a VPN to a particular flavor of
3:18 LSPs so for example if you have a VPN
3:21 that needs low latency service you can
3:23 map it onto the minimum latency mesh of
3:26 LSPs Paragon active Assurance is sending
3:30 probes through the network in order to
3:32 ascertain the performance between
3:34 different endpoints as we heard and if
3:38 the performance
3:39 levels are violated then active
3:42 Assurance can send an alert to Paragon
3:44 insights so that it can take actions
3:46 accordingly
3:48 finally Paragon Pathfinder can
3:52 create
3:53 snapshots of the live Network and those
3:57 can be passed on to
3:59 um Paragon planner and so that you can
4:01 perform capacity planning and simulation
4:04 in a network model that has been derived
4:08 from the live Network
4:11 for the te LSP creation mechanism
4:16 you've got PCEP down there is there
4:19 still a requirement for the NETCONF
4:22 configuration pieces for pushing LSPs at
4:27 one point when I looked at the previous
4:29 version of this for Juniper equipment it
4:33 didn't actually provision via PCEP it
4:35 still used NETCONF to do that
4:38 it's always
4:40 um it's always supported um both
4:42 actually so from the outset both have
4:43 been
4:44 um supported so NETCONF can be useful if
4:47 you've got Legacy devices that don't
4:49 support
4:50 um PCEP so it's another it's an
4:52 alternative method that can be used but
4:54 in the main um people tend to use
4:57 um PCEP you know I like that a lot that
5:00 it supported both because in the
5:02 transition mode when we were deploying
5:04 this in a in a network being able to use
5:07 NETCONF to really add PCEP as an
5:11 alternative control on a pre-configured
5:13 running LSP
5:16 um was a great option to transitioning
5:19 the the network over to a complete path
5:22 control on a on an existing uh large
5:27 Network because we all know there's no
5:29 such thing as a greenfield at service
5:30 providers
5:32 yeah that's very true yes certainly the
5:34 other thing you can do is um if you have
5:36 pre-existing
5:38 um LSPs that have been created on the um
5:42 Ingress routers you know via CLI
5:45 um config
5:46 um you can actually delegate those to
5:49 um Pathfinder via PCEP and so
5:52 um you know if you turn on PCEP then
5:54 there's an extra line of config on an
5:56 LSP to delegate it and what delegation
5:59 means is that a PCEP message gets sent by
6:02 the Ingress router to the controller
6:05 Pathfinder you know saying that
6:08 um this LSP has been
6:11 um delegated and from that point on
6:13 um Pathfinder can alter the path of the
6:16 LSP as needed during its lifetime
6:19 so that's an alternative
6:21 method of having PCEP you know running
6:25 with pre-existing LSPs we found that was
6:28 a great feature to do that because you
6:31 when we first set this up you have a you
6:34 know you have thousands of of
6:36 uh of services and LSPs out there and
6:40 just to be able to add that one
6:42 delegation line to existing
6:44 configuration with no service
6:46 Interruption whatsoever and then all of
6:48 a sudden you have the you know the
6:50 control the central control and the
6:52 rerouting possible
6:54 oh another thing is actually you can do
6:56 the delegation from the Pathfinder side
6:59 so if you wish you can have
7:01 um Pathfinder you know add that extra
7:03 line of config you know for you in order
7:05 to trigger that delegation yeah that's
7:08 exactly what we did we just had the push
7:11 the line push the line
7:13 the the talk of the LSPs there is is
7:16 there was a whole lot of uh word soup
7:19 that or acronym soup that happened there
7:23 um and for anyone that isn't totally
7:25 familiar with LSP provisioning in SR
7:30 networks the the terminology and the
7:33 mechanism that we just described is
7:35 essentially
7:36 PCC-initiated PCE-controlled right
7:40 and I think it really shouldn't be
7:43 understated like Steve said that the
7:45 fact you can if you have an existing
7:47 mechanism for provisioning LSPs command
7:51 line or whatever it is right you already
7:53 have an existing system you can still
7:56 use that while you transition
7:58 over to using a
8:02 um you know segment routing controller
8:04 PCE
8:05 and then just slowly transition each of
8:08 the LSPs over it's not a boil the ocean
8:11 uh type scenario for a large Network
8:14 that has you know maybe hundreds or
8:16 thousands of LSPs already that they
8:19 don't need to do a
8:20 you know a Flag Day essentially you can
8:23 just delegate those existing LSPs and
8:25 continue to use your old mechanism until
8:27 you move over to the new one absolutely
8:30 yes I mean one way of doing that is you
8:33 could do it
8:34 um Ingress router by Ingress router
8:36 you could say on day one
8:38 um the LSPs of which this router is the
8:40 Ingress those are the ones I'm going to
8:42 delegate now and then subsequently you
8:45 know another Edge router which is the
8:48 Ingress of some LSPs you know those can
8:50 be delegated and so on so that's one
8:52 method of doing it in a stage by stage
8:55 way
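The stage-by-stage migration described above can be sketched as a simple grouping of existing LSPs by ingress router, with each group delegated in its own stage. This is only an illustrative sketch: the function name `delegation_stages` and the LSP and router names are hypothetical, and it plans the stages only; it does not talk to any router or controller.

```python
from collections import defaultdict


def delegation_stages(lsps):
    """Group LSP names by ingress router; each group is one migration stage.

    `lsps` is a list of (lsp_name, ingress_router) pairs (hypothetical inventory).
    """
    by_ingress = defaultdict(list)
    for name, ingress in lsps:
        by_ingress[ingress].append(name)
    # Sort by router name so the plan is deterministic.
    return sorted(by_ingress.items())


# Hypothetical inventory of pre-existing, CLI-configured LSPs.
inventory = [
    ("ams-prg", "amsterdam"),
    ("ams-ber", "amsterdam"),
    ("ham-prg", "hamburg"),
]
stages = delegation_stages(inventory)
for day, (router, names) in enumerate(stages, start=1):
    print(f"day {day}: delegate {names} on ingress {router}")
```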
8:57 yeah that's that shouldn't be that
8:59 shouldn't be discounted that's a that's
9:01 a pretty
9:02 powerful feature set yeah well and you
9:06 even have the flexibility of using
9:08 Paragon as the
9:11 controller without actually provisioning
9:13 anything from it you know you can just
9:16 use your existing provisioning system in
9:19 in all its uh glory and well-knownness
9:23 uh to continue to push things out there
9:26 and and still have the LSPs controlled
9:29 by Paragon even though it created none
9:31 of them
9:32 yeah that's that's pretty much the
9:35 migration strategy that I've seen
9:37 every time
9:39 so next we're going to see a demo of
9:41 automated congestion avoidance so we're
9:44 looking at the same network as before
9:46 but this time I'm now showing the
9:48 percentage utilization on each link this
9:52 information is derived from the
9:54 streaming Telemetry that the routers are
9:56 sending to the Paragon automation system
10:00 and you'll note that the links have got
10:03 varying amounts of traffic on them and
10:06 in particular this one which I'd like to
10:08 highlight the link between Amsterdam and
10:10 Hamburg in the direction from Amsterdam
10:12 to Hamburg has
10:14 um about 76 percent
10:17 um traffic loading at um the moment so
10:21 far I haven't turned the automated
10:22 congestion avoidance on because what I
10:25 want to do is to turn it on in a few
10:28 minutes so that we can see the
10:29 difference before and after let's have a
10:32 look at the paths of some LSPs that are
10:35 passing through that busy link so the
10:38 LSP from Amsterdam to Prague is passing
10:41 through that link as you can see
10:43 as is the one from Amsterdam to Berlin
10:47 and so now what I'm going to do is to
10:49 actually turn on the congestion
10:51 avoidance so I'm going to go into one of
10:54 the menus
11:00 and so now I'm going into a settings
11:01 menu normally we'd have this turned
11:04 on permanently but we want to see the
11:07 difference before and after so I'm going
11:08 to set a threshold here and
11:13 we are going to then submit that
11:17 and so this is a threshold
11:19 um above which we wish
11:22 um the Pathfinder to move um some of the
11:25 LSPs in order to bring the link below
11:27 the threshold again we here have
11:31 applied it on a global basis but you can
11:33 also apply it on a more granular basis
11:35 with a different threshold on each link
11:37 if you wish so while we're waiting for
11:40 that to kick in we'll see with the aid
11:43 of some slides how the system
11:45 um actually deals with the congestion
11:48 so here's a diagram explaining how
11:52 the scheme works and of course if a link
11:56 gets congested to the extent that it's
11:58 actually dropping packets then
12:01 um clearly the customer's applications
12:03 are
12:04 um going to suffer so that's one of the
12:06 motivations for having this congestion
12:09 avoidance the other motivation is
12:11 from the capex point of view if traffic
12:14 is spread efficiently around the network
12:16 in order to use the available nodes and
12:19 links then that delays somewhat the
12:23 point in time at which you need to
12:24 upgrade some of the links in the network
12:28 in the face of increasing traffic over
12:31 the course of the weeks and months
12:33 so let's see how it works and so
12:37 um Paragon
12:39 Pathfinder is receiving streaming
12:42 Telemetry relating to traffic and that
12:46 Telemetry is of um two different types
12:50 first of all the routers are reporting
12:52 how much traffic is traveling on each
12:55 physical link
12:58 and then the Ingress routers of traffic
13:00 engineered LSPs are reporting how much
13:04 traffic is entering each of the traffic
13:06 engineered LSPs of which it's the
13:09 Ingress because of course traffic can
13:11 only enter a traffic engineered LSP at
13:13 the Ingress router and those traffic
13:17 engineered LSPs could be RSVP LSPs or
13:20 they could be SR-TE LSPs and so in this
13:24 example Network R1 is reporting how much
13:27 traffic is entering the purple LSP and
13:30 the blue LSP and
13:32 um R4 is reporting how much traffic is
13:35 entering the green LSP
13:39 now let's suppose that um the link
13:42 between R2 and R3 has reached the
13:44 congestion threshold that we have um set
13:48 um Pathfinder can see that through the
13:50 streaming telemetry
13:52 also it knows which LSPs are passing
13:54 through that link that's reached the
13:57 congestion threshold that we set and
14:01 furthermore it knows through the
14:03 streaming Telemetry how much traffic is
14:04 traveling on each of those LSPs and so
14:07 Pathfinder has all of the information it
14:10 needs in order to work out which LSPs to
14:13 move away in order to ease the um
14:16 congestion
14:18 and of course in so doing it needs to
14:20 make sure not to cause congestion
14:22 elsewhere so it takes that into account
14:24 when making that um determination and
14:27 then having worked out which LSP to move
14:31 in this example it decides to move the
14:33 blue LSP it sends a PCEP message to R1
14:37 because R1 is the Ingress router of that blue
14:40 LSP and that PCEP message contains the
14:43 new path of the LSP and so in this
14:45 example it's R1 R5 R6 R7 because of
14:50 course it's Pathfinder that's
14:52 determining the new path of the LSP
14:54 because it's best placed to know what
14:57 path to move it onto because it can see
14:59 the traffic levels around the network
15:02 and so then R1 responds by moving the
15:06 LSP
15:06 accordingly and so you can see that
15:09 without any human intervention the
15:11 system succeeded in avoiding excessive
15:14 congestion occurring on the link
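The decision just described (find the LSPs on the congested link, pick one to move, and check that its new path does not push any other link over the threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not Pathfinder's actual algorithm: link load is derived only from LSP traffic, every link has the same capacity, candidate alternative paths are supplied by hand, larger LSPs are tried first, and the original path of the blue LSP (R1 R2 R3 R7) and all traffic figures are invented for the example. Only the new path R1 R5 R6 R7 comes from the talk.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Lsp:
    name: str
    path: list       # ordered router names; path[0] is the ingress
    traffic: float   # Mbps entering the LSP, as reported by its ingress router
    alternatives: list = field(default_factory=list)  # hypothetical candidate paths


def links_of(path):
    """Directed links traversed by a path."""
    return list(zip(path, path[1:]))


def link_loads(lsps):
    """Sum LSP traffic per directed link (assumes no non-LSP traffic)."""
    loads = defaultdict(float)
    for lsp in lsps:
        for link in links_of(lsp.path):
            loads[link] += lsp.traffic
    return loads


def relieve(link, lsps, capacity_mbps, threshold):
    """Move LSPs off `link` until its load is at or below the threshold.

    Returns a list of (lsp_name, new_path) moves. A move is accepted only if
    every link on the new path stays at or below the threshold afterwards.
    """
    loads = link_loads(lsps)
    limit = threshold * capacity_mbps
    moves = []
    on_link = [l for l in lsps if link in links_of(l.path)]
    # Assumption: try the largest LSPs first so fewer moves are needed.
    for lsp in sorted(on_link, key=lambda l: -l.traffic):
        if loads[link] <= limit:
            break
        old = set(links_of(lsp.path))
        for alt in lsp.alternatives:
            new = links_of(alt)
            if link in new:
                continue
            # Links shared with the old path already carry this LSP's traffic.
            if all(loads[l] + (0 if l in old else lsp.traffic) <= limit for l in new):
                for l in old:
                    loads[l] -= lsp.traffic
                for l in new:
                    loads[l] += lsp.traffic
                lsp.path = alt
                moves.append((lsp.name, alt))
                break
    return moves


purple = Lsp("purple", ["R1", "R2", "R3"], 30)
blue = Lsp("blue", ["R1", "R2", "R3", "R7"], 25, [["R1", "R5", "R6", "R7"]])
green = Lsp("green", ["R4", "R2", "R3"], 25)

moves = relieve(("R2", "R3"), [purple, blue, green], capacity_mbps=100, threshold=0.7)
print(moves)
```

In a real deployment the controller would then send the chosen path to the ingress router; that step is outside this sketch.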
15:18 so let's now go back to our demo setup
15:22 and we will look at the network
15:26 topology again
15:29 and so now you can see a change that
15:31 link that we looked at before that had
15:33 quite a lot of traffic has now gone down
15:35 to about 43 percent
15:37 um traffic loading and we can look at
15:40 the paths of some of the LSPs that we
15:42 looked at
15:43 um before you can see that this one
15:45 which previously had been using that
15:47 quite loaded link it's now moved on to a
15:51 different path now it follows the path
15:52 Amsterdam Frankfurt Prague and so what
15:55 happened is that the controller
15:58 Pathfinder moved that LSP in order to
16:01 bring that link below the congestion
16:04 threshold completely without any human
16:07 intervention
16:09 when this system is is kicking in is it
16:12 able to take into account
16:14 um whether or not there's a desire to
16:16 keep things symmetrical uh with LSP
16:19 pairs
16:21 um yes it does take that into account so
16:23 um the symmetric LSPs would
16:26 um stay
16:27 um intact that's um true yes that's
16:30 right
16:31 okay so it does the evaluation in both
16:33 directions to make sure that the the
16:35 move is not going to cause a problem
16:40 uh and the second question is is this
16:42 system also available for trending data
16:46 so that you can see when you need to do
16:48 upgrades for or you know bandwidth
16:51 upgrades for particular links and you
16:54 know when you would anticipate uh
16:56 Crossing thresholds
16:58 um yes you can look at um traffic as a
17:01 function of time on a given link in
17:04 fact we could look at that
17:07 um now or indeed on a given LSP so for
17:10 this LSP we can look at how much traffic
17:12 has been traveling as a function of time
17:14 along that LSP I mean this one is fairly
17:17 um flat and as you can see as I hover
17:19 the cursor you can see what the traffic
17:22 was at that point in time but then you
17:24 can do similar things for actual um
17:28 physical links as well within the
17:31 um network that's something else that
17:34 um you can do
17:37 um in fact so we could look at a link
17:39 here by way of example
17:42 so here you can see traffic on this link
17:44 as a function of time in each direction
17:47 as well you can see that it dropped um
17:50 in the last few minutes after the
17:52 congestion avoidance um kicked in
17:54 um in one of the directions that's quite
17:58 um handy now when it comes to
18:00 um capacity planning as I mentioned one
18:03 can take snapshots of the network and
18:05 import them into Paragon planner which
18:08 is the planning tool and that snapshot
18:10 includes um traffic levels and so um
18:14 then in planner if you want to
18:17 um look at anticipated traffic over the
18:20 next few months you could apply a
18:22 multiplier for example you could
18:23 multiply all of the traffic
18:25 um by a multiplication factor of say 1.5
18:28 to see you know what effect that has and
18:31 with that increased traffic one can
18:33 perform exhaustive failure simulation to
18:35 see that even in the face of the
18:37 increased um traffic
18:40 um Can the network for example survive
18:42 single link failure or double link
18:45 failures and node failures SRLG failures
18:49 and so on so the two work quite well
18:52 hand in hand when it comes to
18:55 um you know on the one hand live traffic
18:56 management within the live Network and
18:58 also capacity planning looking into
19:00 the future
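The planning workflow described above (scale the snapshot traffic by a factor such as 1.5, then simulate failures exhaustively) can be sketched with a toy model. This is an illustration under stated assumptions and not how Paragon Planner works internally: demands are routed on hop-count shortest paths, all links share one capacity, only single-link failures are simulated, and the three-node topology and traffic figures are invented.

```python
from collections import defaultdict, deque


def shortest_path(adj, src, dst, failed=frozenset()):
    """Hop-count shortest path by BFS, avoiding failed links; None if unreachable."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj[node]:
            if nxt not in prev and frozenset((node, nxt)) not in failed:
                prev[nxt] = node
                queue.append(nxt)
    return None


def worst_utilization(links, demands, capacity, failed=frozenset()):
    """Highest directed-link utilization after routing all demands."""
    adj = defaultdict(list)
    for a, b in links:
        adj[a].append(b)
        adj[b].append(a)
    load = defaultdict(float)
    for src, dst, traffic in demands:
        path = shortest_path(adj, src, dst, failed)
        if path is None:
            return float("inf")  # demand cannot be carried at all
        for hop in zip(path, path[1:]):
            load[hop] += traffic
    return max(load.values(), default=0.0) / capacity


def single_link_failure_report(links, demands, capacity, growth=1.0):
    """Worst utilization for each single-link failure, with traffic scaled by `growth`."""
    scaled = [(s, d, t * growth) for s, d, t in demands]
    return {
        link: worst_utilization(links, scaled, capacity, frozenset([frozenset(link)]))
        for link in links
    }


links = [("A", "B"), ("B", "C"), ("A", "C")]
demands = [("A", "B", 40.0), ("A", "C", 40.0)]  # Mbps, hypothetical snapshot

report = single_link_failure_report(links, demands, capacity=100.0, growth=1.5)
overloaded = [link for link, util in report.items() if util > 1.0]
print(overloaded)
```

With today's traffic (`growth=1.0`) this toy network survives every single-link failure; at 1.5 times the traffic, losing A-B or A-C overloads the surviving link out of A.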
19:02 if I have SLA traffic that I'm carrying
19:04 on those LSPs for you know various MPLS
19:06 VPN services for Enterprise customers or
19:08 other service providers how do you
19:11 orchestrate that how do you kind of plan
19:12 for that and how do you
19:14 um more specifically figure out when
19:16 you've when you've over committed your
19:18 your SLAs and say hey I have more
19:20 traffic than I can move I need to meet
19:22 this latency Target I need to meet this
19:24 whatever this target is
19:26 um and I can't move these because they
19:28 I'll violate SLA how does that rise yes
19:31 well when it comes to latency that yeah
19:33 it's a very good question actually
19:34 because
19:35 um if
19:37 um you know before the congestion
19:38 occurred
19:39 um you know presumably the low latency
19:42 traffic is following the lowest latency
19:43 path because as you saw in the previous
19:46 um demos
19:47 um you know Pathfinder can keep
19:50 um you know low latency LSPs on the ls
19:53 on the path that is currently the lowest
19:55 latency path and so you don't want the
19:57 congestion avoidance to divert such low
20:00 LSPs sorry such low latency LSPs away
20:04 because presumably the New Path will
20:06 be somewhat longer and
20:08 um therefore have higher latency and we
20:11 can ensure that um in fact the way the
20:13 congestion avoidance works is that
20:16 um first of all it considers as
20:18 candidates for moving the LSPs that have
20:21 the worst um priority level because an
20:24 LSP can have one of eight priority
20:26 levels and um if there's no suitable
20:29 candidate within that worst priority
20:31 level moves to the next one and so on
20:33 and so if you wish low latency LSPs not
20:36 to be moved by the congestion avoidance
20:38 algorithm then you can give them the
20:41 best priority level so they'll be
20:43 the last to be considered
20:45 um as candidates for moving and so they
20:47 tend to stay on to the you know current
20:50 presumably lowest latency path so the
20:54 priority level is that is that an
20:55 arbitrary value that is assigned by the
20:57 controller is there a you know mapping
20:58 to experimental bits with that how does
21:00 that how does that work exactly
21:01 um that's the value that you can set
21:03 um on the LSP
21:05 um either at creation time or
21:07 um subsequently so it's
21:09 um literally a sort of value ranging
21:11 from zero to seven that expresses
21:13 the priority
21:16 um level
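The candidate ordering described above (consider the worst priority level first, and move to the next level only if no suitable candidate is found) can be sketched like this. It is a hedged illustration rather than the product's algorithm: it assumes the usual RSVP-TE convention that 0 is the best priority and 7 the worst, which the talk does not state explicitly, and the LSP names and the `can_move` check are hypothetical.

```python
from itertools import groupby


def candidates_by_priority(lsps):
    """Yield (priority_level, [lsp names]) groups, worst level (7) first."""
    ordered = sorted(lsps, key=lambda l: l["priority"], reverse=True)
    for level, group in groupby(ordered, key=lambda l: l["priority"]):
        yield level, [l["name"] for l in group]


def pick_lsp_to_move(lsps, can_move):
    """Return the first movable LSP, searching the worst priority level first."""
    for _level, names in candidates_by_priority(lsps):
        for name in names:
            if can_move(name):
                return name
    return None  # nothing suitable at any level


lsps_on_link = [
    {"name": "low-latency-ams-prg", "priority": 0},  # best level, considered last
    {"name": "bulk-a", "priority": 7},
    {"name": "bulk-b", "priority": 5},
]

order = list(candidates_by_priority(lsps_on_link))
# Hypothetical feasibility check: pretend bulk-a has no usable alternative path.
choice = pick_lsp_to_move(lsps_on_link, can_move=lambda name: name != "bulk-a")
print(order, choice)
```

Giving the low-latency LSP the best level means it is only moved when nothing at any worse level can be.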
21:17 um are those levels mapped directly into
21:20 queues
21:23 not um necessarily
21:26 um you know you don't have to actually
21:29 um have a you know mapping between
21:31 priority levels and queues it's more
21:34 um you know related to
21:37 um you know preemption and hold
21:39 um priorities
21:41 um you know if at the end of the day
21:43 there's insufficient
21:45 um bandwidth to carry LSPs across the
21:48 network then the priority level
21:50 determines which ones you know get
21:53 um access to the network capacity
21:56 and also you know has a bearing on the
21:59 congestion avoidance algorithm but um
22:01 it's not necessary to have a mapping
22:04 from that into um queues necessarily
22:07 okay yeah I'd like to think of that more
22:09 as a controller level priority rather
22:12 than a class of service type of thing
22:13 exactly it's more to do with the
22:16 behavior of the LSP as a whole that's
22:18 right okay so that that is specific to
22:20 the controller and not necessarily
22:23 there's no prerequisite to have
22:26 um you know specific class of service queues
22:29 that map directly into those
22:31 is that a correct description of what
22:33 you're saying correct thank you