Anton Elita, Technical Solution Consultant, Network Automation, Juniper Networks

Juniper Networks Demonstrates Path Diversity and Low Latency Routing with Paragon Automation

Demo Drop | Network Automation | WAN
Still image shows a box with a headshot labeled Anton Elita on the left-hand side, with his title underneath, but it is too blurry to read. In the middle of the screen there is a Tech Field Day logo with these words underneath: “SHOWCASE Presented by: Juniper Networks, The Autonomous Transport Network Key Success Factors.” On the right-hand side is a box with a headshot labeled Cyril Doussau, with his title below his image, but it is too blurry to read.

Catch a deep-dive session on the Network Optimization piece of the Paragon Automation suite. This includes a live demonstration.


You’ll learn

  • Path diversity

  • Low latency routing

Who is this for?

Network Professionals, Security Professionals


Anton Elita
Technical Solution Consultant, Network Automation, Juniper Networks
Cyril Doussau
Product Strategy & Marketing, Juniper Networks

Guest speakers

Peter Welcher
Architect and Tech Advisor, NetCraftsmen
Nick Buraglio
Network Architect, ESnet
David Penaloza Seijas
Principal Engineer, Verizon Business
Steve Puluka
Network & Security Engineer


0:09 I'm Anton Elita, a technical solution

0:13 consultant at Juniper Networks. We

0:15 have been talking about Paragon

0:18 automation suite and its applications in

0:21 the network

0:22 from planning orchestration Assurance up

0:26 to optimization

0:27 and today we'd like to go to a deeper

0:31 dive into the last part which is

0:33 optimization in live Network

0:36 right, with the first case, which is

0:41 path diversity some customers have

0:45 business requirement

0:47 to um

0:49 to provide truly diverse label-switched

0:52 paths to avoid a single point of failure

0:55 in a network

0:57 um

0:58 not relying on fast reroute techniques

1:01 and those customers typically as

1:03 well request a bidirectional co-routed

1:05 LSP

1:07 so that the forward and reverse paths

1:10 are sticking to the same set of links

1:12 and nodes

1:14 in such circumstances it is basically

1:17 required to have a central controller

1:19 with a global network view

1:20 just because if you look at this diagram

1:24 an ingress PE like PE1 has no idea of

1:29 LSPs which are instantiated from another

1:32 ingress PE like PE2

1:34 only a controller with global view would

1:38 be able

1:39 to provide true diverse LSPs which have

1:43 been started from different ingress PEs

1:48 I will switch to the network view now

1:52 so um this is our base example Network

1:57 with a few nodes and I will show now the

2:01 link labels

2:03 according to the IS-IS metrics

2:06 we have pre-created two

2:10 different tunnels one is going from

2:13 Amsterdam to Berlin

2:16 and another one is going from Brussels

2:18 to Prague

2:20 so they start and end on different nodes

2:23 in the network and due to the metrics

2:26 that we have in this example Network

2:28 they are crossing the same middle point

2:32 in Hamburg

2:33 so this is a single point of failure in

2:36 case something happens there both LSPs

2:40 will need to be rerouted or will go down

2:43 for a certain period of time

2:46 so how do we avoid this happening?

2:50 um we could of course do the

2:52 provisioning from uh from the controller

2:54 so we have here specific tabs here to

2:58 provision diverse tunnels

3:00 but I would like as well to um touch

3:02 upon

3:03 um

3:04 automation via APIs, so

3:09 we have in Pathfinder a northbound

3:11 interface using REST

3:14 and if we were to program those LSPs

3:19 automatically we would use this

3:21 interface to push the request to

3:24 Pathfinder and I will now show

3:27 um such a rest client

3:32 um I will first need to authenticate

3:34 myself with Pathfinder

3:36 um I will get a token as a reply from

3:39 Pathfinder and I could use this token in

3:42 my programmatic APIs, and to Pathfinder I send

3:49 a REST call

3:52 and this uh

3:54 content is formatted in JSON

3:57 and the content says which LSPs to

4:00 create, their name; the configuration method

4:03 is PCEP

4:05 and we have end nodes

4:08 and we have properties for each LSP

4:10 important are of course

4:13 diversity level, diversity group, and

4:16 that we want to create a co-routed

4:18 LSP pair

4:20 and the same goes for the other pair of

4:23 LSPs that we are going to signal

4:26 so now I'm really triggering

4:28 the

4:31 Send button

4:32 and I already received the reply from

4:34 Pathfinder in the bottom part of the

4:36 screen

4:38 mirroring basically my request, with

4:41 a few attributes added like admin

4:43 status and some others
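The JSON provisioning request narrated above can be sketched in Python. This is a hedged illustration: the field names (provisioningMethod, diversityGroup, and so on) and the endpoint in the comment are assumptions modeled on the narration, not the documented Pathfinder REST schema.

```python
import json

def build_corouted_lsp(name, ingress, egress, group):
    """Build one co-routed bidirectional LSP request entry.

    Field names here are illustrative assumptions, not the real API.
    """
    return {
        "name": name,
        "from": {"node": ingress},
        "to": {"node": egress},
        "provisioningMethod": "PCEP",
        "corouted": True,               # forward/reverse share links and nodes
        "diversityGroup": group,        # same group => controller keeps pairs disjoint
        "diversityLevel": "link-and-node",
    }

# Two pairs that the controller should keep mutually diverse.
payload = [
    build_corouted_lsp("AMS-BER", "Amsterdam", "Berlin", "demo-diverse"),
    build_corouted_lsp("BRU-PRG", "Brussels", "Prague", "demo-diverse"),
]
body = json.dumps(payload)
# This body would then be POSTed with the bearer token obtained earlier,
# e.g. (hypothetical endpoint):
#   requests.post(f"{PF}/api/v2/lsps", data=body,
#                 headers={"Authorization": f"Bearer {token}"})
```

The reply the demo shows would then mirror this request, with attributes such as admin status added by the controller.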

4:46 so now it is safe to switch back to the

4:49 network View

4:51 which suggests that we have new network

4:55 events

4:56 I will refresh the LSP table to show the

5:01 recently added LSPs

5:04 and these are selected now so Brussels to

5:07 Prague and its reverse direction from

5:11 Prague to Brussels are taking exactly the

5:14 same

5:16 links and nodes throughout the network

5:19 and same happens to the second pair of

5:22 bidirectional co-routed LSPs between

5:24 Amsterdam and Berlin

5:26 but if I select all four of them

5:28 this will show me that they are indeed

5:32 truly diverse

5:33 to each other and not crossing any

5:36 single point of failure in this network

5:39 so with this we show that using a

5:43 controller with a global network view

5:45 allows establishing and maintaining path

5:48 diversity even if you have a requirement

5:51 to have bidirectional co-routed LSPs. I

5:56 just want to confirm that this system is

6:00 also able to take into account things

6:03 like shared risk link groups and

6:06 and coloring that can be done, that

6:09 can label basically the underlying physical

6:12 shared infrastructure as opposed to the

6:14 logical one

6:17 yes, really great comment, so we

6:20 indeed have a possibility to take into

6:23 consideration everything from the large site where

6:27 we have multiple nodes, all the way

6:30 down to a single link, and shared

6:33 risk link groups including nodes and

6:37 links

6:38 let's say that

6:40 um your network is

6:42 um

6:43 significantly larger than this and you

6:47 need to have an explicit ERO that

6:50 transits the entire network that exceeds

6:53 the maximum uh

6:55 label depth of the hardware

6:59 does this platform support things like a

7:03 binding SID to create,

7:06 you know, a longer ERO than, let's say,

7:09 12 or whatever the maximum

7:12 segment depth or maximum label depth is

7:15 yes so so this is indeed um a question

7:19 um for uh many service providers where

7:21 the number of hops might exceed

7:23 the hardware capabilities of

7:26 ingress nodes

7:27 um for this we have foreseen a few

7:30 solutions. One of them would be using

7:34 label compression, so

7:36 Pathfinder is able to create

7:38 LSPs consisting of as little as a single

7:42 label, if you don't want to stick

7:44 with uh

7:45 with specific nodes. But if your

7:48 requirement is to go

7:50 through certain

7:53 segments in the network, then

7:55 indeed we can leverage binding SIDs.

7:58 This is supported in Pathfinder to

8:00 create smaller label stacks,

8:03 so that the transit node is

8:05 expanding this binding SID and

8:08 sending it over the next list of

8:11 segments
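The label-compression idea described above, delegating part of a too-long segment list to a binding SID that a transit node later expands, can be sketched as follows. The label values, the MSD of 5, and the allocator are purely illustrative; a real controller would allocate binding SIDs on specific transit nodes.

```python
def fit_to_msd(segment_list, msd, allocate_binding_sid):
    """Compress an explicit segment list to fit the ingress node's
    Maximum SID Depth (MSD).

    When the list exceeds msd, the tail is delegated to a binding SID
    that a transit node expands into the remaining segments, mirroring
    the LSP-stitching idea described in the transcript.
    """
    if len(segment_list) <= msd:
        return segment_list
    # Keep msd-1 real segments; the last slot carries a binding SID
    # covering the (recursively compressed) remainder.
    head = segment_list[:msd - 1]
    tail = fit_to_msd(segment_list[msd - 1:], msd, allocate_binding_sid)
    return head + [allocate_binding_sid(tail)]

# Toy allocator: record each binding SID's expansion in a table.
bsids = {}
def alloc(expansion):
    sid = 900000 + len(bsids)   # arbitrary illustrative label range
    bsids[sid] = expansion
    return sid

# A 20-hop explicit path squeezed into a 5-label stack at the ingress.
stack = fit_to_msd(list(range(16001, 16021)), msd=5,
                   allocate_binding_sid=alloc)
# The ingress pushes at most 5 labels; transit nodes expand the rest.
```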

8:12 all right so essentially you know

8:14 stitching two LSPs together but they

8:16 look like one

8:18 is really kind of the

8:20 high level explanation of what I've

8:22 asked for the other question I have is

8:25 for LSP failover

8:29 do you have support for running

8:33 S-BFD inside the LSPs and does this

8:37 support that signaling?

8:40 oh yeah, yes, so we support seamless

8:44 BFD for every provisioned LSP

8:48 and this support has as well been

8:51 proven at the latest EANTC event we had

8:54 in February this year with

8:57 other vendors as well

8:59 and this is kind of relating around

9:01 recent events you know one of the things

9:02 in the last couple years as we saw the

9:04 the rise of bandwidth and the saturation

9:06 of bandwidth with the pandemic one of

9:08 the things I noticed since you have

9:09 Europe up here specifically is that the

9:13 the backbone infrastructures of

9:14 Europe and the United States are very

9:15 different. The United States depends on a lot

9:17 of caching points that are very close to

9:19 the provider edges in Europe there's a

9:21 lot of PNIs and it's, you know, largely

9:23 dependent more on bandwidth and PNIs

9:25 than caching. When you get to a more

9:27 complex topology and scenario how does

9:29 that how does that scale as far as being

9:30 able to Monitor and react I know that

9:32 you know in the beginning of the

9:33 pandemic we saw a lot of PNIs saturated

9:36 as we were going across uh Eastern

9:38 Europe Central Europe into Western

9:39 Europe, and packet loss and things

9:41 like that if I'm managing this if I'm a

9:43 you know tier one Transit provider you

9:45 know how would I use this to scale to be

9:47 able to manage reacting to those kinds

9:49 of challenges on a large scale so we

9:52 have automated congestion avoidance in

9:54 the network which will be presented uh

9:56 in a few minutes by my colleague Julian

9:59 okay so um I'm switching to the um next

10:04 um use case we are going to show today

10:07 and which is um low latency routing

10:11 here the business requirement is uh to

10:14 provide the lowest latency or maybe even

10:16 guaranteed lowest latency for critical

10:18 Services

10:19 um with um even including some service

10:22 level agreements

10:24 modern networks have really mixed-

10:28 speed links

10:30 which are participating in such networks,

10:33 varying probably from 10 to 400 gig,

10:37 with all possible speed variations

10:39 so multiple service providers would

10:42 um base their metrics not on the delay

10:45 but they would use for example bandwidth

10:49 as a reference for the metrics. But this

10:52 is not optimal for this

10:55 business requirement because the higher

10:57 bandwidth path is not always the lowest

11:00 delay

11:01 so how do we solve this premium requirement

11:06 without rebuilding the whole network

11:08 metric system

11:10 so we have here um a solution comprising

11:15 multiple components so first we need to

11:18 measure the latency on each Network

11:20 segment; this is obvious

11:22 um and then we need to distribute this

11:24 information to the controller and let

11:27 the controller

11:28 um find the lowest delay path uh having

11:32 the sum of all of the delays on every

11:35 participating Network segment

11:38 but on top of this we want to make sure

11:40 that we understand that our customer is

11:45 happy using the network, and how to measure

11:49 this user experience

11:51 um this is the big question and we have

11:53 an answer to this with uh

11:56 simulating customer traffic over

11:59 the service provider's network so that we

12:02 really see the experience that a normal

12:05 user would have

12:09 um I'm switching back to the network

12:11 View and I will change the link labels

12:15 to show the measured delay. So we have

12:19 here on multiple links dynamic delay

12:23 measurement, like on the selected link

12:25 Amsterdam to Frankfurt; some other links

12:28 might have a static value for the delay

12:31 measurement

12:33 um I would like to um to focus today

12:38 um on this Crosslink just because we

12:40 have an impairment tool which would make

12:43 this latency look much worse but before

12:47 I impair it

12:49 um I would like to review a few tunnels

12:52 a few LSPs which are crossing this link

12:56 for example one of them

12:58 starting with the name LL for low

13:00 latency. I have selected it now and it

13:03 goes from Amsterdam to Brussels, but it is

13:07 crossing a node in Frankfurt just

13:09 because the direct link from Amsterdam to

13:12 Brussels has a higher latency of 15

13:14 milliseconds compared to a total latency

13:18 of just a little over three

13:21 milliseconds

13:22 going over Frankfurt

13:25 so before I explain how this data

13:29 gets into Pathfinder, I will switch to

13:32 an impairment tool and we'll start an

13:37 impairment on this link

13:43 so

13:47 while the impairment tool is uh doing

13:49 its job

13:51 um Let Me Explain how we get this data

13:53 so first, the two adjacent nodes, like

13:57 Amsterdam and Frankfurt,

13:59 they send

14:00 so-called Two-Way Active Measurement

14:02 Protocol (TWAMP) Light probes across the

14:05 link to each other

14:07 to be able to very precisely measure the

14:11 latency on the link so it is measured in

14:13 microseconds
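The TWAMP-Light arithmetic behind this measurement can be shown in a few lines. The timestamps below are illustrative microsecond values, and halving the round-trip into a per-direction delay is a simplifying assumption (real deployments may measure one-way delay with synchronized clocks).

```python
def twamp_light_rtt_us(t1, t2, t3, t4):
    """Round-trip and approximate one-way delay from TWAMP timestamps (µs).

    t1: sender TX, t2: reflector RX, t3: reflector TX, t4: sender RX.
    Subtracting the reflector's processing time (t3 - t2) from the total
    elapsed time (t4 - t1) yields the network round-trip; halving it
    approximates the per-direction link delay that the routers would
    then advertise into the IGP.
    """
    rtt = (t4 - t1) - (t3 - t2)
    return rtt, rtt / 2

rtt, one_way = twamp_light_rtt_us(1_000, 1_790, 1_810, 2_620)
# → rtt 1600 µs, one-way ≈ 800 µs
```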

14:14 and then this information is being

14:18 propagated into the IGP like IS-IS or OSPF

14:22 and from there for each Network domain

14:26 we export this data along with other

14:29 traffic information

14:31 um

14:32 we export it to a central controller

14:35 like Paragon Pathfinder, and then we are

14:38 able to figure out what is the lowest-

14:41 delay path in Pathfinder, and all that

14:44 remains is to use the PCEP protocol to

14:47 signal an LSP or change the path of an

14:50 existing LSP
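The controller-side computation described here, summing measured per-link delays and picking the lowest total, is ordinary shortest-path search. A minimal sketch follows, with an illustrative topology echoing the demo's numbers (direct Amsterdam-Brussels at 15 ms versus just over 3 ms via Frankfurt); the exact figures are assumptions.

```python
import heapq

def lowest_delay_path(links, src, dst):
    """Dijkstra over measured per-link delays (e.g., TWAMP via the IGP).

    links: dict mapping node -> list of (neighbor, delay_ms) tuples.
    Returns (total_delay_ms, path_as_node_list).
    """
    best = {src: 0.0}
    pq = [(0.0, src, [src])]
    seen = set()
    while pq:
        d, node, path = heapq.heappop(pq)
        if node in seen:
            continue
        seen.add(node)
        if node == dst:
            return d, path
        for nbr, delay in links.get(node, []):
            nd = d + delay
            if nbr not in seen and (nbr not in best or nd < best[nbr]):
                best[nbr] = nd
                heapq.heappush(pq, (nd, nbr, path + [nbr]))
    return float("inf"), []

# Illustrative topology: the direct Amsterdam-Brussels link is 15 ms,
# while going via Frankfurt sums to just over 3 ms.
links = {
    "Amsterdam": [("Brussels", 15.0), ("Frankfurt", 1.6)],
    "Frankfurt": [("Brussels", 1.5), ("Amsterdam", 1.6)],
    "Brussels":  [],
}
delay, path = lowest_delay_path(links, "Amsterdam", "Brussels")
# → the path via Frankfurt, with a total delay of 3.1 ms
```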

14:53 um if you looked um at the screen while

14:56 I was talking you probably already

14:57 noticed that this Crosslink already has

15:00 some average delay increased from um

15:03 from a value of sub-1 millisecond to

15:06 35 or something

15:09 milliseconds on average for the last

15:12 period of measurement

15:15 so um

15:17 how do we know whether this increase in

15:20 measured delay does

15:24 actually introduce any problem for our

15:27 customers? For this we use Paragon Active

15:30 Assurance to inject

15:33 uh synthetic probes which are mimicking

15:36 customer traffic

15:39 here I have a set of

15:43 low-latency probes which are using the

15:46 customer VPNs all around the network

15:49 and you probably already see that um the

15:54 green bar which is showing the quality

15:56 of our service for the last 15 minutes

16:00 (this is the selected interval) has turned

16:03 from green

16:05 first to red and then to black

16:08 let me explain what um these colors mean

16:12 so this is a drill-down view of the

16:16 same active probe. We had previously a

16:21 value of delay which is according to

16:25 the SLA with this customer, and then

16:28 after introducing impairment the delay

16:31 jumped up to 50 milliseconds, which is

16:35 breaching

16:36 the contract, and then this value is

16:40 considered to be equal to an outage

16:43 because it is way higher than we

16:46 promised
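The green/red/black progression described above can be modeled as simple thresholding. The SLA value and the outage factor below are illustrative assumptions, not the demo's configured values.

```python
def classify(delay_ms, sla_ms=10.0, outage_factor=3.0):
    """Map a measured probe delay to the demo's traffic-light states.

    Within the SLA is green; above it is red (contract breach); far
    above it (here, outage_factor times the SLA) is treated as black,
    i.e. equivalent to an outage.
    """
    if delay_ms <= sla_ms:
        return "green"
    if delay_ms <= sla_ms * outage_factor:
        return "red"
    return "black"

states = [classify(d) for d in (3.2, 12.0, 50.0)]
# → ["green", "red", "black"]
```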

16:48 but you probably already noticed that um

16:51 after some time the delay went uh down

16:54 back to a couple of milliseconds so let

16:58 us see what happened and why the delay

17:01 turned back to normal

17:06 so I'm switching back to our Network

17:08 View

17:10 so we already saw that Pathfinder has

17:12 received the updated delay information

17:15 from the network and reflected it

17:18 even in the user interface. But what

17:22 happened in the background?

17:24 Pathfinder has, for this demo, an

17:27 aggressive LSP optimization timer

17:30 this timer reviews the delays in the

17:33 network and looks for delay-sensitive

17:37 LSPs and can automatically, without human

17:40 intervention,

17:45 reroute them to a new path
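The optimization timer's per-LSP decision can be sketched as a comparison of the current path's summed delay against the best available alternative. The 20% margin below is an assumed illustration, not a documented Pathfinder default, and the delay figures echo the demo only approximately.

```python
def needs_reroute(current_path_delay_ms, best_path_delay_ms,
                  threshold=1.2):
    """Decide whether a delay-sensitive LSP should be moved.

    Reroute when the current path's summed delay exceeds the best
    available path by some margin; the margin avoids flapping on
    small measurement jitter.
    """
    return current_path_delay_ms > best_path_delay_ms * threshold

# After the impairment, the detour via the impaired crosslink sums to
# roughly 38 ms while the direct link is 15 ms, so the LSP is moved
# without human intervention; an unimpaired LSP stays put.
decisions = [needs_reroute(38.0, 15.0), needs_reroute(3.1, 3.1)]
```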

17:45 and this is exactly what happened to our

17:47 example LSP that we saw a bit earlier

17:49 instead of going Amsterdam Frankfurt to

17:52 Brussels it now takes the direct path

17:54 from Amsterdam to Brussels just because

17:57 the latency on that link is 15

17:59 milliseconds, which is way lower compared

18:02 to the sum of latencies

18:04 on the path via Frankfurt. And to be sure

18:08 that we are looking at the same LSP we

18:11 could check the events: what happened to

18:14 this LSP in the last period of time. I

18:17 will select some value in the past so

18:20 this is exactly what we saw when we

18:22 started our demo, and I can

18:25 visually compare

18:27 the LSP path as of now. So I have

18:30 selected the latest LSP update and we

18:33 clearly see that the change was exactly

18:37 as we noticed um earlier

18:41 so with this we have reviewed a highly

18:44 demanded use case for low-delay

18:47 service placement,

18:49 continuous measurement of customer

18:51 experience, as well as automated LSP

18:55 optimization in a changing network

18:57 environment, which gives an operator a very

19:01 powerful tool to provide best-in-class

19:03 service for their customers so how do

19:07 you

19:07 how do you account for uh failures in

19:09 the mpls data plane so you know

19:11 something you know it's really common on

19:13 you know any kind of equipment is you'll

19:14 have a you know you'll have a route that

19:16 gets pushed into

19:18 the MPLS forwarding plane forwarding

19:20 database you know and you've got an LSP

19:22 but the Asic and the table are out of

19:25 sync and you don't actually forward so

19:27 you're looking at this obviously you

19:29 have an LSP you think your LSP is good

19:31 you think that you're going to move

19:31 traffic to that LSP but it doesn't

19:34 actually work because of, you know, a bug

19:36 and the ASIC is out of sync with the

19:37 forwarding table. How would you handle a

19:38 condition like that with the controller

19:40 is that something that is part of the

19:42 monitoring

19:43 yes, yes, so this is a very

19:46 tough use case

19:48 to find the culprit for

19:50 and for this we could address it

19:53 with two approaches, or basically

19:55 a combination of two things together:

19:57 one would be, as shown previously,

20:01 the Active Assurance probes which in a

20:03 timely manner can see that the

20:06 traffic is being black-holed, and it

20:09 can trigger

20:10 um additional automated checks on

20:13 Paragon Insights. So Paragon Insights

20:16 might start, you know, preparing for the

20:18 network operator, who will troubleshoot

20:21 it afterwards, a set of tests like for

20:25 example traceroutes,

20:27 collecting interface information right

20:29 at the moment where it was triggered

20:31 this is as regards active

20:34 monitoring, but with Insights we have

20:36 as well our passive monitoring, which

20:39 would then run continuously and

20:41 which would react on uh for example

20:44 increased counters of

20:47 traffic drops on the forwarding plane

20:51 just because the equipment usually is

20:53 able to you know to account for the

20:55 packets dropped for no reason like for

20:57 example no route to the destination and

21:00 then if we have this mismatch between the

21:03 programming of the data plane and the

21:06 state of the routing tables, then we

21:09 would probably push some data

21:11 increasingly towards a black-hole

21:14 destination and we will see a high

21:16 increase of such counters in our

21:19 monitoring tools so there are really

21:21 many counters we could monitor with this

21:24 and this is how we can tackle this

21:26 situation

21:28 if for any reason

21:31 then

21:32 there will be a change triggered by the

21:34 controller and given the telemetry

21:41 well, Insights, and all the mapping of

21:41 the information the controller is

21:43 realizes or not or notices that there is

21:46 a

21:47 detrimental

21:48 effect of that change is the controller

21:51 going to roll it back

21:54 do you need a user confirmation or

21:56 administrator confirmation for that for

21:58 instance

21:59 so this is truly configurable, of

22:03 course so we we understand that uh

22:06 closed loop automation uh is is the way

22:09 to the future, but we cannot

22:13 take it right from day one and

22:16 try to boil the ocean with everything

22:19 um fully automated so the trust will be

22:21 gained uh you know uh step by step so

22:24 today most operators would probably

22:27 trust Pathfinder to do the

22:29 rerouting as shown in this demo uh for a

22:33 fully

22:33 um you know automated set of actions

22:36 like changing configurations uh maybe

22:38 rolling back and doing you know

22:40 artificial intelligence, we need time,

22:43 you know, to gain this trust, but

22:47 we are on a good way here so for this we

22:49 have already some artificial

22:52 intelligence bits included in our

22:54 Paragon insights uh which would um help

22:57 us to get there. So we hope that

23:00 our showcase shed some light on

23:02 what we can achieve today with a proven

23:04 Cloud native automation stack so if you

23:07 want to continue investigating

23:10 those technologies, you have two options

23:13 to suggest: one would be to have an ROI analysis

23:16 that actually goes through the benefits

23:18 of implementing such technology and

23:20 quantifies those benefits, or you can simply

23:23 ask for a pilot, as all that was shown

23:27 today is proven technology, and

23:30 technology that has been deployed by

23:32 service providers around the world. So we would

23:35 like as well to thank our delegates

23:37 for all their most relevant questions

23:40 and we hope to hear from you soon

23:44 thank you
