Mehdi Abdelouahab, Sr. Consulting Engineer, Juniper Networks 

Apstra Solution

Data Center Network Automation
Slide with an image of a cartoon man peering into a large telescope, seeing circles interconnected by lines. The slide’s headline reads, “Why use a Graph model in my network operations?” Bullets say, “* Because it is well suited to modelling highly connected systems; * Because it is extensible to adapt to modelling new network services; * Because it allows queries that were not anticipated at design time.”

Rethink how you think about data center operations 

With Juniper Apstra it’s possible to automate the entire network lifecycle to simplify design, deployment, and operations, and to provide continuous validation. Learn more about how Apstra can deliver assured experiences for applications and operators in this comprehensive guide from Juniper’s Mehdi Abdelouahab.


You’ll learn

  • How Apstra can help solve Day 0/1/2 challenges through automation with a unified tool for architects and operators

  • Three reasons why you should use a Graph model for network operations

  • How the Intent Time Voyager helps manage the infrastructure as a whole system, thus increasing agility

Who is this for?

Network Professionals, Business Leaders

Host

Mehdi Abdelouahab Headshot
Mehdi Abdelouahab
Sr. Consulting Engineer, Juniper Networks  

Resources

Transcript

00:05 [Music]

00:10 Good morning everyone, glad to be here. I'm Mehdi Abdelouahab, a consulting SE for Apstra, the product that was acquired by Juniper a few months ago. We're going to stay in the automation domain, moving more toward the data center. We are a data center network automation solution, and I'm going to elaborate on that. But first, at a high level, let's look at data center network operations and the usual challenges that we observe in this area.

00:56 We generally have two personas: the architects and the operators. In some organizations they can be mixed, and in other organizations they are separate, and they face different challenges daily. I'm not going through all of them, but typically, those of you who are in network architecture probably care more about selecting technologies that give you the freedom to enable and activate services in a seamless way. You look at activation delays, at shortening those delays, and at enabling service activation as quickly as possible, from an agility standpoint.

01:45 Those of you in the architect teams also care about vendor flexibility. You make design or technology choices today, so you look for the latest and brightest out there, and obviously you don't want to lock yourself into a given technology or a given vendor. You want to retain the freedom to benefit from whatever enhancements come later on, whether they are hardware related, ASIC related, or even protocol related.

02:17 On the other side, if you are an operator, you are most likely up against things like resource planning constraints and the need for knowledge retention in your teams. You also have to deal with the typical processes related to infrastructure changes, going from tech reviews to approvals and so on, which can be more or less time-consuming and can slow down the agility that is one of the objectives of the typical architect team. You also have a major concern, reliability, probably the number one challenge that operators face: the need to deal with changes in a reliable manner, because that is part of your daily job.

03:21 You do have different automation solutions out there in the DC space, and of course some of them address the architects more than the operators, and so on. We strive to come up with a technical solution for both, and hopefully I'm going to elaborate on that in this session.

03:42 If you look at Apstra's technical claim, we are in the business of intent-based networking. What does this mean? From a high-level perspective, it starts with the idea that we want to focus on the what, and let the system derive the how from that. We want to raise the declarative nature of the user input to a higher level than before. This matters because the more declarative your user input is, the closer you get to the abstraction that really gives you agility and cloud-like operations. Abstracting things away and reducing the user input to the minimum required is an absolute requirement to meet the agility expectations of the architect teams.

04:46 The starting point for us is to really recognize the network as a distributed system. We don't want to manage boxes individually; we manage systems. For us, the goal is to have the operator express what the outcome of the system should be, and let AOS, the Apstra Operating System, derive from that the necessary steps to get there. Really raising the level of abstraction to something highly declarative, and letting the system derive from that all the required configurations as well as all the expected validations, is the way to go.

05:32 What's important as well is that this user specification, this user input, is what is actually modeled in the AOS back end, in a logical model, meaning it's completely decoupled from any vendor or device model. We store it in a completely logical way, and that is also a major enabler of agility. From there, not only can we of course select any given device model here and there, but having a logical definition of the user intent, not hardcoded to any specific device, is also what enables the closed-loop automation and assurance you see in the middle.

06:24 What this means is that the product goes beyond configuration: it also generates a set of expectations that the system has to meet, and those expectations are derived from the logical intent. Because we know the high-level intent from the beginning, we are able to derive very detailed expectations, and then automate the collection of the corresponding telemetry, the analysis of that telemetry, and the comparison against the expectations, to determine whether the overall architecture is behaving as expected or not. It really goes beyond configuration to some form of monitoring, but a monitoring which is contextual to your intent: a tool that automates this monitoring activity with the context in mind.

07:07 Obviously the advantage for you, from a Day 2 perspective, is the ability to deal with changes reliably, because you have in one combined solution both your Day 0 and Day 1 as well as your Day 2 tooling, which is aware of your intent from the beginning and able to validate it in real time and on a continuous basis.

07:29 So, to recap intent-based networking: if you want to come up with some form of definition, even though there is no real formal definition of IBN yet (we have people within the company contributing to an IETF draft to bring some normalization there), from our perspective it is basically three pillars. The first one is having unified tooling around a single source of truth, and this is actually going to be the major focus of the rest of my presentation: explaining how Apstra acts as a single source of truth, and which back-end implementation we selected to create a single source of truth that is able to cope with the data center challenges.

08:11 uh able to cope with the dc challenges

08:14 uh the second aspect is um

08:17 is the idea that you you want to

08:18 automate the entire life cycle right you

08:20 want to have a single tool that that is

08:22 being used by by architects to design in

08:25 a logical fashion using um a

08:29 logical building blocks that they can

08:31 assemble uh to create uh um you know

08:34 predictable designs um and and scalable

08:37 designs and having also a design

08:39 disciplines where where the systems you

08:41 know prevents you from designing

08:42 something uh that is not a you know a by

08:44 the book um um

08:47 you know regular list on architecture

08:48 and so on so we inserted a lot of

08:49 expertise uh even at design level to

08:52 prevent the architects from from

08:54 shooting himself in in the fit and

08:56 eventually creating something that is

08:58 not going to scare from from a from a dc

09:00 perspective um

09:02 Then you go from the design to the build: the build process, where you instantiate whatever logical templates were created at design time, and let the system apply resource allocation, where we automate the distribution of variables at scale. A typical leaf-spine fabric consumes a ton of variables; you don't want to be assigning every ASN or loopback IP by hand in a big five-stage network, so you want the system to manage those infrastructure variables at scale. Then you move, of course, to the operate phase, where you want the ability to define services in minutes, and have them deployed in minutes, and operated right after the configuration push.

09:53 Automating the entire lifecycle is important, but for that you need a single source of truth that is able to feed all those steps. And the last pillar, of course, is being multivendor: as I said, we completely decoupled the user intent from whatever is selected underneath. For us, the hardware selection is something that arrives quite late in the process; you do it almost at the end of your build, right before you hit the commit button.

10:26 One of the major challenges when you build a network automation solution is to have a reliable source of truth. The source of truth is really the way you express your desired outcome, and you need something that is really adapted to the network domain you are tackling, in our case data center fabrics. You need something extensible enough, programmable enough, and something you can query in a very efficient way, because the source of truth is something you query all the time: you ask questions of that source of truth. So there are a lot of requirements with regard to the implementation you have there.

11:13 The choice that we made in Apstra for this product is that we selected the graph database. Graph databases are a type of database that has been around for quite some time, and they are among the really powerful options out there when it comes to databases. For the history, they were used initially with social networks, so it's something that really comes from the Twitters and Facebooks of this world, and it's used in a variety of domains as a modeling practice.

11:56 We at Apstra have been strong believers that to model an intent, graph databases are very well suited. For example, YANG data modeling is very well suited to device-level modeling, quite low in the stack; but when it comes to system-level or architecture-level modeling, graph databases are very well suited, and I'm going to elaborate on some of the reasons.

12:28 The first one is that graph databases are very efficient at modeling highly connected data domains. Instead of storing the data in tables, in tabular format, you store it in nodes, and in a graph database nodes are interconnected with relationships, and the relationships are as important as the data in the nodes themselves. The way data is interconnected is as important as the data itself.

12:55 And a data center leaf-spine network is a highly, highly connected domain: it's a dense mesh between leaves, spines, and super-spines. It's also quite complex in terms of the services running on top, especially if we talk about EVPN; the various layers you have there, from underlay to overlay, are quite complex. So you need a way to capture this highly connected domain, and graph databases are super well suited for that.

13:30 The other reason is that it's highly extensible. It's kind of schema-less; I mean, it has a schema per se, but that schema is super extensible. As the requirements evolve, you can add new node types and new node relationships, and extend in any direction you want. So it's extensible, even scalable, from the perspective of coping with new network requirements.

13:55 And the last one, which is very important, is that it exposes very sophisticated query languages. It's something that allows you to ask very complex, multi-dimensional questions in a very efficient way, much more so than any SQL query language on a traditional RDBMS. Those three elements are the pillars we will rely on, and I'm going to elaborate on why.
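To make the "multi-dimensional questions" point concrete, here is a minimal sketch with an invented schema (not Apstra's actual graph model): a tiny fabric stored as typed nodes plus typed relationships, and a two-hop question ("which leaves host this virtual network?") of the kind a graph traversal answers naturally.

```python
# Hypothetical graph model: node -> role, plus (source, relation, target) edges.
nodes = {
    "spine1": "spine", "spine2": "spine",
    "leaf1": "leaf", "leaf2": "leaf", "leaf3": "leaf",
    "vn_blue": "virtual_network",
    "vn_blue@leaf1": "vn_instance", "vn_blue@leaf3": "vn_instance",
}
# Relationships are data in their own right, as important as the nodes.
edges = [
    ("leaf1", "link", "spine1"), ("leaf1", "link", "spine2"),
    ("leaf2", "link", "spine1"), ("leaf2", "link", "spine2"),
    ("leaf3", "link", "spine1"), ("leaf3", "link", "spine2"),
    ("vn_blue", "instantiated_as", "vn_blue@leaf1"),
    ("vn_blue", "instantiated_as", "vn_blue@leaf3"),
    ("vn_blue@leaf1", "hosted_on", "leaf1"),
    ("vn_blue@leaf3", "hosted_on", "leaf3"),
]

def out(src, rel):
    """All targets reached from `src` via relationship `rel`."""
    return [t for s, r, t in edges if s == src and r == rel]

def leaves_hosting(vn):
    """A question not anticipated at design time, answered by a
    two-hop traversal: vn -> instances -> hosting leaves."""
    return sorted(leaf for inst in out(vn, "instantiated_as")
                  for leaf in out(inst, "hosted_on"))

print(leaves_hosting("vn_blue"))  # ['leaf1', 'leaf3']
```

Because the schema is just nodes and edges, adding a new node type (say, an SVI) later requires no migration; new queries simply traverse the new relationships.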

14:25 Any time you use Apstra, whether through the UI or the API, you express an intent: you design a template, you define a virtual network, you express a policy, whether an EVPN policy or a routing policy. That user intent is stored in the graph database, and you see here nodes and relationships and so on. This graph DB serves as the source of truth: anything that comes out of it, configuration rendering, expectation rendering, as well as the cabling map, is an artifact of this graph.

14:57 If you look at the typical UI in the product and design a very small leaf-spine network, two spines, four leaves, a couple of servers in every rack, AOS would basically see your intent in this form, with a couple of nodes and relationships and so on. You still interact with your traditional UI to express your intent, but under the hood we capture that by adding nodes and relationships, to understand your intent and overlay the physical as well as the logical aspects on top.

15:39 Now if you scale that up: we have major telcos using the product with many hundreds of devices per data center, with data center interconnect and so on. You're looking at potentially hundreds of thousands of nodes from a physical standpoint, and thousands of VXLAN services and so on. This translates into what is presented on the right: our largest customers have something between 20 and 30 million relationships in the database, and more than a million EVPN routes. And that is how AOS sees the network.

16:29 we don't

16:31 need to uh um like traditional

16:34 automation solutions

16:36 store uh

16:38 things that are basically a result of

16:40 the intent so we don't store device

16:42 configurations we don't store uh cabling

16:45 maps we don't store typical

16:47 artifacts like this all of those are

16:49 result of computing the graph and we

16:51 leverage

16:53 lightweight processes that are that can

16:55 be parallelized to uh

16:58 you know run those tasks and and render

17:01 configurations as well as run their

17:02 telemetry expectation as well as collect

17:05 telemetry as well as compare those

17:06 telemetry in real time um so we model

17:09 really the bare minimum and the graph

17:11 database and we let the system derive

17:14 from that everything that is required
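The "derive, don't store" idea in this passage can be sketched as a pure function over the intent. The data model, the ASN scheme, and the config syntax below are invented for illustration; they are not Apstra's actual renderer.

```python
# Hypothetical intent: the bare minimum is stored; configs are recomputed.
intent = {
    "asn_base": 64512,
    "leaves": ["leaf1", "leaf2"],
    "spines": ["spine1", "spine2"],
}

def render_bgp_config(device):
    """Pure function of the intent: same intent in, same config out.
    Nothing here is persisted; re-running it after an intent change
    yields the new config with no drift to reconcile."""
    devices = intent["leaves"] + intent["spines"]
    asn = intent["asn_base"] + devices.index(device)
    peers = intent["spines"] if device in intent["leaves"] else intent["leaves"]
    lines = [f"router bgp {asn}"]
    for p in peers:
        lines.append(f"  neighbor {p} remote-as external")
    return "\n".join(lines)

print(render_bgp_config("leaf1"))
```

The same pattern applies to cabling maps and telemetry expectations: each is a function of the graph, so they can be recomputed in parallel rather than stored and kept in sync.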

17:18 And why this is important is that the tool itself then has a complete understanding of the network. If I take the small leaf-spine network we have seen before, its graph representation will more or less look like this: two spines and a couple of leaves. You see the physical representation, and you see all the logical representations, typically the EVPN constructs and services: nodes for virtual networks, then instances of those virtual networks connected to specific leaves, and eventually SVIs if you're enabling VXLAN routing, and so on and so forth.

18:02 Typically, this is how AOS will capture the user intent, and from here a number of steps will happen. The first one is what we call the preconditions check, meaning the system will check the user input for any new service that is expressed. Then, once you pass those validations, we basically enable the postconditions check, which is deriving the expectations and making sure that whatever configuration or state we have enforced on the switches is actually being met. I'm going to take examples of the preconditions check, which is an important part of the automation as well, and then the postconditions check, which falls into the validation category.

18:49 The preconditions check applies any time you are in a run phase: you have a running infrastructure and you are requesting new services, a new routing policy, a new ACL policy, anything that you express in the system. This is a screen capture that shows you staging a new intent, and AOS will perform a number of semantic validations, meaning it will look at the user input and make sure that it is complete and exhaustive in terms of user data, and that it is not conflicting with any existing data or policy.

19:36 The idea here is that, because you have a system-level model of the user intent, the tool is able to make sure that at no point in time is the operator triggering an automation workflow with data that will break at the end. If there is any data that is missing, or that is provided in the wrong format, or provided in the right format but conflicting with a previous policy, or anything that is deemed to be incorrect, you get what are called build errors. These are preconditions checks. A typical example is duplicate IPs: as a human, even if you use an automation tool, you can create two services in the same tenant and have the two services use the same IPs by mistake, and the tool is here to prevent you from doing that. Preventing you from doing that means stopping any commit possibility: you do not see any configuration being sent to the switches if there is an error; we just gray out the commit button and block the equivalent API endpoint.
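A minimal sketch of that precondition mechanic, with a hypothetical data model (not Apstra's): duplicate IPs within a tenant produce build errors, and commit stays blocked while any error exists.

```python
# Hypothetical staged intent: services pending commit.
staged = [
    {"tenant": "red",  "service": "web", "ip": "10.0.0.10"},
    {"tenant": "red",  "service": "db",  "ip": "10.0.0.10"},  # duplicate!
    {"tenant": "blue", "service": "web", "ip": "10.0.0.10"},  # other tenant: fine
]

def build_errors(services):
    """One error per (tenant, ip) collision; an empty list means clean."""
    seen, errors = {}, []
    for svc in services:
        key = (svc["tenant"], svc["ip"])
        if key in seen:
            errors.append(
                f"duplicate IP {svc['ip']} in tenant {svc['tenant']}: "
                f"{seen[key]} vs {svc['service']}")
        else:
            seen[key] = svc["service"]
    return errors

def can_commit(services):
    # The commit button stays grayed out while build errors exist,
    # so no configuration is ever rendered from invalid input.
    return not build_errors(services)

print(build_errors(staged))
print("commit allowed:", can_commit(staged))
```

The point is the ordering: validation happens against the staged intent, before any configuration is rendered, rather than being discovered on the switches afterward.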

20:40 And that's important for us: to basically make sure there is an in-depth verification of the user input, leveraging all the system-level validation. So that was the preconditions check, which runs when you stage a new service or make any adds, moves, and changes to your intent, making sure you are not going to make a change that will break the network. Again, it's really about de-risking those changes.

21:12 Then let's say you have passed this user input validation, and whatever user request you have put into the system is deemed to be feasible, so the system lets you commit at any point in time. When you're ready to commit, the configuration rendering is generated and pushed to the devices, incremental configuration is handled, and so on. And then you also have the expectation rendering in the tool, which generates expectations and compares those expectations to the actual data in the infrastructure.

21:44 I'm taking here a typical example of something that is quite symptomatic of today's challenges, which is EVPN. When you configure EVPN, one of the most complex protocols today, you have a lot of state, a lot of routes, and for the first time you have more service routes than actual customer routes. Route types 3 and 5, for example, which appear in this example dashboard, are infrastructure routes, and you need them to be present on the switches right after you configure any Layer 2 or Layer 3 VXLAN service, irrespective of any user traffic or any customer actually consuming those services. Consumption of those EVPN services will trigger additional routes, but these are really infrastructure routes.

22:39 And here you have something that, in a contextual manner, any time you request, I don't know, 50 new VXLAN services or 2,000 new VXLAN services, will automatically derive the expected routes it is supposed to see on every switch, in a very detailed way. The counters you see here relate to a very small infrastructure, and in this case they show the deviation: three of the expected routes are missing, and three of the expected routes on this leaf are missing on the switches as well.

23:16 If you wanted to do this manually, you would typically type show commands; these are show commands from Junos to look at the BGP EVPN routing table, requesting something like: show me the type-3 routes, or the type-5 routes. What we see here is that, first of all, the user input is not obvious, and the data output is not obvious either, because this is MP-BGP: we have to deal with route distinguishers and route targets. The interpretation of the data is not obvious, and moreover, if something is missing, it's quite hard, sometimes almost impossible, for a human to cope with.

24:01 You are typically looking to use show commands to troubleshoot something that, at scale, is very often almost impossible to do from a human perspective. And by scale here I mean that the moment you exceed 10 racks, with a few hundred VLANs stretched, you reach that scale. So you're typically like, okay, I'm typing my commands and it looks good to me, but if something is missing you are rarely able to see it. What we do here is have something that automates those expectations and those checks.

24:35 those expectations and those checks and

24:37 the level of checks we do on on on you

24:40 know on day one like when you stand up

24:43 the fabric and and

24:44 and deploy your first services is

24:46 exactly the same as the number of checks

24:49 that we do let's say in six months down

24:52 the road after you know having uh uh

24:54 added um you know 50 racks or 100 racks

24:57 and and i don't know how many uh virtual

24:59 services the uh uh the objective here is

25:03 to have something that makes the same

25:05 checks consistently uh

25:07 in a continuous manner uh with the

25:09 purpose of of having no technical depth

25:12 whatsoever right um you you don't have

25:16 any configuration drift because uh there

25:18 is no incremental configuration that is

25:20 pushed to the switches if there is no

25:23 automated expectation that is generated

25:25 and automatic data collection from from

25:27 the switches to to compare what and to

25:29 automate the comparison of that data to

25:31 the expectations
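The expected-versus-actual mechanic described here can be sketched as follows. The derivation rule (in EVPN, each VTEP originates one type-3 inclusive-multicast route per VNI, so a leaf expects one per remote leaf sharing that VNI) is standard EVPN behavior, but the data model and all numbers are illustrative assumptions, not Apstra's actual computation.

```python
# Hypothetical intent: which VNIs are configured on which leaf.
intent = {"vnis_per_leaf": {"leaf1": [100, 200],
                            "leaf2": [100],
                            "leaf3": [100, 200]}}

def expected_type3(leaf):
    """For each VNI on `leaf`, expect one type-3 route per *other* leaf
    carrying that VNI - derived from intent, not from device configs."""
    total = 0
    for vni in intent["vnis_per_leaf"][leaf]:
        total += sum(1 for other, vnis in intent["vnis_per_leaf"].items()
                     if other != leaf and vni in vnis)
    return total

# Telemetry as collected from the switches (leaf3 is missing one route).
actual = {"leaf1": 3, "leaf2": 2, "leaf3": 2}

# Only deviations from expectation are flagged: contextual alarms.
anomalies = {leaf: (expected_type3(leaf), actual[leaf])
             for leaf in actual if actual[leaf] != expected_type3(leaf)}
print(anomalies)  # {'leaf3': (3, 2)}
```

Because the expectation is recomputed from the intent, the check stays valid after every add, move, or change, with no hand-maintained baseline to drift.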

25:33 So that's for us an important aspect of how we want to de-risk Day 2 changes, and make change management in today's operations something that is not scaring every operations team.

25:52 Typically here, the user would look at the output and try to see: what do I have here? Am I having the right VTEPs for the right VNI? Am I missing something? What entry am I expecting to see for a specific VTEP? Doing this by hand, it's kind of impossible to derive these expectations in real time, and you cannot use the configurations to derive them either; it's a very hard problem to solve in software. Configurations have to be a result, the same way validations have to be a result, of a more logical definition of your intent.

26:32 Moving on, to illustrate what this will look like after the back-end implementation we have been looking at: this is an example of the BGP peerings you see on a typical switch, a number of expectations with an expected versus an actual column, and of course any deviation here is marked in red. This is updated in real time as you add racks or services, and you see this contextual monitoring and continuous validation. You don't have to worry about which IP address is used on which spine and so on; we leverage all the data modeling to present you with the information to pinpoint and understand what is wrong in which part of the infrastructure.

27:17 This is an example on the BGP routing tables; you have other examples like the one I have shown before for the EVPN or overlay routes. This one is looking at the underlay routing: do I have the right entries in my routing tables on every switch, yes or no? I have an expected versus actual comparison, and at no point in time am I bothered with meaningless alarms; I have only contextual alarms. Anything that is red is contextual to an expectation.

27:55 other type of validation of telemetry

27:57 that is not necessarily deterministics

27:59 is more like traffic oriented so show me

28:02 the path between any two endpoints in

28:04 the fabric. I pick one server in one rack

28:06 and another server in another rack,

28:08 potentially in different pods, and I

28:09 want to see how the traffic is

28:11 flowing, or how the fabric is

28:13 behaving between those two endpoints. So

28:15 I have here examples of a

28:17 three-stage

28:19 topology, showing the path between

28:22 two specific endpoints.
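The expected-versus-actual pattern described above can be sketched in a few lines. This is a toy illustration of the idea (hypothetical data and function names, not Apstra's actual API): expectations are derived from intent, telemetry provides the actual state, and only deviations surface as contextual anomalies.

```python
# Sketch of intent-derived validation (hypothetical data, not Apstra's API):
# compare expectations derived from intent against collected telemetry and
# raise one contextual anomaly per deviation.

def find_anomalies(expected, actual):
    """Return one anomaly record per deviation from an expectation."""
    anomalies = []
    for key, want in expected.items():
        got = actual.get(key, "missing")
        if got != want:
            anomalies.append({"item": key, "expected": want, "actual": got})
    return anomalies

# Expectation from intent: each leaf should hold an established BGP
# session to every spine.
expected = {("leaf1", "spine1"): "established",
            ("leaf1", "spine2"): "established"}
# Actual state collected from the device.
actual = {("leaf1", "spine1"): "established",
          ("leaf1", "spine2"): "idle"}

print(find_anomalies(expected, actual))
```

Because expectations come from the intent model rather than from device configuration, nothing is flagged unless it contradicts something the operator actually asked for.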

28:24 So again, this is really traffic oriented:

28:27 let's say I also have the right routing

28:30 entries and the right configurations; I

28:31 can instruct the system

28:34 to check for any unbalanced situation,

28:36 and there again, this is irrespective

28:38 of right or wrong configuration — you

28:40 may have the correct configuration,

28:42 if you will, on the switches but still

28:44 have an unbalanced situation, which is

28:47 symptomatic of elephant

28:49 flows that can

28:50 arise in your network. So you also

28:53 have dashboards that look for ECMP

28:55 imbalance, both layer 3 ECMP inside the

28:58 fabric and layer 2 ECMP, which is

29:00 between your servers and

29:02 your leaves. And, you know, it

29:05 is a bit more difficult to

29:07 generate alarms here — I mean, to create

29:09 an unbalanced situation — but you get the

29:11 idea of how the gauges will

29:14 evolve should there be any

29:16 imbalanced situation. And the important thing

29:17 here is that you can

29:20 provide your own definition of what an

29:21 imbalance is.
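A user-defined imbalance rule of the kind described here can be sketched as follows — a toy stand-in with made-up link rates, not Apstra's built-in processor, flagging a link group when the spread of traffic across member links exceeds a chosen tolerance.

```python
# Toy ECMP imbalance check (assumed data, not Apstra's analytics engine):
# flag a link group when the standard deviation of per-link traffic
# exceeds a user-chosen percentage of the mean.
from statistics import mean, pstdev

def is_imbalanced(link_rates_mbps, tolerance_pct=20):
    """True when stddev across links exceeds tolerance_pct of the mean."""
    avg = mean(link_rates_mbps)
    if avg == 0:
        return False  # no traffic, nothing to judge
    return pstdev(link_rates_mbps) / avg * 100 > tolerance_pct

print(is_imbalanced([400, 410, 395, 405]))   # traffic evenly spread
print(is_imbalanced([900, 120, 110, 130]))   # one hot link, elephant-flow style
```

The tolerance (and, in a real pipeline, the observation interval) is exactly the kind of knob the talk says each customer tunes to their own traffic pattern.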

29:24 So the analytics pipeline

29:27 that is behind this dashboard

29:30 gives

29:34 you the ability to customize the

29:36 definition of an imbalance

29:39 by saying the amount of standard

29:41 deviation that you tolerate between

29:44 links, as well as the

29:47 observation intervals, so that, based

29:50 on your traffic pattern, you want

29:52 to be alerted if imbalance

29:56 is matching a given condition — which is

29:58 probably different from one customer

29:59 to another. So all of this is highly

30:01 customizable. And then, other views

30:04 leveraging

30:06 the data model: again, you can

30:08 interrogate the source of truth very

30:10 easily, and you can create dashboards

30:12 like this to say, I want to see the north-

30:14 south versus east-west traffic

30:16 distribution. Right? From design time

30:19 we have modeled what the

30:21 external link is, what the

30:22 fabric link is, and

30:24 what the server-facing link is, and so on,

30:26 so querying the source of truth

30:29 and creating a dashboard out of this

30:31 is extremely easy. And more importantly,

30:33 when you create a dashboard like

30:35 that, it's in sync with your intent,

30:37 meaning the moment I add external links,

30:39 for whatever reason — because I need more

30:42 traffic towards

30:44 my MPLS PEs or my WAN or whatever —

30:47 this dashboard is automatically aware of

30:49 the modification. So it

30:51 automatically collects data related to

30:53 new interfaces, because it's aware of a

30:56 change that will impact the data

30:57 collection here. So there is no more

30:59 disconnect between someone managing the

31:01 configuration and someone managing a

31:03 monitoring stack, and the need to update

31:05 the two; there is one and only one single

31:07 source of truth serving every

31:11 single feature in this product, be it

31:13 configuration or validation.
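Because every link carries a role assigned at design time, a dashboard's scope can be a query against the modeled source of truth rather than a hand-maintained interface list. Here is a toy sketch of that idea (the schema and link names are invented; Apstra's actual graph query language looks different):

```python
# Toy "source of truth" query by link role (hypothetical schema, not
# Apstra's graph query language): dashboards select their scope with a
# query, so adding a link with a role automatically widens the scope.
links = [
    {"name": "leaf1:et-0/0/48", "role": "external"},       # toward the WAN
    {"name": "leaf1:et-0/0/0",  "role": "fabric"},         # leaf-spine link
    {"name": "leaf1:xe-0/0/10", "role": "server-facing"},  # toward a server
    {"name": "leaf2:et-0/0/48", "role": "external"},
]

def links_by_role(links, role):
    """Return the names of all links carrying the given role."""
    return [link["name"] for link in links if link["role"] == role]

north_south = links_by_role(links, "external")  # north-south scope
east_west = links_by_role(links, "fabric")      # east-west scope
print(north_south, east_west)
```

Appending a new external link to `links` immediately changes what `north_south` returns — the query-driven version of the "zero maintenance" behavior the talk describes.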

31:15 um

31:16 So the data model leverages this

31:19 in real time. And then,

31:22 carrying on: what else can I do

31:24 with the graph modeling of my

31:25 source of truth? Well, I can

31:27 compute diffs between

31:29 different versions of the graph, and

31:32 this is meant the same way

31:34 as a git diff, if you will —

31:38 if you look for an analogy

31:41 in version control. So

31:44 the system can very easily compute

31:47 diffs between an existing

31:50 version of the graph and the

31:52 previous one. And

31:55 what this allows me to do is

31:57 to really version-control my network.

32:00 So, you know, Junos has had this

32:03 amazing rollback feature for years,

32:06 which is certainly

32:09 something that every

32:11 network operator has been amazed by,

32:13 and we just

32:15 use the same concept but make it system-

32:17 wide,

32:17 meaning I can, at any point in time,

32:20 roll back an entire fabric.

32:23 Two or

32:24 three clicks away, I can come

32:27 back to a previous revision, and that can

32:30 change configuration on one switch, 10

32:31 switches, or 100 switches at a time. So

32:35 we store the latest

32:37 revisions, like the five latest revisions,

32:39 and we allow you to store 25 other

32:42 ones

32:44 for life, if you want to keep them

32:46 forever and roll back to them.

32:48 So imagine the power of the rollback

32:50 on a specific device, and making it

32:53 system-wide,
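The version-control idea behind this feature can be illustrated with a toy graph diff (simplified dictionaries standing in for the intent graph — not Apstra's implementation): comparing two revisions shows exactly what a system-wide rollback would add, remove, or change.

```python
# Toy diff between two revisions of an intent graph (simplified dicts,
# not Apstra's graph store): the basis for previewing a fabric-wide
# rollback the way a git diff previews a commit revert.
def graph_diff(old, new):
    """Return (added, removed, changed) nodes between two revisions."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {k: (old[k], new[k]) for k in old.keys() & new.keys()
               if old[k] != new[k]}
    return added, removed, changed

# Two hypothetical revisions: revision 5 retags one virtual network
# and adds another.
rev4 = {"vn_blue": {"vlan": 10}, "vn_red": {"vlan": 20}}
rev5 = {"vn_blue": {"vlan": 10}, "vn_red": {"vlan": 25},
        "vn_green": {"vlan": 30}}

added, removed, changed = graph_diff(rev4, rev5)
print(added, removed, changed)
```

Rolling back from rev5 to rev4 is then just applying the inverse of this diff, regardless of whether it touches one switch or a hundred.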

32:54 knowing that we manage very large

32:57 data center leaf-spine

32:59 architectures for customers.

33:03 And yeah, this feature is called Intent

33:04 Time Voyager, which is

33:06 basically what we just mentioned.

33:08 And

33:11 the bottom line is

33:13 that, again, we want to manage

33:16 the infrastructure as a system and not

33:18 as individual boxes, which is, in

33:22 our view, the only way to

33:24 reach this agility — even

33:26 though under the hood you get access to

33:28 the configuration underneath every switch,

33:30 you manage it as a system.

33:33 And then, moving on: this graph

33:35 database that we

33:39 mentioned is also something that we can

33:42 use to ingest data from

33:45 external systems, so we have third-party

33:48 integrations,

33:49 and one example of that is

33:50 the integrations with VMware tools,

33:53 so vSphere and NSX and so on.

33:55 um

33:56 and um

33:58 We can then create

34:01 additional nodes and relationships in

34:03 this source of truth to model data

34:06 coming from

34:07 another source of truth, of which we

34:10 want to be aware

34:12 to validate our domain.

34:15 So I'm going to explain this

34:17 through an example. We have this

34:19 integration with NSX,

34:24 and what it allows us to do is: we will

34:27 obtain, through read-only APIs to the NSX

34:30 Manager, the identity of all

34:32 the VMs, all the transport nodes,

34:36 as well as the uplink profiles of every

34:38 VM,

34:40 and the micro-segmentation policies, and,

34:43 you know, you name it — a number of

34:46 pieces of information that are outside of our

34:48 domain, but of which we want to be aware,

34:50 because it allows us to then

34:52 correlate that with the

34:54 underlay. So we then have the

34:56 ability to know what VMs we

34:59 have running in the fabric, where they

35:01 are located — what ESXi is hosting

35:04 them, and which

35:06 top-of-rack is connected to that ESXi —

35:09 and what the user intent from the

35:12 VMware administrator is, and do I have the

35:16 right configuration in the fabric, or

35:18 the right user intent in the fabric, to

35:19 satisfy that?
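The cross-domain check described here boils down to a set comparison between what the virtualization side expects and what the fabric intent provides. A toy sketch (invented inventories, not the actual integration):

```python
# Toy cross-domain correlation (made-up inventories, not the real
# NSX/vSphere integration): find VLANs the VMware side requires that
# are missing from the fabric's user intent.
def missing_vlans(vm_port_groups, fabric_intent_vlans):
    """Return VLANs required by VM port groups but absent from intent."""
    required = {pg["vlan"] for pg in vm_port_groups}
    return sorted(required - set(fabric_intent_vlans))

# Hypothetical data pulled read-only from the virtualization manager.
vm_port_groups = [{"vm": "app01", "vlan": 110},
                  {"vm": "db01",  "vlan": 120},
                  {"vm": "web01", "vlan": 110}]
# VLANs currently declared in the fabric's user intent.
fabric_intent_vlans = [110]

print(missing_vlans(vm_port_groups, fabric_intent_vlans))
```

A non-empty result is precisely the kind of discrepancy that, in the talk's example, raises an anomaly before the VMware team has to pick up the phone.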

35:21 So we can bridge the gap

35:23 between the two domains, and

35:25 once we enhance the graph

35:28 database with this information, we have

35:29 automatic validation pipelines that pop

35:31 up and let you know whether there is a

35:33 discrepancy between one domain and

35:35 another. And a typical example is: I do

35:40 have, like, a

35:40 requirement for a VLAN-

35:42 backed interface from VMware, and

35:44 this one is missing from the user intent,

35:46 so the network admin has just not been

35:48 aware or been notified that he has to

35:50 create a new service in AOS. But with

35:52 that he'd be aware that he'll soon get a

35:55 phone call, whatever, from the

35:56 VMware guys to say, hey, I need this

35:59 VLAN here, or

36:01 whatever. So we empower the network

36:03 administrators with knowledge outside of

36:05 the network domain, and in some cases

36:08 expose automatic remediation workflows,

36:10 which is

36:11 a way that Apstra has to let the user know:

36:15 hey, you have, I don't know,

36:17 an anomaly here —

36:19 click this button and I'm going to do the

36:21 adds, moves, and changes to your intent

36:23 automatically for you,

36:25 to change the underlay to make it in

36:27 sync with the

36:29 overlay. We do not make any

36:32 modification on the VMware part — there

36:34 is a management tool for that — but we can

36:36 take the report from there and

36:38 automatically make the underlay

36:40 fabric part be in sync with those

36:43 requirements. And the goal is to really

36:45 speed up

36:48 the deployments and avoid being

36:51 a bottleneck when it

36:54 comes to underlay-overlay validation.

36:58 And all those examples — any

37:01 dashboard you have seen — any dashboard

37:04 is, under the hood, highly customizable.

37:07 So,

37:09 this is the anatomy of

37:12 an analytics pipeline in AOS, which is

37:14 behind any user dashboard,

37:16 irrespective of how it looks:

37:19 whether it's a gauge, or whether

37:21 it's tabular, or histograms,

37:23 or whatever.

37:24 So an

37:26 analytics pipeline is composed of those

37:28 three components, each one of them

37:29 customizable. So you have the telemetry

37:31 collectors, which is an SDK

37:34 we

37:34 give customers to

37:37 enrich the data set, or the data

37:40 that's collected from the switches —

37:43 to go beyond the built-in

37:45 telemetry collectors that we have. So

37:47 this is really the ability for you to

37:48 extend what's called the raw data

37:50 collection: anything that comes out

37:52 of the switch, you define a data structure

37:54 and how you want to stream this data

37:57 into the telemetry framework. So this

37:59 part is, as I said, customizable. And then,

38:02 once data comes to the AOS server, you

38:04 can then create a user-defined, highly

38:06 customizable pipeline — in blue —

38:09 um

38:10 where you can select various processors.

38:12 A processor makes a specific

38:16 manipulation or correlation or reduction

38:18 or processing of this data, so the

38:20 selection of those processors and the way

38:22 you chain them together defines an

38:24 operational workflow. And there is

38:26 a list of different

38:28 processors, and the use cases are

38:31 really wide, because you can really

38:33 combine them in very different ways to,

38:34 basically, I don't know, compare the data

38:37 against time series,

38:38 check periodic ranges, and even more

38:41 complex calculations. And then

38:44 you see the red part at the top: all

38:46 of this pipeline is always in sync

38:48 with your source of truth. So you

38:50 define a query to the source of truth,

38:52 which defines the scope of application

38:54 of this pipeline.
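The chaining of processors into a workflow can be sketched generically — this is an illustration of the concept only (the stage functions and field names are invented, not Apstra's processor API):

```python
# Generic sketch of chaining telemetry processors (concept illustration,
# not Apstra's processor API): each stage transforms the data, and the
# chain as a whole defines an operational workflow.
def make_pipeline(*processors):
    """Compose processors left to right into a single callable."""
    def run(data):
        for processor in processors:
            data = processor(data)
        return data
    return run

# Hypothetical stages: filter to fabric interfaces, extract a counter,
# then reduce to an aggregate.
keep_fabric = lambda samples: [s for s in samples if s["role"] == "fabric"]
to_rates = lambda samples: [s["rx_mbps"] for s in samples]
total = lambda rates: sum(rates)

pipeline = make_pipeline(keep_fabric, to_rates, total)
samples = [{"if": "et-0/0/0",  "role": "fabric",        "rx_mbps": 800},
           {"if": "xe-0/0/10", "role": "server-facing", "rx_mbps": 300},
           {"if": "et-0/0/1",  "role": "fabric",        "rx_mbps": 650}]
print(pipeline(samples))
```

Swapping `total` for a different reducer, or inserting a thresholding stage, changes the workflow without touching collection — the modularity the talk attributes to the processor model.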

38:55 And then it's a zero-maintenance

38:58 feature: any time

39:01 you add, move, and change anything, from a

39:03 physical or logical standpoint, in the fabric,

39:05 you don't have to notify or change or

39:07 maintain this analytics pipeline;

39:09 it adapts automatically,

39:12 increasing or reducing the data set

39:15 to collect new data, because it's subject

39:17 to analysis. Here again, the same is also true

39:19 for configuration rendering

39:22 and validation.

39:23 And then the last point — and I think

39:26 I will probably be

39:28 almost on time —

39:30 the last point is that

39:32 we

39:35 automatically create APIs

39:38 for

39:40 any pipeline that you

39:43 create. Of course we have APIs for

39:45 all the configuration part;

39:46 that's certainly

39:49 mandatory. But for any user

39:51 pipeline, any analytics pipeline that

39:53 you create for any use case,

39:55 you have an API endpoint that allows you

39:59 to get

40:00 to any stage, meaning to any processor,

40:04 which is available

40:07 for you to have third-party systems

40:09 query the data: being raw, which is on

40:11 the far left; or being processed, in the

40:13 middle;

40:14 or being fully analyzed, which is

40:17 closer to the far right,

40:19 where you can extract more insight out

40:21 of it.
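The idea that every pipeline stage is individually addressable can be sketched as follows. The URL shape and payload here are assumptions for illustration only — they are not Apstra's actual REST schema:

```python
# Sketch of per-stage telemetry endpoints (URL layout and payload are
# assumptions, not Apstra's real REST API): a third-party system picks
# the stage whose data it wants — raw, processed, or fully analyzed.
import json

def stage_endpoint(base_url, pipeline_id, stage):
    """Build the (hypothetical) URL for one stage of one pipeline."""
    return f"{base_url}/api/telemetry/{pipeline_id}/stages/{stage}"

# A client would GET such an endpoint; here we just parse a canned
# response to show the round trip.
raw_response = json.dumps({"stage": "imbalance_check",
                           "values": [{"group": "leaf1-uplinks",
                                       "imbalanced": True}]})
payload = json.loads(raw_response)

print(stage_endpoint("https://apstra.example.net", "ecmp-health",
                     payload["stage"]))
```

Choosing an earlier or later stage in the path trades raw fidelity against pre-computed insight, mirroring the left-to-right pipeline picture in the talk.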

40:22 So that's one method to obtain the data,

40:24 and we also have the ability to

40:27 stream this data to external data lakes

40:30 with

40:32 Google protocol buffers.

40:35 So all data being collected at the

40:37 Apstra server is aggregated, normalized,

40:40 and so on, and enriched with metadata as

40:43 well, so that you have a consistent

40:44 experience. And then we stream out the

40:46 data from the Apstra server via

40:50 Google protocol buffers toward telemetry

40:52 collectors. So we typically have

40:54 plugins for Telegraf, because we like

40:56 this modular tool, and customers

40:58 can then

41:00 select whatever time-series database is

41:01 best to write the data to. So you can

41:05 basically use APIs

41:07 to program the Apstra server to collect

41:09 or parse new data in the specific formats

41:12 you are interested in, and you have

41:14 this Apstra server that can

41:16 stream the data to this

41:20 external data-lake stack that

41:23 you have out there. These are

41:25 just examples of very common TICK-

41:27 stack or ELK-stack setups that we

41:30 interface with.
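The talk describes streaming enriched telemetry via protocol buffers toward collectors such as Telegraf. Real protobuf needs generated schemas, so as a stand-in here is the metadata-enrichment idea expressed with InfluxDB line protocol — a text format a Telegraf/TICK stack commonly ingests (the measurement and field names are invented):

```python
# Stand-in for the protobuf export path (Apstra actually streams
# protocol buffers; this sketch uses InfluxDB line protocol instead):
# enrich a counter sample with metadata tags before handing it to a
# Telegraf-style collector.
def to_line_protocol(measurement, tags, fields, ts_ns):
    """Format one sample as 'measurement,tags fields timestamp'."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

line = to_line_protocol(
    "interface_counters",
    # Metadata enrichment from the source of truth: device, interface, role.
    {"device": "leaf1", "ifname": "et-0/0/0", "role": "fabric"},
    {"rx_mbps": 800, "tx_mbps": 640},
    1700000000000000000,
)
print(line)
```

The tags carried on each point are what let a downstream dashboard slice by role or device without re-consulting the fabric — the "enriched with metadata" step the talk mentions.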

41:31 um

41:33 So, to recap:

41:36 we have

41:38 a solution that allows you to

41:40 configure and operate

41:43 with very powerful

41:49 and customizable analytics

41:51 engines — powerful in the sense that

41:53 it's automatically available for

41:55 you,

41:56 and of course highly

41:58 extensible to fit any customization —

42:02 which allows us to address

42:03 Day 0, Day 1, but also Day 2, which is

42:05 the major challenge in network

42:07 operations; it's a more difficult

42:09 problem to solve.

42:10 And of course it allows you

42:13 to enforce any compliance policy

42:17 that you want on top

42:20 of the built-in validations that we

42:22 have.

42:24 If you are interested, we have

42:28 different resources.

42:30 There's a YouTube channel

42:32 where we explain, in

42:35 five to ten minutes, different parts of

42:37 the product, from the design to the

42:39 validations and so on, so I encourage

42:42 you

42:43 to have a look if you want to

42:46 get a glimpse of how the product

42:49 is operating. If you want to

42:51 spend more time, we have Apstra Academy,

42:53 which is a self-service training

42:57 tool, so it's

43:00 something that

43:02 you can basically do at

43:04 your own pace, but it's

43:06 worth — it's three days' worth of

43:09 instructor-led training, with the

43:13 various modules broken down with some

43:15 form of deep-dive explanation there.

43:18 And you have the virtual labs

43:20 that you can use to stand up a virtual

43:23 environment; we basically

43:26 create small topologies, generally two-by-

43:28 four or something, but really enough for

43:31 you to

43:33 at least appreciate the major part

43:35 of the — or the major features. We

43:37 generally stand this up with virtual

43:39 switches, like virtual QFXs or other

43:41 virtual devices, but the user experience

43:43 from the Apstra perspective is

43:45 exactly the same

43:46 whether it's a physical switch or a

43:48 virtual counterpart.

43:54 [Music]
