Whiteboard Technical Series: Mutual Information

AI & ML
A diagram from the video showing Y as the time-to-connect SLE metric and X as a network feature, along with the quantities H(Y), H(Y|X), H(X|Y), H(X), and MI(X;Y).

Yes, you DO need mutual information in your network.

Push play to discover how the Juniper Mist™ AI-driven platform uses mutual information to help you understand which network features — such as mobile device type, client OS, or access point — have the most information for predicting failure or success in your service-level expectation (SLE) metrics. 


You’ll learn

  • The definition of mutual information, what it means, and some examples

  • How mutual information works with SLE metrics

Who is this for?

  • Network Professionals

  • Business Leaders

Transcript

0:12 Today we're talking about how the Juniper Mist AI-driven platform uses mutual information to help you understand which network features, such as mobile device type, client OS, or access point, have the most information for predicting failure or success in your SLE metrics.

0:32 Let's start with a definition of mutual information. Mutual information is defined in terms of the entropy of random variables. Mathematically, the mutual information is the entropy of random variable X minus the conditional entropy of X given Y.
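
In symbols, that definition is the standard one (H is entropy, H(·|·) is conditional entropy); the equivalent form in terms of Y is a textbook identity and is not stated explicitly in the video:

```latex
\mathrm{MI}(X;Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)
```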

0:50 Now, what does this mean? Let me give you an example. Let's say Y is one of our random variables, the one we want to predict, and it represents the SLE metric time to connect; it can take one of two possible values, pass or fail. Next we have another random variable X that represents a network feature, which can have a possible value of present or not present. A network feature can be a device type, OS type, time interval, or even a user or an AP; any possible feature of the network can be represented by this random variable.

1:25 Next we'll look at what we mean by entropy. For most people, when they hear the term entropy they think of the universe, and of entropy always increasing as the universe tends toward less order and more randomness, or uncertainty. So entropy represents the uncertainty of a random variable, and the classic example is a coin toss. If I have a fair coin and I flip that coin, the entropy of that random variable is given by the negative sum of the probability of each outcome times the log base 2 of that probability. For the fair coin, the probability is 50% heads and 50% tails, and the entropy is equal to 1, the maximum possible. When you have maximum uncertainty, the random variable has maximum entropy.
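
Written out, that is the usual Shannon entropy formula, with the minus sign made explicit and the sum running over the possible outcomes x_i:

```latex
H(X) = -\sum_i p(x_i)\,\log_2 p(x_i)
```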

2:15 If we take an example where we don't have a fair coin, say some hustler out there is using a loaded coin where the probability of heads is 70% and the probability of tails is 30%, then the entropy is 0.88. So you can see that as the uncertainty goes down, the entropy trends toward zero. Zero entropy would mean no uncertainty: the coin flip would always come up heads, or always tails.
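
A quick way to verify those numbers is to compute the entropy directly. This is a minimal sketch in plain Python (not Mist code); it reproduces the 1 bit and roughly 0.88 bit values quoted above:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over outcomes with p > 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # fair coin      -> 1.0 bit (maximum uncertainty)
print(entropy([0.7, 0.3]))   # loaded coin    -> ~0.881 bits
print(entropy([1.0, 0.0]))   # no uncertainty -> 0.0 bits (always the same face)
```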

2:42 Now let's go back and see how mutual information works with our SLE metrics. Graphically, what does this equation look like? Let's say this circle here represents the entropy of my SLE metric Y, and this circle is the entropy of my feature random variable X. If you look at our equation, the conditional entropy of the random variable Y given the network feature X is this area here. If I subtract the two, what we're looking for is this middle segment. It represents the mutual information of the two random variables, and it gives you an indication of how much information your network feature provides about your SLE metric random variable Y. If the network feature tells you everything about the SLE metric, the mutual information is maximal; if it tells you nothing about the SLE metric, the mutual information between X and Y is zero.
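
The same subtraction can be done numerically. The sketch below uses a made-up joint distribution between a binary network feature and the pass/fail outcome (the numbers are purely illustrative, not from the Mist platform) and computes MI(X;Y) = H(Y) - H(Y|X):

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution over (X = feature present?, Y = time-to-connect result).
# The four probabilities sum to 1.
joint = {
    ("present", "pass"): 0.15, ("present", "fail"): 0.35,
    ("absent",  "pass"): 0.40, ("absent",  "fail"): 0.10,
}

# Marginal distribution of Y and its entropy H(Y).
p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, 0.0) + p
h_y = entropy(p_y.values())

# Conditional entropy H(Y | X) = sum over x of P(X = x) * H(Y | X = x).
h_y_given_x = 0.0
for x in ("present", "absent"):
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    cond = [joint[(x, y)] / p_x for y in ("pass", "fail")]
    h_y_given_x += p_x * entropy(cond)

# MI(X;Y): 0 if the feature says nothing about Y, up to H(Y) if it says everything.
print(h_y - h_y_given_x)
```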

3:38 Now, mutual information tells you how much information a network feature random variable X gives you about the SLE metric time to connect, but it doesn't tell you whether the network feature is better at predicting failure or success of the SLE metric. For that we need something called the Pearson correlation.

3:56 If you look at the picture of the correlation, it tells us a couple of things. One is the amount of correlation, with a range from negative 1 to 1. The other is the sign, negative or positive, which is a predictor of pass or fail. So now we have these two things: first, the magnitude, indicating how correlated the two random variables are; second, the sign, which indicates failure or success. If the correlation is negative, the network feature is good at predicting failure; if it's positive, it's good at predicting pass. If the Pearson correlation is zero, there is no linear correlation between the variables, but there could still be mutual information between the two.
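
As a sketch of that sign convention, here is a plain-Python Pearson correlation over hypothetical per-client samples. The 1/0 encodings for present/absent and pass/fail are assumptions for illustration; flipping either encoding flips the sign:

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient, range -1 to 1."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-client samples: feature X (1 = present, 0 = absent)
# and SLE outcome Y (1 = pass, 0 = fail) under this encoding.
x = [1, 1, 1, 1, 0, 0, 0, 0]
y = [1, 1, 1, 0, 0, 0, 0, 1]

# Positive here, because the feature mostly appears on passing clients;
# a negative value (with this encoding) would point toward failures instead.
print(pearson(x, y))
```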

4:43 But the Pearson correlation does not tell us the importance of the network feature, or whether there is enough data to make an inference between the network feature random variable and the SLE metric random variable. For that, let's go back to our graphic of the circles. There may be one case where I have very high entropy for both variables, and another case where I have much smaller entropy for one of those variables. Both of these examples may be highly correlated, with a high Pearson value, but the mutual information will be much higher in the first case, which means that random variable has much more importance in predicting success or failure of the SLE metric.
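
To make that concrete, here are two hypothetical cases in which the feature perfectly determines the outcome, so the Pearson correlation magnitude is 1 in both, yet the mutual information, and therefore the feature's importance, differs because the feature's own entropy differs:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_info(joint):
    """MI(X;Y) = H(Y) - H(Y|X) from a dict {(x, y): probability}."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    p_y = [sum(joint.get((x, y), 0.0) for x in xs) for y in ys]
    h_y_given_x = 0.0
    for x in xs:
        p_x = sum(joint.get((x, y), 0.0) for y in ys)
        if p_x > 0:
            h_y_given_x += p_x * entropy([joint.get((x, y), 0.0) / p_x for y in ys])
    return entropy(p_y) - h_y_given_x

# Case 1: the feature splits clients 50/50 and fully determines pass/fail.
case1 = {("present", "fail"): 0.5, ("absent", "pass"): 0.5}
# Case 2: the feature is rare (10% of clients) but, when seen, also fully determines the outcome.
case2 = {("present", "fail"): 0.1, ("absent", "pass"): 0.9}

print(mutual_info(case1))   # 1.0 bit    -> high-entropy feature, high importance
print(mutual_info(case2))   # ~0.47 bits -> same perfect correlation, much lower importance
```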

5:25 I hope this gives a little more insight into the AI we've created at Mist. If you look at the Mist dashboard, the result of this process is demonstrated by our virtual assistant.
