Whiteboard Technical Series: Mutual Information

AI & ML
A diagram from the video showing Y as the time-to-connect SLE metric and X as a network feature, along with the quantities H(Y), H(Y|X), H(X|Y), H(X), and MI(X;Y).

Yes, you DO need mutual information in your network.

Push play to discover how the Juniper Mist™ AI-driven platform uses mutual information to help you understand which network features — such as mobile device type, client OS, or access point — have the most information for predicting failure or success in your service-level expectation (SLE) metrics. 


You’ll learn

  • The definition of mutual information, what it means, and some examples

  • How mutual information works with SLE metrics

Who is this for?

  • Network Professionals

  • Business Leaders

Transcript

0:12 Today we're talking about how the Juniper Mist AI-driven platform uses mutual information to help you understand which network features, such as mobile device type, client OS, or access point, have the most information for predicting failure or success in your SLE metrics.

0:32 Let's start with a definition of mutual information. Mutual information is defined in terms of the entropy of random variables. Mathematically, the mutual information is the entropy of random variable X minus the conditional entropy of X given Y.
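
In symbols, that definition is the standard one (H is entropy, H(·|·) is conditional entropy); the equivalent form in terms of Y is a textbook identity and is not stated explicitly in the video:

```latex
\mathrm{MI}(X;Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)
```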

0:50 Now, what does this mean? Let me give you an example. Let's say Y is one of our random variables, the one we want to predict, and it represents the SLE metric time to connect; it can take one of two possible values, pass or fail. Next we have another random variable X that represents a network feature, which can have a possible value of present or not present. A network feature can be a device type, OS type, time interval, or even a user or an AP; any possible feature of the network can be represented by this random variable.

1:25 Next we'll look at what we mean by entropy. For most people, when they hear the term entropy they think of the universe, and of entropy always increasing as the universe tends toward less order and more randomness, or uncertainty. So entropy represents the uncertainty of a random variable, and the classic example is a coin toss. If I have a fair coin and I flip that coin, the entropy of that random variable is given by the negative sum of the probability of each outcome times the log base 2 of that probability. For the fair coin, the probability is 50% heads and 50% tails, and the entropy is equal to 1, the maximum possible. When you have maximum uncertainty, the random variable has maximum entropy.
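
Written out, that is the usual Shannon entropy formula, with the minus sign made explicit and the sum running over the possible outcomes x_i:

```latex
H(X) = -\sum_i p(x_i)\,\log_2 p(x_i)
```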

2:15 If we take an example where we don't have a fair coin, say some hustler out there is using a loaded coin where the probability of heads is 70% and the probability of tails is 30%, then the entropy is 0.88. So you can see that as the uncertainty goes down, the entropy trends toward zero. Zero entropy would mean no uncertainty: the coin flip would always come up heads, or always tails.
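
A quick way to verify those numbers is to compute the entropy directly. This is a minimal sketch in plain Python (not Mist code); it reproduces the 1 bit and roughly 0.88 bit values quoted above:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over outcomes with p > 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # fair coin      -> 1.0 bit (maximum uncertainty)
print(entropy([0.7, 0.3]))   # loaded coin    -> ~0.881 bits
print(entropy([1.0, 0.0]))   # no uncertainty -> 0.0 bits (always the same face)
```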

2:42 Now let's go back and see how mutual information works with our SLE metrics. Graphically, what does this equation look like? Let's say this circle here represents the entropy of my SLE metric Y, and this circle is the entropy of my feature random variable X. If you look at our equation, the conditional entropy of the random variable Y given the network feature X is this area here. If I subtract the two, what we're looking for is this middle segment. It represents the mutual information of the two random variables, and it gives you an indication of how much information your network feature provides about your SLE metric random variable Y. If the network feature tells you everything about the SLE metric, the mutual information is maximal; if it tells you nothing about the SLE metric, the mutual information between X and Y is zero.
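
The same subtraction can be done numerically. The sketch below uses a made-up joint distribution between a binary network feature and the pass/fail outcome (the numbers are purely illustrative, not from the Mist platform) and computes MI(X;Y) = H(Y) - H(Y|X):

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution over (X = feature present?, Y = time-to-connect result).
# The four probabilities sum to 1.
joint = {
    ("present", "pass"): 0.15, ("present", "fail"): 0.35,
    ("absent",  "pass"): 0.40, ("absent",  "fail"): 0.10,
}

# Marginal distribution of Y and its entropy H(Y).
p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, 0.0) + p
h_y = entropy(p_y.values())

# Conditional entropy H(Y | X) = sum over x of P(X = x) * H(Y | X = x).
h_y_given_x = 0.0
for x in ("present", "absent"):
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    cond = [joint[(x, y)] / p_x for y in ("pass", "fail")]
    h_y_given_x += p_x * entropy(cond)

# MI(X;Y): 0 if the feature says nothing about Y, up to H(Y) if it says everything.
print(h_y - h_y_given_x)
```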

3:38 Now, mutual information tells you how much information a network feature random variable X gives you about the SLE metric time to connect, but it doesn't tell you whether the network feature is better at predicting failure or success of the SLE metric. For that we need something called the Pearson correlation.

3:56 If you look at the picture of the correlation, it tells us a couple of things. One is the amount of correlation, with a range from negative 1 to 1. The other is the sign, negative or positive, which is a predictor of pass or fail. So now we have these two things: first, the magnitude, indicating how correlated the two random variables are; second, the sign, which indicates failure or success. If the correlation is negative, the network feature is good at predicting failure; if it's positive, it's good at predicting pass. If the Pearson correlation is zero, there is no linear correlation between the variables, but there could still be mutual information between the two.
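
As a sketch of that sign convention, here is a plain-Python Pearson correlation over hypothetical per-client samples. The 1/0 encodings for present/absent and pass/fail are assumptions for illustration; flipping either encoding flips the sign:

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient, range -1 to 1."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-client samples: feature X (1 = present, 0 = absent)
# and SLE outcome Y (1 = pass, 0 = fail) under this encoding.
x = [1, 1, 1, 1, 0, 0, 0, 0]
y = [1, 1, 1, 0, 0, 0, 0, 1]

# Positive here, because the feature mostly appears on passing clients;
# a negative value (with this encoding) would point toward failures instead.
print(pearson(x, y))
```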

4:43 But the Pearson correlation does not tell us the importance of the network feature, or whether there is enough data to make an inference between the network feature random variable and the SLE metric random variable. For that, let's go back to our graphic of the circles. There may be one case where I have very high entropy for both variables, and another case where I have much smaller entropy for one of those variables. Both of these examples may be highly correlated, with a high Pearson value, but the mutual information will be much higher in the first case, which means that random variable has much more importance in predicting success or failure of the SLE metric.
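
To make that concrete, here are two hypothetical cases in which the feature perfectly determines the outcome, so the Pearson correlation magnitude is 1 in both, yet the mutual information, and therefore the feature's importance, differs because the feature's own entropy differs:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_info(joint):
    """MI(X;Y) = H(Y) - H(Y|X) from a dict {(x, y): probability}."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    p_y = [sum(joint.get((x, y), 0.0) for x in xs) for y in ys]
    h_y_given_x = 0.0
    for x in xs:
        p_x = sum(joint.get((x, y), 0.0) for y in ys)
        if p_x > 0:
            h_y_given_x += p_x * entropy([joint.get((x, y), 0.0) / p_x for y in ys])
    return entropy(p_y) - h_y_given_x

# Case 1: the feature splits clients 50/50 and fully determines pass/fail.
case1 = {("present", "fail"): 0.5, ("absent", "pass"): 0.5}
# Case 2: the feature is rare (10% of clients) but, when seen, also fully determines the outcome.
case2 = {("present", "fail"): 0.1, ("absent", "pass"): 0.9}

print(mutual_info(case1))   # 1.0 bit    -> high-entropy feature, high importance
print(mutual_info(case2))   # ~0.47 bits -> same perfect correlation, much lower importance
```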

5:25 I hope this gives a little more insight into the AI we've created at Mist. If you look at the Mist dashboard, the result of this process is demonstrated by our virtual assistant.
