# Whiteboard Technical Series: Decision Trees

Learning Bytes AI & ML

## Decision trees: Why they’re so effective at troubleshooting network issues.

In this third installment of the Juniper Whiteboard Technical Series, you'll see how the Juniper/Mist AI platform uses Decision Tree models to identify common networking problems.

## You’ll learn

• How Decision Trees pinpoint common network issues, like faulty cables, access point and switch problems, and wireless coverage gaps

• How to isolate a cable fault by creating a Decision Tree using features of a bad cable, such as frame errors and one-way traffic

• How the structure of a Decision Tree accomplishes the ultimate goal: to produce an accurate model with a high level of certainty

## Transcript

0:11 today we'll be looking at decision trees

0:13 and how they identify common network

0:15 issues like faulty network cables

0:17 AP or switch health and wireless

0:19 coverage

0:20 the algorithms used include simple

0:23 decision trees, random forest,

0:24 gradient boosting and XGBoost

0:27 our example will be a simple decision

0:29 tree algorithm

0:31 the essential steps in any machine

0:33 learning project involve collecting data

0:36 training a model then deploying that

0:38 model

0:39 decision trees are no different let's

0:41 use a real-world example

0:44 in networking decision trees can be

0:46 applied to incomplete negotiation

0:48 detection

0:49 MTU mismatch, among others. our example

0:52 today addresses the fundamental problem of

0:54 determining cable faults

0:56 a cable is neither 100% good nor bad

0:59 but somewhere in between to isolate a

1:02 fault we can create a decision tree

1:04 using features of a bad cable such as

1:06 frame errors and one-way traffic

1:09 we begin a decision tree by asking a

1:12 question which will have the greatest

1:13 impact on label separation

1:16 we use two metrics to determine that

1:18 question: gini impurity

1:20 and information gain. gini impurity

1:23 which is similar to entropy

1:25 determines how much uncertainty there is

1:27 in a node and information gain lets us

1:29 calculate

1:30 how much a question reduces that

1:32 uncertainty

1:33 a decision tree is based on a data set

1:35 from known results

1:37 each row is an example the first two

1:40 columns are features that describe the

1:42 label in the final column

1:43 a good or bad cable. the data set can be expanded

1:48 and the structure will remain the same a

1:52 decision tree starts with a single root

1:53 node which is given this full training

1:55 set
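
As a concrete illustration, such a training set might look like the following in Python. The feature names, values, and five-row size are invented for this sketch (they are not the data on the whiteboard), chosen so the starting impurity works out to the 0.48 quoted later in the transcript.

```python
# Hypothetical training set for the cable-fault example: two feature
# columns (frame errors per minute, one-way traffic seen?) followed by
# the known label in the final column.
training_data = [
    [120, True,  "bad"],
    [95,  True,  "bad"],
    [30,  False, "bad"],
    [2,   False, "good"],
    [0,   False, "good"],
]
```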

1:57 the node will then ask a yes/no question

1:59 about one of the features

2:00 and split into two subsets of data each

2:02 of which becomes the input of a child node

2:05 if the labels are still mixed good and bad

2:08 then another yes/no question will be asked

2:12 the goal of a decision tree is to sort

2:13 the labels until a high level of

2:15 certainty is reached without overfitting

2:17 a tree to the training data

2:19 larger trees may be more accurate and

2:21 tightly fit the training data

2:23 but once in production may be inaccurate

2:25 in predicting real events

2:27 we use metrics to ask the best questions

2:29 at each point until there are no more questions to ask

2:32 then prune branches starting at the

2:34 leaves to avoid overfitting the tree

2:37 this produces an accurate model with

2:39 leaves illustrating the final prediction

2:42 gini impurity is a metric that ranges

2:44 from zero to one where lower values

2:46 indicate

2:47 less uncertainty or mixing at a node it

2:50 gives us our chance of being incorrect
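
In symbols (our notation, not shown on the whiteboard): if the labels at a node occur with proportions p_i, the Gini impurity is G = 1 − Σ p_i², the probability of mislabeling a row drawn at random if we guess according to those proportions.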

2:53 let's look at a mail carrier as an

2:55 example in a town with only one person

2:58 and one letter to deliver

2:59 the gini impurity would be equal to

3:01 zero since there's no chance of being

3:03 incorrect

3:04 however if there are 10 people in the

3:07 town with one letter for each

3:09 the impurity would be 0.9 because

3:12 now there's only a 1 in 10 chance of placing

3:15 the randomly selected mail into the

3:16 right mailbox
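
Continuing the sketch, the calculation might look like this; `gini` is our own helper name, and the printed values reproduce the mail-carrier numbers above and the 0.48 starting impurity of the hypothetical cable set.

```python
from collections import Counter

def gini(rows):
    """Gini impurity: 1 - sum(p_i^2) over the label proportions p_i,
    i.e. the chance of mislabeling a randomly drawn row."""
    counts = Counter(row[-1] for row in rows)
    return 1.0 - sum((n / len(rows)) ** 2 for n in counts.values())

print(gini([["alice"]]))                            # one letter, one mailbox: 0.0
print(gini([[person] for person in "abcdefghij"]))  # ten mailboxes: 0.9
print(gini(training_data))                          # cable set above: 0.48
```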

3:19 information gain helps us find the

3:20 question that reduces our uncertainty

3:22 the most

3:24 it's just a number that tells us how

3:25 much a question helps to unmix the

3:27 labels at a node

3:29 we begin by calculating the uncertainty

3:31 of our starting set

3:33 impurity equals 0.48 then for each

3:37 question we segment the data

3:38 and calculate the uncertainty of the

3:40 child nodes

3:41 we take a weighted average of their

3:43 uncertainty because we care more about a

3:45 large set

3:46 with low uncertainty than a small set

3:48 with high uncertainty

3:49 then we subtract that from our starting

3:51 uncertainty and that's our information

3:53 gain

3:54 as we continue we'll keep track of the

3:56 question with the most gain

3:57 and that will be the best one to ask at

3:59 this node
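
In the same sketch, a split and its information gain can be computed like so. The question "frame errors >= 30?" is invented for illustration; it happens to separate the hypothetical set perfectly, so the gain equals the full starting impurity of 0.48.

```python
def partition(rows, column, threshold):
    """Ask a yes/no question such as 'frame errors >= 30?' and split
    the rows into the subsets that answer yes and no."""
    yes = [r for r in rows if r[column] >= threshold]
    no = [r for r in rows if r[column] < threshold]
    return yes, no

def info_gain(rows, left, right):
    """Starting uncertainty minus the size-weighted average
    uncertainty of the two child nodes."""
    w = len(left) / len(rows)
    return gini(rows) - w * gini(left) - (1 - w) * gini(right)

left, right = partition(training_data, 0, 30)   # "frame errors >= 30?"
print(info_gain(training_data, left, right))    # 0.48 - 0 = 0.48
```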

4:01 now we're ready to build the tree we

4:03 start at the root node of the tree which

4:05 is given the entire data set

4:07 then we need the best question to ask at

4:09 this node

4:10 we find this by iterating over each of

4:13 these values

4:14 we split the data and calculate the

4:16 information gain for each one

4:18 as we go we keep track of the question

4:20 that produces the most gain

4:22 found it. once we find the most useful

4:24 question to ask we split the data using

4:27 this question

4:28 we then add a child node to this branch

4:30 and it uses the subset of data after the

4:32 split

4:33 again we calculate the information gain

4:36 to discover the best question for this

4:37 data

4:39 rinse and repeat until there are no more questions to ask

4:42 and this node becomes a leaf which

4:44 provides the final prediction
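
Putting the pieces together, here is a minimal sketch of the grow-then-prune procedure the video describes: try every question, keep the one with the most gain, recurse on each side, stop at a leaf when nothing helps, then collapse weak splits bottom-up. The names `best_split`, `build_tree`, `prune`, and the `min_gain` threshold are our own (a made-up hyperparameter), not Juniper's implementation.

```python
def best_split(rows):
    """Iterate over every feature/value question, keeping track of the
    one that produces the most information gain."""
    best_gain, best_question = 0.0, None
    for column in range(len(rows[0]) - 1):      # last column is the label
        for threshold in {row[column] for row in rows}:
            left, right = partition(rows, column, threshold)
            if not left or not right:
                continue                        # question split nothing off
            gain = info_gain(rows, left, right)
            if gain > best_gain:
                best_gain, best_question = gain, (column, threshold)
    return best_gain, best_question

def build_tree(rows):
    """Grow recursively: when no question yields any gain, the node
    becomes a leaf holding the label counts (the final prediction)."""
    gain, question = best_split(rows)
    if question is None:
        return Counter(row[-1] for row in rows)
    left, right = partition(rows, *question)
    return {"question": question, "gain": gain,
            "yes": build_tree(left), "no": build_tree(right)}

def prune(node, min_gain=0.1):
    """Bottom-up pruning: collapse a weak split whose children are both
    leaves back into a single leaf, trading training fit for generality."""
    if isinstance(node, Counter):
        return node
    node["yes"], node["no"] = prune(node["yes"]), prune(node["no"])
    if (isinstance(node["yes"], Counter) and isinstance(node["no"], Counter)
            and node["gain"] < min_gain):
        return node["yes"] + node["no"]         # merge the leaf counts
    return node

tree = prune(build_tree(training_data))
print(tree)   # root asks (0, 30) -> two pure leaves: 3 bad / 2 good
```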

4:47 we hope this gives you insight into the

4:48 AI we've created

4:50 if you'd like to see more, check out the rest of the whiteboard technical series