
HealthBot Machine Learning (ML)

HealthBot Machine Learning Overview

HealthBot uses machine learning to detect anomalies and outliers, and predict future device or network-level behavior. The machine learning-enabled HealthBot features include:

Anomaly Detection

Anomaly detection using the HealthBot Anomaly Detection algorithms compares new data points with data points collected from the same device during a specified learning period. HealthBot supports the following machine learning algorithms for anomaly detection:
  • 3-sigma

  • K-means

  • Holt-Winters

Anomaly detection can be activated within HealthBot rules by setting a rule field’s ingest type to formula, and then selecting anomaly detection. (Configuration > Rules > Fields tab > Ingest Type > Formula).

Outlier Detection

Outlier detection using the HealthBot Outlier Detection algorithms analyzes data from a collection of devices across your network during a specified learning period. HealthBot supports the following machine learning algorithms for outlier detection:
  • Density-Based Spatial Clustering of Applications with Noise (DBSCAN)

  • K-fold cross-validation using 3-sigma (k-fold/3-sigma)

Prediction

Prediction of future device or network-level behavior uses either the HealthBot median prediction machine learning algorithm or the Holt-Winters prediction algorithm.

Starting with HealthBot Release 3.1.0, you can choose the Holt-Winters prediction algorithm from Configuration > Rules > Fields > Ingest Type > Formula.

Understanding HealthBot Anomaly Detection

This section describes the input parameters associated with HealthBot rules configured to detect anomalies using Anomaly Detection algorithms. Once the machine learning models are built, they can be used in production to classify new data points as normal or abnormal. The accuracy of the results increases with larger amounts of data.

Field

To apply a machine learning algorithm, you must first define the numeric data field on which to apply the algorithm. For information on how to create a user-defined data field for a HealthBot rule, see the Fields section in the HealthBot User Guide.

Algorithm

The HealthBot Anomaly Detection algorithms include Holt-Winters, 3-sigma, and k-means:

Holt-Winters

The Holt-Winters algorithm uses traffic entropy measurements and seasonal variations in traffic to detect anomalous traffic flowing through an interface. The seasonality aspect de-emphasizes the normal increases and decreases in traffic that recur over regular time intervals. For example, network traffic in an enterprise network could be expected to have a weekly seasonality, since there would be significantly more traffic on the network during the work week than on the weekend.

For example, because Holt-Winters can learn that traffic typically drops starting on Friday evening, an anomaly might be triggered if traffic actually increases on Friday evening.
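The seasonal forecasting idea can be sketched with a textbook additive Holt-Winters smoother. This is an illustration only, not HealthBot's implementation; the smoothing constants alpha, beta, and gamma and the initialization scheme are assumptions. A new observation that lands far from the forecast for its time slot would then be flagged as anomalous.

```python
def holt_winters_additive(series, season_len, alpha=0.5, beta=0.1,
                          gamma=0.1, horizon=1):
    """Additive Holt-Winters: smooth a level, a trend, and one seasonal
    offset per position in the season, then extrapolate `horizon` steps."""
    # Initialize level, trend, and seasonal offsets from the first two seasons.
    level = sum(series[:season_len]) / season_len
    trend = (sum(series[season_len:2 * season_len])
             - sum(series[:season_len])) / season_len ** 2
    seasonal = [series[i] - level for i in range(season_len)]
    for t in range(season_len, len(series)):
        s = seasonal[t % season_len]
        prev_level = level
        level = alpha * (series[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % season_len] = gamma * (series[t] - level) + (1 - gamma) * s
    # Forecast: extrapolate the trend and reuse the learned seasonal offsets.
    return [level + (h + 1) * trend + seasonal[(len(series) + h) % season_len]
            for h in range(horizon)]
```

On a perfectly periodic series the smoother reproduces the pattern exactly, which makes the seasonal de-emphasis easy to verify by hand.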

3-Sigma

The 3-sigma algorithm classifies a new data point as normal if it is within three standard deviations of the mean (the average across all data points in the data set). A new data point outside this range is classified as abnormal.
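The 3-sigma rule translates directly into code. This sketch (not the HealthBot implementation) builds the band from a training set and classifies new points:

```python
from statistics import mean, stdev

def three_sigma_classifier(training):
    """Return a function that flags points more than three standard
    deviations from the training mean as abnormal."""
    mu, sigma = mean(training), stdev(training)
    def is_abnormal(x):
        return abs(x - mu) > 3 * sigma
    return is_abnormal
```

For example, trained on round trip times clustered around 10 ms, the classifier flags a 15 ms sample as abnormal while accepting a 10 ms sample as normal.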
K-means

The HealthBot k-means algorithm uses k-means clustering and other building blocks to create a machine learning model for classifying new data points as normal or abnormal:
  • K-means clustering splits n data points into k groups (called clusters), where k ≤ n. For HealthBot, k is set to 5 buckets.

  • To form the clusters, each data point is represented as a 32-dimensional vector that captures the trend: the current value plus its 31 most recent historical values.

  • Each cluster has a center called the centroid. A cluster centroid is the mean of a cluster (average across all the data points in the cluster).

  • Every new data point is assigned to a cluster; however, if a data point is too far from the cluster's centroid, it is classified as abnormal.
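The building blocks above can be sketched end to end. This is an illustrative hand-rolled model, not HealthBot's implementation; the names windows, kmeans, and build_model, and the mean-plus-three-standard-deviations distance threshold, are assumptions:

```python
import math
import random

WINDOW = 32  # current point plus its past 31 historical values
K = 5        # number of clusters (buckets)

def windows(series, size=WINDOW):
    # Turn a scalar series into overlapping 32-dimensional vectors.
    return [series[i:i + size] for i in range(len(series) - size + 1)]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k=K, iters=20, seed=0):
    # Plain Lloyd's algorithm with a fixed iteration budget.
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist(p, centroids[i]))].append(p)
        for i, members in enumerate(clusters):
            if members:  # centroid = mean across the cluster's points
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def build_model(series, threshold_factor=3.0):
    # Cluster the training windows, then flag new windows whose distance
    # to the nearest centroid is unusually large (assumed threshold rule).
    pts = windows(series)
    centroids = kmeans(pts)
    dists = [min(dist(p, c) for c in centroids) for p in pts]
    mu = sum(dists) / len(dists)
    sd = math.sqrt(sum((d - mu) ** 2 for d in dists) / len(dists))
    limit = mu + threshold_factor * sd
    return lambda window: min(dist(window, c) for c in centroids) > limit
```

build_model(training_series) returns a classifier that takes a 32-value window and returns True when the window is too far from every learned centroid.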

Learning period

The learning period specifies the time range over which data is collected to build the machine learning models. Supported units of time for the learning period include: seconds, minutes, hours, days, weeks, and years. You must enter the plural form of the time unit with no space between the number and the unit. For example, a learning period of one hour must be entered as 1hours.

HealthBot builds machine learning models daily, starting at midnight. For example, if the learning period is 5 days and the model build is triggered on 11 Feb 2019 at 00:00, then data collected from 6 Feb 2019 00:00 to 11 Feb 2019 00:00 is used by HealthBot to build the machine learning models. For the Holt-Winters prediction algorithm, the learning period must be at least twice the pattern periodicity to ensure there is enough of a pattern to learn.
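The 1hours/5days period format and the resulting training window can be sketched as follows. The parse_period and learning_window names and the unit table are illustrative, and the years unit is approximated as 365 days:

```python
import re
from datetime import datetime, timedelta

# Unit table for the "<number><plural-unit>" format (e.g. "1hours", "5days").
UNIT_DELTAS = {
    "seconds": timedelta(seconds=1),
    "minutes": timedelta(minutes=1),
    "hours": timedelta(hours=1),
    "days": timedelta(days=1),
    "weeks": timedelta(weeks=1),
    "years": timedelta(days=365),  # approximation for illustration
}

def parse_period(text):
    # The plural unit must follow the number with no space in between.
    m = re.fullmatch(r"(\d+)(seconds|minutes|hours|days|weeks|years)", text)
    if not m:
        raise ValueError(f"expected e.g. '1hours', got {text!r}")
    return int(m.group(1)) * UNIT_DELTAS[m.group(2)]

def learning_window(trigger_time, period_text):
    """Window of data used when a model build is triggered at trigger_time."""
    return trigger_time - parse_period(period_text), trigger_time
```

Using the example from the text, a 5-day learning period triggered at midnight on 11 Feb 2019 yields a window from 6 Feb 2019 00:00 to 11 Feb 2019 00:00.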

Pattern periodicity

The pattern periodicity specifies the buckets for which data is collected and used to build the machine learning models. Each bucket of data represents a user-defined time period and a specific pattern of data. A minimum number of data points is required for a machine learning algorithm to build a model.

Supported units of time for pattern periodicity include: minutes, hours, days, weeks, and months. You must enter the plural form of the time unit with no space between the number and the unit.


Understanding HealthBot Outlier Detection

This section describes the input parameters associated with HealthBot rules used for outlier detection algorithms. Once the machine learning models are built, they can be used in production to identify time series data sets as outliers. The accuracy of the results increases with larger amounts of data.

The results of the HealthBot outlier detection algorithm are stored in a table in the time series database. Each row in the table contains outlier detection output and metadata that is associated with a particular time series. You can use the information in the table to configure HealthBot rule triggers. Each column in the table is identified by a unique name that starts with the user-defined outlier detection field name. For example, you can use the field-name-is-outlier and field-name-device column names to configure a trigger that detects outliers and produces a message that indicates which specific device was determined to be the outlier. For more information, see the “Triggers” section of the HealthBot Outlier Detection Example.

Dataset

For the outlier detection formula, input data is specified as a list of XPATHs from a variable. For information on how to create a user-defined variable for a HealthBot rule, see the Variables section in the Contrail HealthBot User Guide.

The following is an example of a list of XPATHs:

/device-group[device-group-name=DG0]/device[device-name=D0]/topic[topic-name=T0]/rule[rule-name=R0]/field[re=RE[01] AND hostname=1.1.1.*]/re-memory,
/device-group[device-group-name=DG0]/device[device-name=D1]/topic[topic-name=T0]/rule[rule-name=R0]/field[re=RE[01] AND hostname=1.1.1.*]/re-memory

This path list specifies that, for devices D0 and D1 in device group DG0, re-memory is retrieved from topic T0, rule R0, where the RE is either RE0 or RE1 and the hostname is in the 1.1.1.* block. The path allows data to be selected at the field-key level, which is necessary because different field keys may serve different purposes.


Algorithm

The HealthBot outlier detection algorithms include k-fold cross-validation using 3-sigma (k-fold/3-sigma) and DBSCAN:

K-Fold Cross-Validation Using 3-Sigma

K-fold cross-validation using the 3-sigma (k-fold/3-sigma) algorithm creates a machine learning model for identifying outliers. K-fold cross-validation splits the entire data set into k groups and uses the following general process to create the machine learning models:
  • Each unique k group is used as a test data set.

  • The remaining k-1 groups (those not being used as the test data set) are used as the training data set to build a machine learning model.

  • Each test data set is evaluated using its associated machine learning model.

  • The test data set with the most outliers relative to its associated machine learning model is classified as an outlier.

For example, if k is the number of devices in a device group and the group has 4 devices, then k=4. For cross-validation, four machine learning models are built and the test data sets are evaluated as follows:

  • Model 1: Trained with the data points collected from device 1, device 2, and device 3, then tested with the data points collected from device 4.

  • Model 2: Trained with the data points collected from device 1, device 2, and device 4, then tested with the data points collected from device 3.

  • Model 3: Trained with the data points collected from device 1, device 3, and device 4, then tested with the data points collected from device 2.

  • Model 4: Trained with the data points collected from device 2, device 3, and device 4, then tested with the data points collected from device 1.
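The four-model scheme above can be sketched as a leave-one-device-out loop. This is an illustration, not HealthBot's implementation; the device_series mapping, the per-point outlier count as the score, and the sigma_coeff parameter (default 3, matching the sigma coefficient described later) are assumptions:

```python
from statistics import mean, stdev

def kfold_3sigma_outlier(device_series, sigma_coeff=3.0):
    """For each device (one k-fold test set), train a 3-sigma band on the
    pooled data of the other devices, then count the device's points that
    fall outside the band. The device with the most such points is the
    outlier."""
    scores = {}
    for name, series in device_series.items():
        training = [x for other, s in device_series.items()
                    if other != name for x in s]
        mu, sd = mean(training), stdev(training)
        scores[name] = sum(abs(x - mu) > sigma_coeff * sd for x in series)
    return max(scores, key=scores.get), scores
```

With three devices reporting round trip times near 10 ms and one near 50 ms, only the fourth device's model (trained on the other three) flags all of its points, so it is returned as the outlier.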

The k-fold/3-sigma algorithm is most applicable when it is known that outliers skew in one direction. If there are outliers on both sides of the normal data points, or there are enough outlier data points to make the algorithm conclude that nothing is an outlier, the k-fold/3-sigma algorithm will not produce significant results.

DBSCAN

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is an unsupervised machine learning algorithm used to create a machine learning model for identifying time series data sets as outliers:
  • Time series data sets are grouped in such a way that data points in the same cluster are more similar to each other than those in other clusters.

  • Clusters are dense regions of time series data sets, separated by regions of lower density.

  • If a time series data set belongs to a cluster, it should be near many other time series data sets in that cluster.

  • Time series data sets in lower density regions are classified as outliers.

Using the DBSCAN algorithm is more applicable if outliers appear inside the 3-sigma threshold of the other data points. DBSCAN can find outlying behavior that doesn’t appear as a significant deviation from the normal behavior at any given time step.
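A minimal DBSCAN over per-device time series vectors illustrates the density idea. This is a textbook sketch, not HealthBot's implementation; the eps (neighborhood radius) and min_pts (minimum neighbors for a dense region) parameter names are assumptions, and points left with label -1 are the noise/outlier series:

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id; -1 marks noise (outliers)."""
    labels = {}

    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if i in labels:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # low-density region: provisionally noise
            continue
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels.get(j) == -1:
                labels[j] = cluster  # noise reclassified as a border point
            if j in labels:
                continue
            labels[j] = cluster
            jn = neighbors(j)
            if len(jn) >= min_pts:
                queue.extend(jn)  # core point: keep expanding the cluster
        cluster += 1
    return labels
```

With four devices whose series sit close together and one far away, the four form a dense cluster and the fifth is labeled -1, i.e., the outlier.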

Sigma coefficient (k-fold-3sigma only)

The sigma coefficient is a thresholding argument (the default value is 3). It determines, at each point in time in a series, how far a value must be from the other values to be marked as an outlier.

Sensitivity

Sensitivity determines the number of outliers, m, that the algorithm seeks to find in the data; the top m time series test data sets are returned as outliers.

Learning period

See the Learning period description of the “Understanding HealthBot Anomaly Detection” section.

Understanding HealthBot Predict

This section describes the input parameters associated with HealthBot rules used for forecasting future values with the HealthBot median prediction machine learning algorithm or the Holt-Winters prediction machine learning algorithm. Once the machine learning models are built, they can be used in production to predict trends and forecast future values. The accuracy of the results increases with larger amounts of data.

Field

See the Field description of the “Understanding HealthBot Anomaly Detection” section.

Algorithm

The HealthBot Predict feature uses either the median prediction algorithm or the Holt-Winters prediction algorithm.

The median value represents the midpoint of a range of values within a data sampling. For every pattern periodicity bucket, a median is calculated from the data samples available in the bucket.

Learning period

See the Learning period description of the “Understanding HealthBot Anomaly Detection” section.

Pattern periodicity

See the Pattern periodicity description of the “Understanding HealthBot Anomaly Detection” section. For the median prediction algorithm, we recommend a minimum of 10 data points for building a machine learning model. For the Holt-Winters algorithm, the pattern periodicity should be no more than half the learning period.

Prediction offset

The prediction offset value is a time in the future at which you want to predict a field value. For example, if the present time is 6th Feb 2019 10:00 and the prediction offset is set to 5 hours, then HealthBot will predict a field value for 6th Feb 2019 15:00.

Supported units of time for prediction offset include: seconds, minutes, hours, days, weeks, and years. You must enter the plural form of the time unit with no space between the number and the unit. For example, a prediction offset of one hour must be entered as 1hours.

HealthBot Rule Examples

The machine learning HealthBot rules described in this section are available for upload from the HealthBot Rules and Playbooks GitHub repository.

HealthBot Anomaly Detection Example

This example describes how the check-icmp-statistics HealthBot device rule is configured to send ICMP probes to user-defined destination hosts and detect anomalies when the round trip average response time exceeds static or dynamic thresholds.

The following sections show how to configure the applicable input parameters for each HealthBot rule definition block (such as Fields, Variables, and Triggers) using the HealthBot GUI. For more information about how to configure HealthBot rules, see Creating a New Rule using the HealthBot GUI.

Sensors (check-icmp-statistics)

Figure 9 shows the general properties and iAgent sensor configured for the check-icmp-statistics rule. For information about the count-var and host-var variables, see Variables (check-icmp-statistics).

Figure 9: General properties (check-icmp-statistics) and Sensors definition (icmp)


Fields (check-icmp-statistics)

The following fields are configured for the check-icmp-statistics rule:

dt-response-time (see Figure 10): Configuration for anomaly detection using the k-means algorithm. When an anomaly is detected, HealthBot returns a value of 1.
rtt-average-ms (see Figure 11): Round trip average response time.
rtt-threshold (see Figure 12): Static threshold for the round trip average response time. The rtt-threshold-var variable populates this field.

Figure 10: Fields definition (dt-response-time)


Figure 11: Fields definition (rtt-average-ms)


Figure 12: Fields definition (rtt-threshold)


Variables (check-icmp-statistics)

The following three variables are configured for the check-icmp-statistics rule:

count-var (see Figure 13): ICMP ping count. The count is set to 1 by default.
host-var (see Figure 14): Destination IP address or hostname to which ICMP probes are periodically sent.
rtt-threshold-var (see Figure 15): Static threshold value for the round trip average response time. The threshold value is 1 ms by default. This variable populates the rtt-threshold field.

Figure 13: Variables definition (count-var)


Figure 14: Variables definition (host-var)


Figure 15: Variables definition (rtt-threshold-var)


Functions (check-icmp-statistics)

Figure 16 shows the function configured for the check-icmp-statistics rule. This function converts the unit of measure for the round trip average response time from microseconds to milliseconds.

Figure 16: Functions definition (micro-milli)


Triggers (check-icmp-statistics)

The following triggers and terms are configured for the check-icmp-statistics rule:

Figure 17: Triggers definition (packet-loss)


Figure 18: Terms definition (is-device-not-reachable)


Figure 19: Terms definition (is-device-up)


Figure 20: Terms definition (no-packet-loss)


Figure 21: Triggers definition (round-trip-time)


Figure 22: Terms definition (is-rtt-fine)


Figure 23: Terms definition (is-rtt-medium)


Figure 24: Terms definition (rtt-normal)


Rule Properties (check-icmp-statistics)

Figure 25 shows the rule properties configured for the check-icmp-statistics rule.

Figure 25: Rule Properties definition (check-icmp-statistics)


HealthBot Outlier Detection Example

This example describes how the check-outlier HealthBot network rule is configured to detect outliers across the devices in a device group using the round trip average response time.

The following sections show how to configure the applicable input parameters for each HealthBot rule definition block (such as Fields, Variables, and Triggers) using the HealthBot GUI. For more information about how to configure a HealthBot rule, see Creating a New Rule using the HealthBot GUI.

Sensors (check-outlier)

Figure 26 shows the general properties configured for the check-outlier rule. Note that this rule is a network rule.

Figure 26: General properties (check-outlier)


Fields (check-outlier)

Figure 27 shows the field configured for the check-outlier rule. This field defines the DBSCAN algorithm and rtt-xpath variable for outlier detection. For information about the rtt-xpath variable, see Variables (check-outlier).

The results of the HealthBot outlier detection algorithm are stored in a table in the time series database. Each row in the table contains outlier detection output and metadata that is associated with a particular time series. You can use the information in the table to configure HealthBot rule triggers. Each column in the table is identified by a unique name that starts with the user-defined outlier detection field name. For example, you can use the field-name-is-outlier (rtt-ol-is-outlier) and field-name-device (rtt-ol-device) column names to configure a trigger that detects outliers and produces a message that indicates which specific device was determined to be the outlier (see Triggers (check-outlier)).

Figure 27: Fields definition (rtt-ol)


Variables (check-outlier)

Figure 28 shows the variable configured for the check-outlier rule. This variable defines the devices in the network from which HealthBot collects round trip average response time data for the outlier detection machine learning models.

Figure 28: Variables definition (rtt-xpath)


Triggers (check-outlier)

Figure 29 shows the trigger configured for the check-outlier rule. The following terms are configured for the icmp-outlier-detection trigger:

is-outlier-detected (see Figure 30): When an outlier is detected, HealthBot returns a value of 1 for the rtt-ol-is-outlier field, and the HealthBot health status severity level is set to major (red). This term also produces a message that indicates which specific device was determined to be the outlier.
no-outlier (see Figure 31): Otherwise, HealthBot returns a value of 0, and the severity level is set to normal (green).

Figure 29: Triggers definition (icmp-outlier-detection)


Figure 30: Terms definition (is-outlier-detected)


Figure 31: Terms definition (no-outlier)


Rule Properties (check-outlier)

Figure 32 shows the rule properties configured for the check-outlier rule.

Figure 32: Rule Properties definition (check-outlier)
