Predictions Overview
This page describes how the Predictive Analytics feature uses machine learning models to predict future behavior and outliers, and notifies administrators of potential issues in the network.
Predictions Overview (Beta)
Predictive Analytics uses machine learning (ML) techniques to forecast possible outliers in resource utilization patterns or equipment failures. This feature helps administrators proactively address issues and prevent potential outages.
The Apstra Edge device receives data about the network devices through the probes configured on Apstra. The Edge device sends this data to DC Assurance, where the ML models perform the following tasks:
- Aggregate data points received from the network
- Train the ML models on these data points to learn normal behavior and create a baseline
- Forecast future data points by applying the trained ML models to historical data
- Identify deviations in the forecasted data by using the outlier detection model trained on historical metric data
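The aggregation step in the list above can be pictured as bucketing raw samples into fixed time intervals. The following Python sketch is illustrative only: it assumes per-minute mean aggregation over (timestamp, value) pairs, and the function name, interval, and averaging choice are hypothetical rather than the documented DC Assurance behavior.

```python
from collections import defaultdict
from statistics import mean


def aggregate(samples, interval_s=60):
    """Average raw (timestamp_s, value) samples into fixed-width time buckets.

    Hypothetical helper: returns a list of (bucket_start_s, mean_value)
    sorted by bucket start.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each sample to the start of the interval it falls in.
        buckets[ts - ts % interval_s].append(value)
    return sorted((start, mean(values)) for start, values in buckets.items())


# Hypothetical CPU utilization samples (seconds since start, percent).
raw = [(0, 10.0), (20, 14.0), (45, 12.0), (60, 30.0), (110, 34.0)]
print(aggregate(raw))
```

Averaging is only one plausible choice; a real pipeline might also keep the minimum, maximum, or percentiles per interval so that short spikes are not smoothed away.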
The Predictive Analytics feature identifies outliers in the following metrics:
- System Health
- Optical Interface Health
System Health
The Predictive Analytics feature identifies outliers in the system CPU utilization and system memory utilization metrics. It uses the data collected by the Stream Receivers probe on Apstra to learn and forecast system health metrics and identify deviations. This feature allows you to proactively identify issues in the network infrastructure and resolve them before they impact network traffic.
You can run the pre-flight check for the Edge device and verify whether the stream receivers are configured correctly. See Pre-Flight Checks for more information.
Predictive Analytics uses the following ML algorithms to predict System Health outliers:
| Function | Machine Learning Algorithm | Description |
|---|---|---|
| Forecasting | Light Gradient Boosting Machine (LightGBM) | An ensemble method based on decision trees that combines many weak models into a single strong predictor. Gradient boosting builds the ensemble sequentially, with each new tree trained to correct the errors made by the trees before it. |
| Outlier Detection | Isolation Forest (iForest) | A tree-based method that builds an ensemble of random trees for outlier detection. Each isolation tree recursively partitions the data by selecting a random variable and a random split value for that variable at each node, until individual data points are isolated at the tree leaves. Because outliers are few and differ from normal points, they are isolated in fewer splits, so a short average path from root to leaf indicates an outlier. |
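To make the Isolation Forest row in the table above concrete, here is a minimal, from-scratch Python sketch of the general iForest technique: random-split trees built on subsamples, average path length per point, and the usual normalized score. It is an illustration, not Juniper's implementation; the class name, the parameters `n_trees` and `sample_size`, and the sample CPU data are assumptions.

```python
import math
import random


def c(n):
    """Average path length of an unsuccessful BST search over n points.

    Normalizes path lengths and is added at leaves that still hold several
    points because the depth limit was reached.
    """
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Split on a random feature at a random value until points are isolated."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # the chosen feature cannot separate these points
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Edges from the root to the leaf `point` lands in, plus the leaf adjustment."""
    if "size" in node:
        return depth + c(node["size"])
    branch = "left" if point[node["dim"]] < node["split"] else "right"
    return path_length(point, node[branch], depth + 1)


class IsolationForest:
    def __init__(self, n_trees=100, sample_size=256, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.seed = seed

    def fit(self, data):
        """Build each tree on a random subsample; needs at least 2 data points."""
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(self.psi))
        self.trees = [build_tree(rng.sample(data, self.psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, point):
        """Anomaly score between 0 and 1; values near 1 suggest an outlier."""
        avg = sum(path_length(point, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-avg / c(self.psi))


rng = random.Random(42)
history = [[50.0 + rng.gauss(0, 3)] for _ in range(500)]  # hypothetical CPU utilization, %
forest = IsolationForest(seed=1).fit(history)
print(round(forest.score([51.0]), 2), round(forest.score([95.0]), 2))
```

A value such as 95% CPU lies beyond every split point, so it reaches a leaf after only a few splits and scores higher than a typical value near 51%.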
Predictive Analytics uses ML algorithms to analyze the data it receives from the network and predict network behavior. After this feature is enabled, it requires a certain number of data points to train its models on expected behavior and to recognize outliers. During the initial period after you enable this feature, predictions about system health in the network might be unavailable or not fully accurate, because the data received so far is insufficient for the ML algorithms to analyze and make meaningful predictions. The accuracy of the predictions improves over time as more data becomes available for analysis.
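The sketch below illustrates two ideas from this section: gradient boosting as a forecaster, and the warm-up period during which no forecast is produced. It is plain gradient boosting with one-split regression trees (stumps) on lagged values, a deliberate simplification; LightGBM itself adds histogram-based split finding and leaf-wise tree growth. `MIN_HISTORY`, the lag count, and the learning rate are hypothetical values, since the required amount of data is not documented here.

```python
MIN_HISTORY = 48  # hypothetical warm-up size; the real requirement is not documented


def make_lag_dataset(series, n_lags):
    """Each row holds the previous n_lags values; the target is the next value."""
    X = [series[i - n_lags:i] for i in range(n_lags, len(series))]
    y = [series[i] for i in range(n_lags, len(series))]
    return X, y


def fit_stump(X, residuals):
    """Fit a one-split regression tree (stump) that best reduces squared error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[f] <= t]
            right = [r for row, r in zip(X, residuals) if row[f] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, f, t, lm, rm)
    if best is None:  # no split possible: predict the mean residual everywhere
        mean_r = sum(residuals) / len(residuals)
        return lambda row: mean_r
    _, f, t, lm, rm = best
    return lambda row: lm if row[f] <= t else rm


def train_forecaster(series, n_lags=3, n_rounds=30, learning_rate=0.3):
    """Return a one-step-ahead model, or None while history is still too short."""
    if len(series) < MIN_HISTORY:
        return None  # warm-up: too few data points to learn a baseline
    X, y = make_lag_dataset(series, n_lags)
    base = sum(y) / len(y)
    stumps, pred = [], [base] * len(y)
    for _ in range(n_rounds):
        # Boosting: each new stump is fit to the errors the ensemble still makes.
        residuals = [target - p for target, p in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * stump(row) for p, row in zip(pred, X)]
    return lambda row: base + sum(learning_rate * s(row) for s in stumps)


def forecast(model, series, steps, n_lags=3):
    """Roll the one-step model forward, feeding each prediction back in as input."""
    window = list(series[-n_lags:])
    out = []
    for _ in range(steps):
        nxt = model(window)
        out.append(nxt)
        window = window[1:] + [nxt]
    return out


print(train_forecaster([40.0] * 10))  # None: still in the warm-up period
cpu = [50.0 + 10.0 * (i % 24 >= 12) for i in range(96)]  # hypothetical hourly CPU %, 4 days
model = train_forecaster(cpu)
print([round(v, 1) for v in forecast(model, cpu, steps=3)])
```

Returning `None` during warm-up mirrors the behavior described above: until enough history exists, there is simply no forecast to show.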
Optical Interface Health
- Predictive Analytics for Optical Interface Health is a technology preview feature. For more information on technology previews, see Juniper Apstra Tech Previews in the Apstra Data Center Director User Guide.
- Predictive Analytics for Optical Interface Health requires your Apstra Data Center Director instance to be running Apstra version 6.1 or higher.
Optical cables offer lower latency and better energy efficiency over long distances, which is crucial for maintaining performance across large-scale clusters. Optical interfaces are the ports that link the optical cables to devices for data transmission. An optical interface consists of multiple optical lanes that act as independent data transmission channels. Each lane carries a portion of the total data stream, and multiple lanes are aggregated to achieve higher overall bandwidth.
Identifying potential issues in the optical interfaces or lanes early enables network administrators to take corrective actions in advance, thereby preventing link outages and ensuring continuous network reliability.
The Predictive Analytics feature identifies outliers in the optical interfaces and lanes in the network. It uses the Digital Optical Monitoring (DOM) metrics data collected by the Optical Transceivers probe on Apstra to forecast optical health metrics and identify deviations from expected behavior.
To ensure that the Predictive Analytics feature can learn expected behavior and predict optical link health metrics, verify that the Optical Transceivers probe has been configured correctly on Apstra. For more information, see Optical Transceivers in the Apstra Data Center Director User Guide.
Predictive Analytics uses the following ML algorithm to forecast Optical Interface Health metrics:
| Function | Machine Learning Algorithm | Description |
|---|---|---|
| Forecasting | Multivariate Bidirectional Long Short-Term Memory (BiLSTM) | A bidirectional LSTM processes the input sequence in both the forward and backward directions, allowing the model to capture both past and future context within the input sequence. A multivariate BiLSTM processes and analyzes data containing multiple related parameters simultaneously. |
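The table does not describe how DOM data is fed to the model. Multivariate sequence models such as a BiLSTM are commonly trained on sliding windows shaped (time steps, features), and a bidirectional layer reads each window in both directions. The following Python sketch shows only that input preparation, under assumed field names, window length, and a one-step horizon; it does not implement the LSTM itself.

```python
# Hypothetical field names for per-timestamp DOM readings.
FEATURES = ("rx_power_dbm", "tx_power_dbm", "bias_current_ma", "voltage_v", "temperature_c")


def make_windows(records, window, horizon=1):
    """Turn per-timestamp DOM records into (input window, target) pairs.

    Each input is `window` time steps long and each step is a feature vector,
    so one sample has shape (window, len(FEATURES)). The target is the feature
    vector `horizon` steps after the end of the window.
    """
    rows = [[rec[name] for name in FEATURES] for rec in records]
    samples = []
    for start in range(len(rows) - window - horizon + 1):
        x = rows[start:start + window]
        y = rows[start + window + horizon - 1]
        samples.append((x, y))
    return samples


def bidirectional_views(x):
    """A bidirectional layer reads the same window oldest-to-newest and newest-to-oldest."""
    return x, x[::-1]


# Hypothetical readings: RX power drifting down, bias current and temperature drifting up.
records = [
    {"rx_power_dbm": -2.0 - 0.1 * t, "tx_power_dbm": -1.5, "bias_current_ma": 6.0 + 0.05 * t,
     "voltage_v": 3.3, "temperature_c": 40.0 + 0.2 * t}
    for t in range(6)
]
samples = make_windows(records, window=4)
x, y = samples[0]
print(len(samples), len(x), len(x[0]))  # 2 4 5
```

Keeping all five metrics in each time step is what makes the input multivariate: the model can learn how, for example, rising temperature and rising bias current move together.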
The Predictive Analytics feature uses DOM metrics data to predict optical metric trends and flag outliers that indicate link or lane degradation. Table 3 lists the DOM metrics and the signs that indicate a degrading optical link:
| DOM Metric | Indicators of Link Degradation |
|---|---|
| Received optical power (RX power) | Decreasing values |
| Transmitted optical power (TX power) | Decreasing or erratic values |
| Laser bias current | Increasing or erratic values |
| Voltage | Fluctuating or out-of-range values |
| Temperature | Increasing values or erratic spikes |
Predictive Analytics for optical link health uses a statistical approach to detect outliers in the forecasted data. It raises outliers based on two criteria:
- Outlier Detection by Trend: If the forecasted values trend in the direction that indicates degradation for a specified number of consecutive days (downward for RX power, upward for bias current and temperature), an outlier is raised.
- Outlier Detection based on Threshold: If the forecasted values of the optical health parameters cross the high-alarm or low-alarm thresholds configured by the vendor for a specified duration, an outlier is raised.
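The two criteria above can be sketched as simple rules over a forecast series. This is an illustration under stated assumptions: the documentation does not specify how a trend is measured, how many consecutive days or what duration triggers an outlier, or how the metrics are combined. The sketch treats a trend as consecutive day-over-day moves in the degrading direction, checks each metric on its own, and uses hypothetical function names and parameter values.

```python
def trend_outlier(forecast_daily, direction, min_days):
    """Flag when the daily forecast moves in `direction` for min_days days in a row.

    direction: -1 for a degrading downward trend (RX power),
               +1 for a degrading upward trend (bias current, temperature).
    """
    run = 0
    for prev, cur in zip(forecast_daily, forecast_daily[1:]):
        # Count consecutive day-over-day moves in the degrading direction.
        run = run + 1 if (cur - prev) * direction > 0 else 0
        if run >= min_days:
            return True
    return False


def threshold_outlier(forecast, low_alarm, high_alarm, min_points):
    """Flag when the forecast stays outside [low_alarm, high_alarm] for min_points points."""
    run = 0
    for value in forecast:
        run = run + 1 if (value < low_alarm or value > high_alarm) else 0
        if run >= min_points:
            return True
    return False


rx = [-3.0, -3.2, -3.5, -3.9, -4.4, -5.0]  # hypothetical daily RX power forecast, dBm
print(trend_outlier(rx, direction=-1, min_days=3))  # True
temp = [45, 47, 71, 72, 73, 50]  # hypothetical hourly temperature forecast, degrees C
print(threshold_outlier(temp, low_alarm=-5, high_alarm=70, min_points=3))  # True
```

Requiring a run of consecutive points, rather than a single crossing, keeps one noisy forecast value from raising an outlier on its own.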
View Predicted Impacts
To view the predictive outliers in the network, navigate to Assurance > Predictions. Use the site drop-down to select a specific site. You can also click the System Health or Optical Link Health cards to view outliers of the selected type.
The Predictions tab displays a list of outliers along with the affected device, the severity level of each outlier, and the predicted time when the event might occur. It also displays the number of impacted clients and services in the selected site.
The Predictive Analytics feature forecasts system health outliers that might occur in a 24-hour period and optical interface health outliers that might occur in a 15-day period.
Click the clients and services button to view and search the full list of impacted clients and services.
You can also use the Predictive Search option at the top right of the page to search for a specific service and view its historical, current, and predicted metrics.
If no predictive outliers are displayed, it might be for one of the following reasons:
- The network devices and optical links are behaving as expected.
- The Predictive Analytics feature was recently configured, and not enough data is available yet to make reliable predictions.
- The Stream Receivers and Optical Transceivers probes are not sending data from the Edge device to DC Assurance. In this case, verify that the probes are configured correctly on Apstra and that streaming is enabled on them.
If the list of impacted clients and services for a predicted outlier is not displayed, it might be because:
- No traffic is flowing through the affected devices in the network.
- DC Assurance is not receiving traffic information from the Edge device. In this case, run the pre-flight check for the Edge device and verify that the flow servers are configured correctly. See Pre-Flight Checks for more information.
When you select an outlier on the Predictions tab and click View topology, the network topology is displayed showing the device and the services impacted by the predicted outlier.
View Topology and Device Data
The network topology displays the impacted services and clients for the selected outlier along with the traffic flow between the services and the network devices as shown in Figure 3.
When you select a device from the topology, the right pane displays options to view more details about the predicted anomalies. Select the Optical or System Health category to view the corresponding metrics for the device.
For Optical Interface Health, select the optical interface, the lane, and the parameter to display from the drop-down lists. The graph in the right pane displays the historical, current, and predicted values for the selected parameter. For System Health, the graph displays the historical, current, and predicted CPU and memory usage data for the selected device.
For both categories, click the Now button to scroll to the current metrics for the device on the plotted graph. You can also choose the time interval over which data is aggregated and displayed in the graph. For System Health, the graph can display data aggregated in 1-minute or 1-hour intervals. For Optical Interface Health, the graph can display data aggregated in 1-hour, 6-hour, 12-hour, or 1-day intervals.
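Combining the forecast horizons stated on this page (24 hours for system health, 15 days for optical interface health) with the aggregation intervals listed above gives a rough sense of how many predicted points the graph covers. The Python sketch below is a worked calculation under an assumption: that the predicted portion spans the full horizon and each plotted point represents one aggregation interval. The names are hypothetical.

```python
# Forecast horizons stated on this page, in minutes.
HORIZON_MIN = {"System Health": 24 * 60, "Optical Interface Health": 15 * 24 * 60}

# Aggregation intervals offered for each category, in minutes.
INTERVAL_MIN = {
    "System Health": {"1-minute": 1, "1-hour": 60},
    "Optical Interface Health": {"1-hour": 60, "6-hour": 360, "12-hour": 720, "1-day": 1440},
}


def forecast_points(category, interval):
    """Number of aggregated points needed to cover the whole forecast horizon."""
    return HORIZON_MIN[category] // INTERVAL_MIN[category][interval]


for category, intervals in INTERVAL_MIN.items():
    for interval in intervals:
        print(f"{category}, {interval}: {forecast_points(category, interval)} points")
```

For example, a 24-hour system health forecast at 1-minute intervals spans 1,440 points, while a 15-day optical forecast at 1-day intervals spans only 15.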
If an outlier is detected in the predicted data points, it is highlighted in purple on the graph. Hover over the outlier to view details such as the type of outlier, the severity, the predicted start and end time of the outlier, and so on.
Benefits of Predictions
The Predictive Analytics feature provides the following benefits:
- Provides administrators with early warning of possible failures.
- Helps administrators understand the impact of predicted outliers.
- Enables administrators to proactively prevent potential outages.