Router Throughput SLE

Throughput is the rate at which data moves from one node to another on a network. A high throughput value indicates that data is being routed rapidly and efficiently.

Many factors, such as MTU mismatches and cable issues, impact a router's throughput. Juniper Routing Assurance continuously monitors these factors, and when they cross a predefined threshold, the Router Throughput SLE records failure minutes. Failure minutes are the number of minutes during which throughput was degraded.

Juniper Routing Assurance then performs root cause analysis and identifies the specific classifiers that caused throughput degradation. The Router Throughput SLE provides visualizations of these classifiers and enables administrators to assess the routing efficiency and the network's overall performance.

To access the Router Throughput SLE page, click Monitor > Service Levels > Routing > Router Throughput.

What Does the Router Throughput SLE Measure?

The Router Throughput SLE measures the percentage of time during which the network throughput was optimal. It is a measure of the network's ability to transmit and receive traffic without impediment.

Classifiers

A classifier is a parameter that indicates whether a router is performing optimally. When the network's throughput success threshold is not met, Juniper Routing Assurance collects the factors contributing to the failures and groups them into classifiers (also referred to as health indicators). The Router Throughput SLE monitors the following classifiers:

  • Congestion─Monitors the impact of congestion on throughput. This classifier provides information about the duration of time that throughput is affected by congestion in the network.

    Congestion in the network occurs for various reasons, and some packets must be dropped to relieve congestion and keep traffic flowing smoothly. Tail drop is one such congestion management mechanism: the router drops arriving packets when the output queue buffers become full or begin to overflow. Random early detection (RED) drop profiles are another congestion management mechanism, which prevents buffer overflow by dropping packets before queues become full. In Junos OS, a RED drop profile is defined by two components:

    • Queue fullness─Represents the percentage of the queue that is currently occupied by packets. It ranges from 0 percent (empty queue) to 100 percent (full queue).

    • Drop probability─The likelihood that a packet will be dropped when the queue reaches a certain level of fullness. It ranges from 0 percent (no packets dropped) to 100 percent (all packets dropped).

    You can control congestion by configuring RED drop profiles, if the device supports assured forwarding. RED drop profiles use drop probabilities for different levels of buffer fullness to determine which scheduling queue on the device is likely to drop assured forwarding packets under congested conditions. The device can drop packets when the queue buffer utilization exceeds the configured percentage.
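    For illustration, a RED drop profile and its scheduler mapping might look like the following sketch (the profile and scheduler names and the threshold values are illustrative, and exact syntax can vary by platform and Junos OS release):

    ```
    class-of-service {
        drop-profiles {
            af-red {
                /* Drop 10% of packets at 50% queue fullness, 50% at 80% fullness */
                interpolate {
                    fill-level [ 50 80 ];
                    drop-probability [ 10 50 ];
                }
            }
        }
        schedulers {
            af-sched {
                drop-profile-map loss-priority low protocol any drop-profile af-red;
            }
        }
    }
    ```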

    Randomly dropped packets are counted as RED-dropped, while packets dropped for other reasons are counted as tail-dropped.

    You can configure the scheduling priority of individual queues by specifying the priority in a scheduler, and then associating the scheduler with a queue by using a scheduler map. High-priority queues receive preferential treatment over all other queues and use all of their configured bandwidth before other queues are serviced. Thus, failure minutes for the Congestion classifier are calculated by analyzing only the high-priority queues. If all the queues on an interface have the same priority, the queue with the highest number of drops becomes the most impacted queue. Juniper Routing Assurance determines the SLE score by considering the most impacted queue.
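    For example, a high-priority scheduler might be defined and attached to an interface through a scheduler map as in the following sketch (the scheduler, map, and interface names are illustrative, and exact syntax can vary by platform and release):

    ```
    class-of-service {
        schedulers {
            ef-sched {
                priority high;
            }
        }
        scheduler-maps {
            core-map {
                forwarding-class expedited-forwarding scheduler ef-sched;
            }
        }
        interfaces {
            et-0/0/0 {
                scheduler-map core-map;
            }
        }
    }
    ```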

    Note: ACX platforms that support Junos OS Evolved have zero tail drops. So, RED-dropped packets are considered for calculating the SLE score on these platforms. For other platforms, tail drops are considered instead.

    Juniper Routing Assurance highlights the most impacted queue in the Root Cause Analysis section. You can use the insights in the Root Cause Analysis section to decide whether a queue needs optimization. Optimizing output queue properties can significantly reduce the number of drops in a queue. Output queue properties include the amount of interface bandwidth assigned to the queue and the size of the memory buffer allocated for storing packets. You can configure the output queue properties from the router's CLI.
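    For example, a scheduler that assigns 40 percent of the interface bandwidth and 40 percent of the buffer memory to a queue might look like the following sketch (the scheduler name and percentages are illustrative):

    ```
    class-of-service {
        schedulers {
            data-sched {
                transmit-rate percent 40;
                buffer-size percent 40;
            }
        }
    }
    ```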

    Note: By default, all queues are low-priority queues.
  • Interface Anomalies─Monitors minutes when the throughput is affected by errors at the router's interface. The Interface Anomalies classifier has three sub-classifiers that help you identify issues:

    • MTU Mismatch─Displays minutes affected by Maximum Transmission Unit (MTU) errors and input errors on the router interfaces. MTU errors in interfaces occur when a packet size exceeds the maximum allowed size for a specific interface. MTU mismatch errors occur due to oversized data packets or incorrectly configured interface settings.

      You must ensure that the MTU value is consistent along the packet's path to avoid MTU mismatch errors. An MTU mismatch results in discarded or fragmented packets. On Juniper Networks routers, you can check for MTU mismatches in the MTU Errors and Input Errors fields in the output of the show interfaces extensive CLI command.
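      For example (the interface name is illustrative):

      ```
      user@router> show interfaces et-0/0/0 extensive
      ```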

    • Cable Issues─Displays minutes affected by faulty cables in the network.

    • Negotiation Failed─Displays minutes affected by auto-negotiation failures. Auto-negotiation allows two devices to automatically agree on communication parameters such as speed and duplex mode. If auto-negotiation does not complete, possible causes include:

      • One or both routers might have auto-negotiation disabled in their configuration settings.

      • The Ethernet cable between the routers might be defective, preventing proper auto-negotiation.

      • There might be a misconfiguration on either router that is causing problems with auto-negotiation (for example, incorrect speed or duplex settings).

      Auto-negotiation failures and duplex conflicts can cause latency on ports. Older router models might fail to achieve maximum speed and could operate at a lower link speed. This sub-classifier displays failure minutes caused by these issues.
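      For example, you can explicitly enable auto-negotiation on a Gigabit Ethernet interface with configuration similar to the following sketch (the interface name is illustrative), and then verify the negotiation status with the show interfaces ge-0/0/0 media command:

      ```
      interfaces {
          ge-0/0/0 {
              gigether-options {
                  auto-negotiation;
              }
          }
      }
      ```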

In Figure 1, the Router Throughput SLE met the service level goal for more than 99 percent of the time. Interface Anomalies contributed to degraded router throughput 94 percent of the time. Congestion contributed to degraded router throughput 6 percent of the time.

Figure 1: Router Throughput SLE

Analyze Router Throughput SLE Score

The Root Cause Analysis section provides visualizations for distribution, timeline, and statistics for service level failures and enables administrators to understand the impact of these issues.

Click the View Insights button to navigate to the Router Insights page, which gives you fine-grained details of router events. You can use the Router Insights page to correlate router events that could have impacted the Router Throughput SLE.

Click the Router Throughput widget to navigate to the root cause analysis page. Click a Classifier to view its Sub-Classifiers.

  • Statistics─The Statistics tab displays the success rate of the Throughput SLE metric and the average throughput of a router. Administrators can also view the distribution graph to understand the trend of throughput. You can view the Statistics tab only when you click the Router Throughput widget.

    Figure 2: Root Cause Analysis of Router Throughput SLE Score
  • Timeline─The Timeline graph represents the trend of SLE failure minutes over a period. You can view the timeline graph for the SLE, every classifier and sub-classifier.

    You can move the slider across the graph. As you hover over the graph, a pop-up on the slider displays the failure minutes of every classifier and sub-classifier during the period. Colored vertical bars on the graph indicate various sub-classifiers.

    You can view the legend on the graph to interpret the plotted lines. Drag an area of interest to zoom in to the graph.

    Figure 3: Timeline
  • Distribution─Use the Distribution tab to analyze service level failures by attributes such as overall impact, failure rate, and anomaly. The Distribution tab displays these attributes for all the routers and their interfaces. Click Interfaces to analyze the attributes at the interface level. Click Routers to analyze service level failures at the router level.

    You can sort the columns in the table by the column header. Click any column header to sort its entries. By default, the attributes are sorted by most anomalous.

    Table 1 describes the fields on the Distribution tab.

    Table 1: Fields on the Distribution Tab

    Fields

    Description

    Name

    Name of the interface.

    Router Name

    Name of the router.

    Overall Impact

    Interfaces tab─Contribution (in percentage) of the interface's failure minutes to the overall failure minutes of the classifier or the sub-classifier.

    Routers tab─Contribution (in percentage) of the router's failure minutes to the overall failure minutes of all routers.

    Failure Rate

    Interfaces tab─Interface failure rate (in percentage).

    Routers tab─Router failure rate (in percentage).

    Anomaly

    Interfaces tab─Correlation metric that compares an interface's individual failure rate with the overall average failure rate.

    Routers tab─Correlation metric that compares a router's individual failure rate with the overall average failure rate.

    The anomaly values are interpreted as follows:

    • Anomaly > 1─Interface or router has a higher failure rate than the network average.

    • Anomaly < 1─Interface or router is healthier than the network average.

    • Anomaly = 1─Interface or router is performing exactly in line with the network average.

    Figure 4: Distribution > Interfaces
    Figure 5: Distribution > Routers

    For example, in Figure 5, you can see that:

    • Of the total 171.45 minutes that Router A failed to meet the service level goals, congestion led to the failure to meet the service levels for 171.45 minutes. This means that congestion caused an overall impact of 100 percent for Router A.

    • Router A contributed 171.45 minutes of failure during the 30674.316 minutes that it was up and thus has a failure rate of 1 percent.

    • Router A has an anomaly factor of one indicating that it performed exactly in line with the network average.

  • Affected Items─The Affected Items tab lists all routers, along with their interfaces and output queues, that failed to meet the service level goal. From this tab, you can view details of the affected routers, such as MAC address, model number, and failure rate. The Affected Items tab also displays the count of routers and output queues that failed to meet the service level goal.

    Click Queues to view the specific queues and interfaces that failed to meet the service level goal. Click Routers to view specific routers that failed to meet the service level goal.

    You can sort the columns in the table by the column header. Click any column header to sort its entries. Table 2 describes the fields on the Affected Items tab.

    Click a router to view the Root Cause Analysis section for a specific router. The Root Cause Analysis section now displays the SLE metrics for the router. Click the View Insights tab to navigate to the Router Insights page. Use the Router Charts, the Router Interface Queues, and the BGP summary information of the router to debug the issues further.

    Table 2: Fields on Affected Items Tab

    Fields

    Description

    Name

    Name of the Router.

    Queue

    Number of the CoS queue (0 through 7).

    Interfaces

    List of interfaces that failed to meet the service level goal.

    MAC

    MAC address of the router.

    Overall Impact

    Routers tab─Contribution (in percentage) of the router's failure minutes to the overall failure minutes of all routers.

    Queues tab─Contribution (in percentage) of the output queue's failure minutes to the overall failure minutes of all the queues.

    Note: The Queues tab displays data only for the Congestion classifier.

    Failure Rate

    Routers tab─Router failure rate (in percentage).

    Queues tab─Queue failure rate (in percentage).

    Note: The Queues tab displays data only for the Congestion classifier.

    Model

    Router model name.

    Version

    Version of Junos OS or Junos OS Evolved running on the router.

    Figure 6: Affected Items > Routers
    Figure 7: Affected Items > Queues

    For example, in Figure 7, you can see that:

    • CoS queues 0 and 3 on interfaces et-0/0/2, et-0/0/0, and ae0 of Router A have the highest number of tail drops.

    • Queue 0 of Router A contributed 62 percent of failure minutes to the overall failure minutes of all the queues.

    • Queue 3 of Router A contributed 38 percent of failure minutes to the overall failure minutes of all the queues.