
Wired Service-Level Expectations (SLEs)

Juniper Mist™ cloud continuously collects network telemetry data and uses machine learning to analyze the end-user experience. You can access this information through the Juniper Mist wired service-level expectation (SLE) dashboards, which help you assess the network's user experience and resolve issues proactively. The wired SLE dashboards show the user experience of the wired clients on your network at any given point in time. You can use these interactive dashboards to measure and manage your network proactively by identifying user pain points before they grow into larger problems.


View SLE Metrics

The wired SLE dashboards display the percentage of time that the SLE metrics met the specified service-level expectation goal within a specific time range. These metrics are categorized into classifiers and sub-classifiers, which provide additional details to identify the specific causes of failure. With this information, you can easily identify and address the issues affecting the end-user experience.

Mist Wired SLEs provide the following metrics to help you assess the end-user experience on your networks:

  • Throughput

  • Switch Health

  • Successful Connect

To view the SLE metrics on the Wired SLE dashboard, click Monitor > Service Levels, and then select the Wired tab.

Figure 1: SLE Dashboard

Each metric has classifiers and sub-classifiers that display information to help you identify failures and narrow down the specific problem area. To view the associated sub-classifiers, simply click a classifier. You'll see a tabbed view that includes:

  • Statistics—Shows the overall success rate for the SLE metric.

  • Timeline—Shows the timeline of the failures. For example, the dashboard can show the bad user minutes caused by issues belonging to a particular classifier over a period of time.

  • Distribution—Shows the percentage of impact across different attributes such as interfaces, switches, VLANs, and clients.

  • Affected items—Shows the specific items that failed to meet the service-level goal. Examples: switches, interfaces, and clients.

Here's an example of a Throughput metric view:

In this example, the network met the throughput requirement only 38 percent of the time, and users faced throughput issues for the remaining 62 percent. The classifier view shows that 98 percent of the issues that affected throughput belonged to the Interface Anomalies category, while 2 percent were network issues.
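The arithmetic behind these percentages can be sketched in a few lines of Python. This is only an illustration: the user-minute counts and the `sle_summary` helper are hypothetical, chosen to reproduce the figures in the example, and do not represent how Mist computes or exposes the data.

```python
# Hypothetical user-minute counts chosen to reproduce the example percentages;
# they are not real telemetry.
total_user_minutes = 5000
bad_minutes_by_classifier = {
    "Interface Anomalies": 3038,
    "Network": 62,
}


def sle_summary(total, bad_by_classifier):
    """Return (success_pct, {classifier: share_of_failures_pct})."""
    bad_total = sum(bad_by_classifier.values())
    success_pct = 100.0 * (total - bad_total) / total
    if bad_total == 0:
        # A 100 percent success rate leaves nothing to attribute to classifiers.
        return success_pct, {}
    shares = {
        name: 100.0 * bad / bad_total
        for name, bad in bad_by_classifier.items()
    }
    return success_pct, shares


success_pct, shares = sle_summary(total_user_minutes, bad_minutes_by_classifier)
print(f"Throughput SLE met: {success_pct:.0f}%")
for name, share in shares.items():
    print(f"  {name}: {share:.0f}% of failures")
```

Note that the classifier shares are percentages of the failing user minutes, not of all user minutes, which is why they add up to 100 percent on their own.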

To access the classifier view, click a metric (for example, Throughput) and then select a classifier (for example, Interface Anomalies). Here's a sample of the Interface Anomalies metrics view:

Note:

The classifiers do not show any data when the metric shows a success rate of 100 percent.

Throughput

The Throughput metric shows the percentage of the time the wired users could pass traffic without any disruptions. This metric helps you evaluate your network and determine if it requires higher bandwidth for seamless operation. Several factors can impact network throughput, such as MTU mismatches, faulty cables, and devices negotiating at the wrong speed.

The Throughput SLE has five classifiers:

  • Congestion—This classifier shows how congestion contributed to the low throughput. It counts the number of output drops resulting from congestion. When packets arrive on an interface, they are stored in a buffer. If the buffer becomes full, the device starts dropping packets (TxDrops). The classifier uses a formula that considers the following three ratios to determine if a 'bad user minute' is caused by congestion:

    • TxDrops to TxPackets (Total transmitted packets dropped to Total packets transmitted)

    • Txbps to Link speed (Total bytes transmitted per second to Link speed)

    • RxSpeed to Link speed (Total bytes received per second to Link speed)

  • Congestion Uplink—The SLE dashboard shows high congestion uplink when:

    • One of the neighbors is a switch or a router (known through LLDP).

    • The port is an STP root port.

    • The uplink port has a higher number of transmitted and received packets compared to the other ports.

    Congestion can also be caused by aggregated Ethernet links and module ports.

  • Interface Anomalies—This classifier shows how interface anomalies contributed to the low throughput. The SLE dashboard gathers information about interface anomalies from the switches. The interface anomalies classifier is divided into the following sub-classifiers:

    • MTU Mismatch—As an admin, you can set a maximum transmission unit (MTU) value for each interface. The default value for Gigabit Ethernet interfaces is 1514 bytes. To support jumbo frames, you need to configure an MTU value of 9216 bytes, which is the upper limit for jumbo frames on a routed VLAN interface. The MTU value must be consistent along the packet's path, because any MTU mismatch results in discarded or fragmented packets. On Juniper switches, you can check for MTU mismatches in the MTU Errors and Input Errors sections of the show interfaces extensive command output. Each input error or MTU error contributes to a "bad user minute" under MTU mismatch.

    • Cable Issues—This sub-classifier shows the user minutes affected by faulty cables in the network.

    • Negotiation Failed—Latency on ports can happen due to auto-negotiation failure, duplex conflicts, or user misconfiguration of device settings. Moreover, older devices may not be able to achieve maximum speed and could operate at a slower link speed of 100 Mbps. This sub-classifier identifies and helps mitigate instances of bad user time caused by these issues.

  • Storm Control—Storm control allows the device to monitor traffic levels and drop broadcast, unknown unicast, and multicast packets when they exceed a set threshold. This threshold is known as the storm control level or storm control bandwidth. By default, the storm control level is set to 80 percent of the available bandwidth for the combined broadcast, multicast, and unknown unicast traffic on all Layer 2 interfaces of Juniper switches. Storm control helps prevent traffic storms, but it can also throttle applications or client devices. This classifier identifies these conditions and helps users proactively mitigate throughput issues.

  • Network—This classifier allows you to monitor user minutes when the throughput is lower than expected due to limitations in uplink capacity. It identifies issues based on the round-trip time (RTT) value of packets sent from the switch to the Mist cloud. The Network classifier has two sub-classifiers that help you locate these issues:

    • Latency—Displays user minutes affected by latency. The latency value is calculated based on the average value of RTT over a period of time.

    • Jitter—Displays user minutes affected by jitter. The jitter value is calculated by comparing the standard deviation of RTT within a small period (last 5 or 10 minutes) with the overall deviation of RTT over a longer period (day or week). You can view this information for a particular switch or site.
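The Latency and Jitter calculations described for the Network classifier can be sketched as follows. This is a minimal illustration, not the Mist implementation: the function names and sample values are hypothetical, and expressing the jitter comparison as a ratio of standard deviations is an assumption, since the text above says only that the two deviations are compared.

```python
from statistics import mean, pstdev


def latency_ms(rtt_samples):
    """Latency: the average RTT over the period."""
    return mean(rtt_samples)


def jitter_ratio(recent_rtts, baseline_rtts):
    """Short-window RTT standard deviation relative to the long-window one.

    Using a ratio is an assumption made for this sketch.
    """
    recent_dev = pstdev(recent_rtts)
    baseline_dev = pstdev(baseline_rtts)
    if baseline_dev == 0:
        # A perfectly flat baseline: any recent variation is unbounded.
        return float("inf") if recent_dev > 0 else 1.0
    return recent_dev / baseline_dev


# Hypothetical RTT samples in milliseconds.
baseline = [20, 22, 21, 19, 20, 23, 21, 20]   # longer period (day or week)
recent = [20, 35, 18, 40, 22]                 # last 5 or 10 minutes

print(f"latency: {latency_ms(recent)} ms")
print(f"jitter ratio: {jitter_ratio(recent, baseline):.1f}")
```

A ratio well above 1 means the RTT has recently become more variable than usual for that switch or site. The value at which a minute counts as a bad user minute is not stated here, so the sketch does not apply a threshold.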

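The three ratios that the Congestion classifier considers can be computed from per-port counters as in the sketch below. The `PortCounters` type and its field names are hypothetical, and the sketch stops at the ratios themselves, because this document does not give the formula that combines them into a bad-user-minute decision.

```python
from dataclasses import dataclass


@dataclass
class PortCounters:
    """Hypothetical per-interval counters for one switch port."""
    tx_drops: int       # packets dropped on output (TxDrops)
    tx_packets: int     # packets transmitted (TxPackets)
    tx_rate: float      # transmit rate, in the same unit as link_speed
    rx_rate: float      # receive rate, in the same unit as link_speed
    link_speed: float   # negotiated link speed


def congestion_ratios(c):
    """Return the three ratios listed for the Congestion classifier."""
    drop_ratio = c.tx_drops / c.tx_packets if c.tx_packets else 0.0
    return {
        "tx_drop_ratio": drop_ratio,
        "tx_utilization": c.tx_rate / c.link_speed,
        "rx_utilization": c.rx_rate / c.link_speed,
    }


# Hypothetical example: a 1 Gbps port that is transmitting near line rate.
port = PortCounters(tx_drops=500, tx_packets=100_000,
                    tx_rate=950e6, rx_rate=200e6, link_speed=1e9)
for name, value in congestion_ratios(port).items():
    print(f"{name}: {value:.3f}")
```

A port that drops output packets while its transmit utilization is close to 1 is the pattern that buffer exhaustion would produce. Drops on a lightly loaded port point to a different cause.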
Switch Health

Switch health is influenced by several factors, including operating temperature, power consumption, CPU, and memory usage. Monitoring switch health is crucial because issues like high CPU usage can directly impact connected clients. For instance, if CPU utilization spikes to 100 percent, the connected APs may lose connectivity, affecting the clients' experience. The Switch Health metric identifies bad user minutes resulting from the following conditions (listed as classifiers):

  • Switch Unreachable—The switch can't be accessed.

  • Memory—The memory utilization is above 80 percent.

  • CPU—The switch CPU usage is above 90 percent.

  • Temp—The switch operating temperature is outside the prescribed range, either above the maximum limit or below the minimum limit. For information about the operating temperatures that Juniper switches support, refer to the switch hardware guides in the Juniper documentation portal.

  • Power—The switch power consumption is above 90 percent of the available power.
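The classifier conditions above can be expressed as a simple check on one sample of switch telemetry. This is a sketch under stated assumptions: the sample field names are hypothetical, the temperature limits are passed in because they vary by switch model, and returning only Switch Unreachable for an unreachable switch assumes that such a switch reports no other metrics.

```python
def switch_health_classifiers(sample, temp_min_c, temp_max_c):
    """Return the Switch Health classifiers that one telemetry sample trips."""
    if not sample["reachable"]:
        # Assumption: an unreachable switch provides no other readings.
        return ["Switch Unreachable"]
    tripped = []
    if sample["memory_pct"] > 80:
        tripped.append("Memory")
    if sample["cpu_pct"] > 90:
        tripped.append("CPU")
    if not (temp_min_c <= sample["temp_c"] <= temp_max_c):
        tripped.append("Temp")
    if sample["power_draw_w"] > 0.9 * sample["power_available_w"]:
        tripped.append("Power")
    return tripped


# Hypothetical sample and temperature limits, for illustration only.
sample = {
    "reachable": True,
    "memory_pct": 85,
    "cpu_pct": 40,
    "temp_c": 50,
    "power_draw_w": 300,
    "power_available_w": 400,
}
print(switch_health_classifiers(sample, temp_min_c=0, temp_max_c=45))
# prints ['Memory', 'Temp']
```

A single sample can trip more than one classifier, as in this example, where memory utilization and temperature are both out of bounds while CPU and power are within limits.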

Successful Connect

The Successful Connect metric shows if a client successfully connects to the network. It helps assess the impact of connect failures and identify the issues preventing client devices from connecting to the network.

The Successful Connect metric has two classifiers:

  • Authentication—Each time a client authenticates, a client event is generated. These could either be successful events or failure events. This classifier helps you identify issues that caused authentication failures. Here's a list of possible reasons for a dot1x authentication failure:

    • If a single switch port fails to authenticate, it could be due to a user error or misconfigured port.

    • If all switch ports fail to authenticate, it could be because:

      • The switch is not added as a NAS client in the RADIUS server.
      • There's a routing issue between the switch and the RADIUS server.
      • The RADIUS server is down.

    • If all switch ports on all switches fail to authenticate, it could indicate a temporary failure with the RADIUS server at that specific moment.

    • If a specific type of device, such as Windows devices, fails to authenticate, it may suggest an issue related to certificates.

  • DHCP—DHCP snooping enables the switch to examine the DHCP packets and keep track of the IP-MAC address binding in the snooping table. This classifier adds a failure event every time a client connects to a network and fails to reach the ‘bound’ state within a minute.

    Note:

    The SLE dashboard shows DHCP failures only for those switches that have DHCP Snooping configured.
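The DHCP failure rule (a client that does not reach the bound state within a minute of connecting) can be sketched as below. The function and its arguments are hypothetical and show only the timing check. How the switch derives these timestamps from the snooping table is not covered here.

```python
from datetime import datetime, timedelta

BOUND_DEADLINE = timedelta(minutes=1)


def dhcp_failed(connect_time, bound_time):
    """True if the client did not reach 'bound' within a minute of connecting.

    bound_time is None when the client never reached the bound state.
    """
    if bound_time is None:
        return True
    return bound_time - connect_time > BOUND_DEADLINE


# Hypothetical timestamps for three clients.
connected = datetime(2024, 1, 1, 9, 0, 0)
print(dhcp_failed(connected, connected + timedelta(seconds=5)))    # False
print(dhcp_failed(connected, connected + timedelta(seconds=90)))   # True
print(dhcp_failed(connected, None))                                # True
```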