Switch Health SLE

Use the Switch Health SLE to assess switch performance and to identify user-impacting issues with switch reachability, memory, CPU, and more.

Switch Health is one of the Service-Level Expectations (SLEs) that you can track on the Wired SLEs dashboard.

Switch Health SLE Example
Note:

To find the Wired SLEs dashboard, select Monitor > Service Levels from the left menu of the Juniper Mist™ portal, and then select the Wired button.

What Does the Switch Health SLE Measure?

Juniper Mist™ monitors your switches' operating temperature, power consumption, CPU, and memory usage. Monitoring switch health is crucial because issues such as high CPU usage can directly impact connected clients. For instance, if CPU utilization spikes to 100 percent, the connected APs may lose connectivity, affecting the clients' experience.

Classifiers

When the Switch Health threshold is not met, Juniper Mist sorts the issues into classifiers. The classifiers appear on the right side of the SLE block. In this example, 82 percent of the issues are attributed to Switch Unreachable and 12 percent to System. (See the classifier descriptions below the example.)

Switch Health SLE Example
  • Switch Unreachable—This classifier indicates a problem with switch-to-cloud connectivity. The switch might be down, or its connection to the cloud might be severed. Click this classifier to go to the Root Cause Analysis page, where you can see which switches and chassis are affected.

  • Capacity—This classifier is triggered when usage exceeds 80 percent. This level of usage can indicate that the switch is handling more requests than it can optimally process.

    Sub-Classifiers:

    • ARP Table—Usage exceeded 90 percent of the Address Resolution Protocol (ARP) table capacity.

    • Route Table—Usage exceeded 90 percent of the routing table capacity.

    • MAC Table—Usage exceeded 90 percent of the MAC table capacity.

  • Network—This classifier shows user minutes when throughput is lower than expected because of uplink capacity limitations. Juniper Mist identifies these issues from the round-trip time (RTT) of packets sent from the switch to the Mist cloud. Click this classifier to go to the Root Cause Analysis page, where you can see how the issue is distributed across all switches and troubleshoot it.

    Sub-Classifiers:

    • WAN Latency—Displays user minutes affected by latency. The latency value is calculated based on the average value of RTT over a period of time.

    • WAN Jitter—Displays user minutes affected by jitter. The jitter value is calculated by comparing the standard deviation of RTT within a short period (the last 5 or 10 minutes) with the overall deviation of RTT over a longer period (a day or a week). You can view this information for a particular switch or site.

  • System—This classifier indicates problems on the switch itself that can affect user experience. Click this classifier to go to the Root Cause Analysis page, where you can check the distribution across switches and see which switches have a high failure rate and need immediate attention.

    Sub-Classifiers:

    • CPU—The CPU usage of the switch is above 90 percent.

    • Memory—The memory utilization is above 80 percent.

    • Temp—The operating temperature of the switch is outside the prescribed range, either above the maximum limit or below the minimum limit.

    • Power—The switch is consuming over 90 percent of the available power.
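The threshold checks described above can be summarized in a short sketch. This is a minimal illustration under stated assumptions, not Juniper Mist's implementation: the `SwitchSample` fields, the per-switch temperature range, and the jitter cutoff `ratio=2.0` are hypothetical, and "overall deviation" is treated as a population standard deviation. The code uses the 90 percent table thresholds from the Capacity sub-classifiers and the CPU, Memory, and Power thresholds from the System sub-classifiers. It maps a metrics snapshot to sub-classifiers, averages RTT for WAN Latency, and compares short-window with long-window RTT spread for WAN Jitter.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Thresholds (percent) taken from the sub-classifier descriptions on this page.
TABLE_LIMIT = 90.0   # ARP Table, Route Table, MAC Table
CPU_LIMIT = 90.0
MEMORY_LIMIT = 80.0
POWER_LIMIT = 90.0


@dataclass
class SwitchSample:
    """Hypothetical metrics snapshot for one switch (not a Mist API object)."""
    arp_table_pct: float
    route_table_pct: float
    mac_table_pct: float
    cpu_pct: float
    memory_pct: float
    power_pct: float
    temp_c: float
    temp_min_c: float  # hypothetical lower limit of the prescribed range
    temp_max_c: float  # hypothetical upper limit of the prescribed range


def failing_sub_classifiers(s: SwitchSample) -> list[str]:
    """Return the Capacity and System sub-classifiers this sample breaches."""
    checks = [
        ("Capacity/ARP Table", s.arp_table_pct > TABLE_LIMIT),
        ("Capacity/Route Table", s.route_table_pct > TABLE_LIMIT),
        ("Capacity/MAC Table", s.mac_table_pct > TABLE_LIMIT),
        ("System/CPU", s.cpu_pct > CPU_LIMIT),
        ("System/Memory", s.memory_pct > MEMORY_LIMIT),
        ("System/Temp", not (s.temp_min_c <= s.temp_c <= s.temp_max_c)),
        ("System/Power", s.power_pct > POWER_LIMIT),
    ]
    return [name for name, breached in checks if breached]


def wan_latency(rtts_ms: list[float]) -> float:
    """WAN Latency: the average RTT over the measurement window."""
    return mean(rtts_ms)


def wan_jitter_flag(recent_rtts_ms: list[float],
                    baseline_rtts_ms: list[float],
                    ratio: float = 2.0) -> bool:
    """WAN Jitter: compare RTT spread in a short window (for example, the last
    5-10 minutes) with the spread over a longer baseline (a day or a week).
    The 2.0 ratio is a hypothetical cutoff chosen only for illustration."""
    recent = pstdev(recent_rtts_ms)
    baseline = pstdev(baseline_rtts_ms)
    if baseline == 0:
        return recent > 0  # any variation stands out against a flat baseline
    return recent / baseline > ratio


sample = SwitchSample(arp_table_pct=95, route_table_pct=40, mac_table_pct=60,
                      cpu_pct=97, memory_pct=70, power_pct=50,
                      temp_c=45, temp_min_c=0, temp_max_c=40)
print(failing_sub_classifiers(sample))  # ['Capacity/ARP Table', 'System/CPU', 'System/Temp']
print(wan_latency([20, 30, 40]))        # 30
print(wan_jitter_flag([20, 60, 18, 55, 22], [20, 22, 21, 19, 20, 21]))  # True
```

Comparing the short-window spread against a longer baseline, rather than against a fixed number, means a link that is always somewhat noisy is not flagged, while a sudden burst of RTT variation is.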

Root Cause Analysis for the Switch Health SLE

After you click a classifier in the SLE block, the Root Cause Analysis page appears. Click classifiers and sub-classifiers to view timeline and scope information in the lower half of the screen.

Note:

The information in the lower half of the screen depends on what you've selected at the top.

Useful tabs in the lower half of the screen are:

  • Timeline—See exactly when the issues occurred.

  • Distribution—See which VLANs were affected.

  • Affected Items—See which interfaces and clients were affected and how much each one contributed to the overall impact. Also see the individual failure rate for each interface or client.

Let's look at an example involving the Switch Health SLE and the Switch Unreachable classifier. After you click the classifier and then the Distribution tab, the lower part of the screen shows the affected Switches and Chassis. For each item, you can see the overall impact and the failure rate.

Switch Health - Switch Unreachable - Distribution
Tip:
  • Overall Impact is the percentage of all issues for the selected sub-classifier that a client or interface contributed. For example, it can show whether a client accounts for 20 percent or 90 percent of the issues.

  • Failure Rate is the impact of this issue on this interface or client. For example, it can show if an interface was unsuccessful on 20 percent or 90 percent of connection attempts.

  • To see more details, click the hyperlinks in the table to go to the Insights page, where you can see all client and switch events.
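The difference between Overall Impact and Failure Rate can be shown with simple arithmetic. This is a minimal sketch, assuming each affected item has a count of failed and total user minutes (or connection attempts) for the selected sub-classifier; the `ItemStats` type, the function name, and the interface names are hypothetical example data, not Mist API objects.

```python
from dataclasses import dataclass


@dataclass
class ItemStats:
    """Hypothetical counts for one interface or client under one sub-classifier."""
    name: str
    failed: int  # failed user minutes (or failed attempts) for this item
    total: int   # all user minutes (or attempts) for this item


def impact_and_failure_rate(items: list[ItemStats]) -> dict[str, tuple[float, float]]:
    """Return {name: (overall_impact_pct, failure_rate_pct)} for each item."""
    all_failed = sum(i.failed for i in items)
    result = {}
    for i in items:
        # Overall Impact: this item's share of all failures for the sub-classifier.
        impact = 100.0 * i.failed / all_failed if all_failed else 0.0
        # Failure Rate: how often this item itself failed.
        rate = 100.0 * i.failed / i.total if i.total else 0.0
        result[i.name] = (impact, rate)
    return result


items = [
    ItemStats("ge-0/0/1", failed=18, total=20),
    ItemStats("ge-0/0/2", failed=2, total=100),
]
for name, (impact, rate) in impact_and_failure_rate(items).items():
    print(f"{name}: overall impact {impact:.0f}%, failure rate {rate:.0f}%")
# ge-0/0/1: overall impact 90%, failure rate 90%
# ge-0/0/2: overall impact 10%, failure rate 2%
```

An item with a high Overall Impact contributes most of the problem for the sub-classifier, while a high Failure Rate means the item itself is failing most of the time; the two can diverge when items carry very different amounts of traffic.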