How to Troubleshoot Error Conditions
Read the following sections to learn how you can diagnose problems on the router using alarm messages and component indicators.
Here’s Where to Start
You have troubleshooting resources available through Junos OS alarm messages and LED indicators. You can use these links to discover what these alarms and indicators mean when an error occurs.
To troubleshoot the router, use the Junos OS CLI, alarms, and the LEDs on the network ports, on the management panel of the Routing Control Board (RCB), and on the components.
- LEDs: When the Routing Engine detects an alarm condition, it lights the red or yellow alarm LED on the management panel as appropriate. In addition, you can use component LEDs and network port LEDs to troubleshoot the router.
- CLI: The CLI is the primary tool used to control and troubleshoot hardware, Junos OS, routing protocols, and network connectivity. CLI commands display information from routing tables, information specific to routing protocols, and information about network connectivity derived from the ping and traceroute utilities. For information about using the CLI to troubleshoot Junos OS, see the appropriate Junos OS configuration guide.
- JTAC: If you need assistance during troubleshooting, you can contact the Juniper Networks Technical Assistance Center (JTAC) through the web or by telephone. If you encounter software problems, or problems with hardware components not discussed here, contact JTAC.
- Knowledge Base articles: See the Knowledge Base.
Alarm Messages Overview
When a Routing Engine detects an alarm condition, it lights the red or yellow alarm LED on the RCB management panel as appropriate. To view a more detailed description of the alarm cause, issue the show system alarms CLI command, which indicates major and minor alarms on the system.
In this example, six alarms are active on PSM 0 through PSM 2: three minor firmware version mismatches and three major input failures.
    user@host> show system alarms
    6 alarms currently active
    Alarm time               Class  Description
    2020-07-21 09:33:09 PDT  Minor  PSM 0 PSM MCU AC minimum supported firmware version mismatch
    2020-07-21 09:33:09 PDT  Minor  PSM 1 PSM MCU AC minimum supported firmware version mismatch
    2020-07-21 09:33:09 PDT  Minor  PSM 2 PSM MCU AC minimum supported firmware version mismatch
    2020-07-21 09:33:08 PDT  Major  PSM 0 Input2 Failed
    2020-07-21 09:33:09 PDT  Major  PSM 1 Input2 Failed
    2020-07-21 09:33:09 PDT  Major  PSM 2 Input2 Failed
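If you save this output to a text file or capture it in a script, a few lines of Python can pull out the major alarms first. The following is a minimal sketch, not a Juniper tool: the names `SAMPLE`, `ALARM_LINE`, and `parse_alarms` are made up for this illustration, and the regular expression assumes that each alarm is printed on its own line in the timestamp, class, description layout shown above.

```python
import re
from collections import Counter

# Sample text copied from the example above. How you capture the output
# from a live router is outside the scope of this sketch.
SAMPLE = """\
6 alarms currently active
Alarm time               Class  Description
2020-07-21 09:33:09 PDT  Minor  PSM 0 PSM MCU AC minimum supported firmware version mismatch
2020-07-21 09:33:09 PDT  Minor  PSM 1 PSM MCU AC minimum supported firmware version mismatch
2020-07-21 09:33:09 PDT  Minor  PSM 2 PSM MCU AC minimum supported firmware version mismatch
2020-07-21 09:33:08 PDT  Major  PSM 0 Input2 Failed
2020-07-21 09:33:09 PDT  Major  PSM 1 Input2 Failed
2020-07-21 09:33:09 PDT  Major  PSM 2 Input2 Failed
"""

# Assumed line layout: date, time, time zone, alarm class, description.
ALARM_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+)\s+(Major|Minor)\s+(.+)$"
)


def parse_alarms(text):
    """Return a list of (timestamp, severity, description) tuples."""
    alarms = []
    for line in text.splitlines():
        match = ALARM_LINE.match(line.strip())
        if match:  # the count line and the header line do not match
            alarms.append(match.groups())
    return alarms


alarms = parse_alarms(SAMPLE)
counts = Counter(severity for _, severity, _ in alarms)

# Stable sort: major alarms first, original order kept within each class.
major_first = sorted(alarms, key=lambda alarm: alarm[1] != "Major")

print(f"Major: {counts['Major']}, Minor: {counts['Minor']}")
for timestamp, severity, description in major_first:
    print(f"{severity:<5}  {timestamp}  {description}")
```

Sorting major alarms to the top reflects the guidance that red alarms require immediate action, while yellow alarms require monitoring or maintenance.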
You can also use variations of the show system errors command to find key information about the error.
- show system errors active: Displays current active errors in the system.
- show system errors active fpc: Displays active errors for line cards.
- show system errors count: Displays system-wide errors and the current count.
- show system errors fru detail: Displays detailed FRU-specific errors.
- show system errors fru detail fpc: Displays information about detected errors based on the FRU.
This example shows not only the current errors but also those that are cleared.
    user@host> show system errors count
    Level    Occurred  Cleared  Action-Taken
    -------------------------------------------
    Minor    35        32       39
    Major    3         0        6
    Fatal    0         0        0
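To see how many errors at each level might still be outstanding, you can subtract the Cleared column from the Occurred column. The following Python sketch does that for the sample output above. The names `SAMPLE`, `parse_error_counts`, and `outstanding` are hypothetical, and treating Occurred minus Cleared as the number of uncleared errors is an assumption made for this illustration, not a documented Junos OS definition.

```python
# Sample text copied from the example above.
SAMPLE = """\
Level    Occurred  Cleared  Action-Taken
-------------------------------------------
Minor    35        32       39
Major    3         0        6
Fatal    0         0        0
"""


def parse_error_counts(text):
    """Map each error level to its Occurred, Cleared, and Action-Taken counts."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have a level name followed by three integers;
        # the header and the separator line fail this check.
        if len(fields) == 4 and all(field.isdigit() for field in fields[1:]):
            level, occurred, cleared, action_taken = fields
            counts[level] = {
                "occurred": int(occurred),
                "cleared": int(cleared),
                "action_taken": int(action_taken),
            }
    return counts


counts = parse_error_counts(SAMPLE)

# Assumption: errors that occurred but were not cleared are still outstanding.
outstanding = {
    level: row["occurred"] - row["cleared"] for level, row in counts.items()
}

for level, remaining in outstanding.items():
    print(f"{level}: {remaining} not cleared")
```

For the sample output, this reports three minor and three major errors that were not cleared, and no fatal errors.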
Chassis Alarm Messages
Chassis alarms indicate a failure of the device or one of its components. Chassis alarms are preset and cannot be modified.
Chassis alarms on the router have two severity levels:
- Major (red): Indicates a critical situation on the device that has resulted from one of the conditions described in Table 1. A red alarm condition requires immediate action.
- Minor (yellow or amber): Indicates a noncritical condition on the device that, if left unchecked, might cause an interruption in service or degradation in performance. A yellow alarm condition requires monitoring or maintenance.
Table 1 describes the chassis alarm messages on the router.
| Chassis Component | Alarm Condition | Alarm Severity | Remedy |
|---|---|---|---|
| Routing Control Board | An RCB has failed. | Major (red) | Replace the failed RCB. |
| Routing Control Board | An RCB has been removed. | Minor (yellow) | Install an RCB in the empty slot. |
| Line cards | A line card is offline. | Minor (yellow) | Check the line card. Remove and reinstall the line card. If this fails, replace the failed card. |
| Line cards | A line card has failed. | Major (red) | Replace the failed line card. |
| Line cards | A line card has been removed. | Major (red) | Install a line card in the empty slot. |
| Fan trays | A fan tray has been removed from the chassis. | Major (red) | Install the missing fan tray. |
| Fan trays | One fan in the chassis is not spinning or is spinning below the required speed. | Major (red) | Replace the fan tray. |
| Fan trays | A fan is not receiving power from the fan tray controller. | Major (red) | Check and replace the failed fan tray controller if required. |
| Fan Tray Controller | A fan tray controller has failed. | Minor (yellow) | Check and replace the failed fan tray controller if required. |
| Fan Tray Controller | One of the fan tray controllers in the chassis is not receiving enough power. | Major (red) | Check the power supply. |
| Switch Interface Boards (SIBs) | One of the SIBs has failed. | Minor (yellow) | Check the below: |
| Ethernet | The Ethernet management interface on the RCB is down. | Minor (yellow) | |
| Hot swapping | Too many hot-swap interrupts are occurring. | Major (red) | Replace the failed components. |
| Power supplies | A power supply has been removed from the chassis. | Minor (yellow) | Install a power supply in the empty slot. |
| Power supplies | A power supply has a high temperature. | Major (red) | Replace the failed power supply. |
| Power supplies | A power supply input has failed. | Major (red) | Check the power supply input connection and the power cord. |
| Power supplies | A power supply output has failed. | Major (red) | Check the power supply output connection. |
| Power supplies | A power supply has failed. | Major (red) | Replace the failed power supply. |
| Power supplies | AC and DC power supplies are installed. | Major (red) | Do not mix AC and DC power supplies. |
| Power supplies | Inadequate number of power supplies. | Major (red) | Install an additional power supply. |
| Power supplies | Current share failure | Major (red) | The PSM state remains online during a current share failure. When a current share failure occurs on devices with third-generation power supplies, the system does not indicate the failure on the LED or change the PSM state to Fault. Instead, the system keeps the PSM state online and raises an alarm. No action required. |
| Power supplies | mcu_access_failure | Major (red) | If mcu_access_failure is displayed but the PSM state is not shown as Fault, and the PSM is delivering output power, this suggests an environmental failure of the PSM. If you have enabled the PSM watchdog, the PSM is turned off as a resiliency action. |
| Power supplies | PSM I2C SCL failure | Major (red) | In an 8-slot chassis, if the SCL (Serial Clock Line) pin of I2C shorts to the GND (Ground) pin in the parent/primary PSM0 due to clock stretching on PSM0, it impacts transactions on all the child/secondary PSMs. You will not be able to see the status of the PSM due to “hwdre” failure. In such cases, isolate the faulty PSM by removing PSMs one at a time until you identify it, and then replace the faulty PSM. If you interchange the PSMs and the fault remains on all PSMs, the fault might exist in the chassis or midplane; you can then raise an RMA for this. Example: If you see a fault at PSM0 and its subsequent PSMs (PSM1 to PSM3), the fault might lie in PSM0. Interchange PSM0 with any other PSM from the same primary (PSM1, PSM2, or PSM3) and check whether the fault is rectified. If you see a fault at PSM4 and its subsequent child/secondary PSM (PSM5), the fault might lie in PSM4. Interchange PSM4 with PSM5 (as PSM4 is the primary PSM) and check whether the fault is rectified. |
| Power supplies | Short pin failure | Major (red) | The short pin allows the power supply to detect whether it is properly connected to the midplane. When the connection is detected, the Power Supply Module (PSM) turns on the output. Because this issue occurs external to the PSM, it is not considered a PSM failure, and the fault LED does not turn red. Reinsert the PSM. If the error persists, return the PSM (RMA) because there is no midplane connectivity. |
| Power supplies | Single channel pfc-failure | Major (red) | If a PFC failure happens on a single channel, the fault LED does not turn red and the PSM remains in the online state because the PSM output is still on. However, if all four channels fail, the fault LED turns red and the PSM moves to the fault state. No action required. |
| Temperature | The chassis temperature has exceeded 104° F (40° C), the fans have been turned on to full speed, and one or more fans have failed. | Minor (yellow) | |
| Temperature | The chassis temperature has exceeded 149° F (65° C), and the fans have been turned on to full speed. | Minor (yellow) | |
| Temperature | The chassis temperature has exceeded 149° F (65° C), and a fan has failed. If this condition persists for more than 90 seconds, the router will shut down. | Major (red) | |
| Temperature | The chassis temperature has exceeded 167° F (75° C). If this condition persists for more than 90 seconds, the router will shut down. | Major (red) | |
| Temperature | The temperature sensor has failed. | Major (red) | Open a support case using the Case Manager link at https://www.juniper.net/support/ or call 1-888-314-5822 (toll free, US & Canada) or 1-408-745-9500 (from outside the United States). |
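The temperature rows of Table 1 can be read as a small decision rule. The following Python sketch encodes that reading so you can check which severity a given combination of temperature and fan state would fall under. It is an illustration only and does not reflect how Junos OS evaluates alarms internally. The function name `classify_temperature` is hypothetical, a failed sensor is modeled as `None`, and the "fans turned on to full speed" part of each condition is treated as a consequence of the temperature instead of a separate input.

```python
from typing import Optional


def classify_temperature(temp_c: Optional[float], fan_failed: bool) -> Optional[str]:
    """Return "Major", "Minor", or None (no temperature alarm).

    Thresholds are taken from the Temperature rows of Table 1.
    temp_c is the chassis temperature in degrees Celsius; None is an
    assumption used here to represent a failed temperature sensor.
    """
    if temp_c is None:
        return "Major"  # the temperature sensor has failed
    if temp_c > 75:
        return "Major"  # router shuts down if this persists for 90 seconds
    if temp_c > 65:
        # With a failed fan this is a major alarm (shutdown after 90 seconds);
        # otherwise it is a minor alarm.
        return "Major" if fan_failed else "Minor"
    if temp_c > 40 and fan_failed:
        return "Minor"
    return None


for temp, fan in [(41, True), (66, False), (66, True), (76, False), (None, False)]:
    print(temp, fan, classify_temperature(temp, fan))
```

The comparisons are strict because the table describes temperatures that have exceeded each threshold.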