Monitoring Temperatures of Modules

You can view the temperature of each module by issuing the show environment all and show environment table commands. In addition, the system generates detailed log messages if the temperature of a module is outside normal operating limits.

For example, if the temperature of any forwarding controller exceeds 212°F (100°C), a message appears on the console and the event is added to the system log. If you receive this message, report it to your customer service representative.
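The threshold check described above can be illustrated with a short Python sketch. Only the 100°C (212°F) limit comes from the text. The function names and the sample readings are hypothetical and do not represent JunosE software or output.

```python
from typing import Dict, List

# Limit for forwarding controllers as stated in the text: 100°C (212°F).
FC_LIMIT_C = 100.0


def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a Celsius temperature to Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def controllers_over_limit(readings_c: Dict[str, float],
                           limit_c: float = FC_LIMIT_C) -> List[str]:
    """Return the names of controllers whose temperature exceeds the limit.

    The comparison is strict, matching the word "exceeds" in the text.
    """
    return sorted(name for name, temp in readings_c.items() if temp > limit_c)


if __name__ == "__main__":
    # Hypothetical readings, in degrees Celsius.
    sample = {"fc0": 99.5, "fc1": 100.0, "fc2": 101.2}
    for name in controllers_over_limit(sample):
        print(name, "exceeds", celsius_to_fahrenheit(FC_LIMIT_C), "F")
```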

In releases earlier than JunosE Release 14.3.x, the system immediately enters thermal protection mode if the temperature of any module exceeds the upper temperature limit. In JunosE Release 14.3.x and later, the system enters thermal protection mode if any of the following conditions occurs: any two temperature sensors indicate a thermal state of “too hot” for their associated modules; one sensor is in the failure state and another sensor is in the “too hot” state; or the fan tray is removed and one sensor is in the “too hot” state. After the system has entered thermal protection mode, you must resolve the cause of the high temperature quickly.
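The trigger conditions for JunosE Release 14.3.x and later can be modeled as a small decision function. The following Python sketch is only an illustration of the rules stated in the paragraph above. The type and function names are hypothetical, and the sketch does not represent the router's actual implementation.

```python
from enum import Enum
from typing import Iterable


class SensorState(Enum):
    """Hypothetical sensor states, named after the terms used in the text."""
    OK = "ok"
    TOO_HOT = "too hot"
    FAILED = "failed"


def enters_thermal_protection(sensors: Iterable[SensorState],
                              fan_tray_present: bool) -> bool:
    """Return True if the stated 14.3.x-and-later conditions are met."""
    states = list(sensors)
    hot = states.count(SensorState.TOO_HOT)
    failed = states.count(SensorState.FAILED)

    # Condition 1: any two sensors report "too hot".
    if hot >= 2:
        return True
    # Condition 2: one sensor has failed and another reports "too hot".
    if hot >= 1 and failed >= 1:
        return True
    # Condition 3: the fan tray is removed and one sensor reports "too hot".
    if hot >= 1 and not fan_tray_present:
        return True
    return False
```

In this model, a single "too hot" reading with the fan tray present and no failed sensor does not trigger the mode, and neither does a failed sensor or a removed fan tray on its own.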

Table 13: Troubleshooting High-Temperature Conditions

| Cause of High Operating Temperature | Symptoms | Resolution |
|---|---|---|
| Air vents to system are blocked | Space around system does not meet specifications. (See System Specifications.) | Increase space around system. |
| Ambient temperature exceeds specifications | Ambient temperature exceeds specifications. (See System Specifications.) | Provide extra cooling or heating in the room where the system is located. |
| One or more cooling fans have failed | FAN OK LED (FO) on SRP module is not illuminated; FAN FAIL LED (FF) on SRP module is illuminated. | Replace fan tray. (See Maintaining the Router.) |
| A module fails | FAIL LED on module is illuminated. | Replace module. (See Installing Modules.) |

When you have resolved the cause of the high temperature, the system automatically resumes operation. For example, if the system entered thermal protection mode and you replaced the fan tray, a chassis reboot is not required. The system automatically restores power to the LM and SFM modules.

Initiation of Thermal Protection Mode

When the router enters thermal protection mode, it operates in a low-power state with reduced functionality. Thermal protection mode restricts almost all router features and is intended as a response to emergency situations. This protective capability also handles failure modes caused by malfunctioning or defective hardware, such as the inability to read temperature sensors or detect fan trays. When such situations occur, system log messages and SNMP messages are generated. Until JunosE Release 14.2.x, the following conditions caused thermal protection mode to be triggered:

In all of these cases, there was no substantiating evidence that the router was nearing a temperature limit. In many cases, the SNMP and log messages displayed during the countdown were not noticed or acted upon, and the system entered thermal protection mode. This response was disproportionately severe for the minor problems that triggered it.

Beginning with JunosE Release 14.3.0, the conditions that trigger thermal protection mode have been modified so that the system does not enter this mode in scenarios that do not represent critical problems, such as the inability to read a single temperature sensor or a momentary loss of communication with fan trays.

A countdown is not used to initiate thermal protection mode for missing information, such as a failed sensor, so missing information alone does not initiate this mode. However, missing information continues to generate appropriate messages. Because no countdown or timer is used, entry into thermal protection mode is instantaneous: the only conditions that initiate this mode are confirmed indications of an excessive-temperature condition. Thermal protection mode occurs only when two independent indicators confirm a real, critical problem.

The following are a few examples of conditions that trigger thermal protection mode:

In the preceding examples, a module refers to any installed SRP module, line module, or IOA. For E120 and E320 routers, a fabric slice is also considered a module when identifying the conditions that trigger thermal protection mode.

The system log messages have been modified to include the temperature value of line modules and IOAs. These messages are logged at critical or higher severity so that the logs are saved in nonvolatile storage (NVS). The failure or removal of fans generates only log messages and does not initiate thermal protection mode. Booting a system with one fan not spinning does not generate a message, because design redundancy makes this a safe condition.
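The relationship between severity and NVS retention can be sketched as follows. The severity names and numeric ordering follow the standard syslog convention, in which lower numbers are more severe. The function name and the assumption that NVS retains exactly the messages at critical or higher are illustrative only and are inferred from the paragraph above, not taken from JunosE configuration.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Standard syslog severity levels; lower values are more severe."""
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFO = 6
    DEBUG = 7


def is_saved_to_nvs(severity: Severity,
                    threshold: Severity = Severity.CRITICAL) -> bool:
    """Return True if a message is at the threshold severity or higher.

    Assumption: NVS keeps messages at critical or higher severity,
    as implied by the text.
    """
    return severity <= threshold
```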