As the number of devices managed by a typical network management system (NMS) grows and the complexity of the devices themselves increases, it becomes increasingly impractical for the NMS to use polling to monitor the devices. A more scalable approach is to rely on network devices to notify the NMS when something requires attention.
On Juniper Networks routers, RMON alarms and events provide much of the infrastructure needed to reduce the polling overhead from the NMS. (For more information, see Configuring RMON Alarms and Events.) However, with this approach, you must set up the NMS to configure specific MIB objects into RMON alarms. This often requires device-specific expertise and customization of the monitoring application. In addition, some MIB object instances that need monitoring are set only at initialization, or change at runtime, and therefore cannot be configured in advance.
To address these issues, the health monitor extends the RMON alarm infrastructure to provide predefined monitoring for a selected set of object instances (for file system usage, CPU usage, and memory usage) and includes support for unknown or dynamic object instances (such as JUNOS Software processes).
Health monitoring is designed to minimize user configuration requirements. To configure health monitoring entries, you include statements at the [edit snmp] hierarchy level of the configuration:
[edit snmp]
health-monitor {
    falling-threshold percentage;
    interval seconds;
    rising-threshold percentage;
}
You can use the show snmp health-monitor operational command to view information about health monitor alarms and logs.
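This topic lists the monitored object instances, describes the minimum required configuration, and discusses the following tasks for configuring the health monitor: configuring the falling threshold or rising threshold, configuring the interval, and interpreting the log entries and traps that the health monitor generates.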
When you configure the health monitor, monitoring information for certain object instances is available, as shown in Table 21.
Table 21: Monitored Object Instances
To enable health monitoring on the router or switch, include the health-monitor statement at the [edit snmp] hierarchy level:
[edit snmp]
health-monitor;
The falling threshold is the lower threshold for the monitored variable, expressed as a percentage of the maximum possible value. A single falling event is generated when the current sampled value is less than or equal to this threshold and the value at the last sampling interval was greater than this threshold. A single event is also generated if the first sample after this entry becomes valid is less than or equal to this threshold. After a falling event is generated, another falling event cannot be generated until the sampled value rises above this threshold and reaches the rising threshold. The default is 70 percent.
The rising threshold is the upper threshold for the monitored variable, expressed as a percentage of the maximum possible value for the monitored object instance. A single rising event is generated when the current sampled value is greater than or equal to this threshold and the value at the last sampling interval was less than this threshold. A single event is also generated if the first sample after this entry becomes valid is greater than or equal to this threshold. After a rising event is generated, another rising event cannot be generated until the sampled value falls below this threshold and reaches the falling threshold. The default is 80 percent.
To configure the falling threshold or rising threshold, include the falling-threshold or rising-threshold statement at the [edit snmp health-monitor] hierarchy level:
[edit snmp health-monitor]
falling-threshold percentage;
rising-threshold percentage;
percentage can be a value from 1 through 100.
The falling and rising thresholds apply to all object instances monitored by the health monitor.
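The rising and falling rules described above amount to a hysteresis mechanism: once one kind of event fires, it cannot fire again until the opposite threshold has been reached. The following Python sketch simulates that behavior on a list of sampled percentages. It illustrates the rules as stated in this topic and is not the router's implementation. The function name `threshold_events`, its parameters, and the requirement that the falling threshold be lower than the rising threshold are assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple


def threshold_events(
    samples: Sequence[float],
    rising: float = 80,   # default rising threshold from this topic
    falling: float = 70,  # default falling threshold from this topic
) -> List[Tuple[int, str]]:
    """Return (sample_index, "rising" | "falling") for each generated event.

    Hypothetical helper: it models the documented rules only.
    """
    if falling >= rising:
        # Assumption of this sketch, not a documented restriction.
        raise ValueError("this sketch assumes falling < rising")

    events: List[Tuple[int, str]] = []
    previous: Optional[float] = None
    rising_armed = True   # a rising event is currently allowed
    falling_armed = True  # a falling event is currently allowed

    for index, value in enumerate(samples):
        first = previous is None
        # The first sample counts as a crossing if it is already past a threshold.
        crossed_up = value >= rising and (first or previous < rising)
        crossed_down = value <= falling and (first or previous > falling)

        if crossed_up and rising_armed:
            events.append((index, "rising"))
            rising_armed = False   # blocked until the falling threshold is reached
            falling_armed = True
        elif crossed_down and falling_armed:
            events.append((index, "falling"))
            falling_armed = False  # blocked until the rising threshold is reached
            rising_armed = True

        previous = value

    return events


if __name__ == "__main__":
    print(threshold_events([50, 75, 85, 90, 78, 82, 65, 60, 85]))
    # [(0, 'falling'), (2, 'rising'), (6, 'falling'), (8, 'rising')]
```

In this run, the sample of 82 at index 5 crosses the rising threshold again but generates no event, because the value had dropped only to 78 and never reached the falling threshold of 70.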
The interval represents the period of time, in seconds, over which the object instance is sampled and compared with the rising and falling thresholds.
To configure the interval, include the interval statement and specify the number of seconds at the [edit snmp health-monitor] hierarchy level:
[edit snmp health-monitor]
interval seconds;
seconds can be a value from 1 through 2147483647. The default is 300 seconds (5 minutes).
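If you generate the configuration from a script, it can help to check values against the documented ranges before rendering the statements. The following Python sketch does that for the three health monitor options. The function name `render_health_monitor` and its parameters are hypothetical, and the output is plain text in the statement syntax shown in this topic, not the result of any Junos API call.

```python
from typing import List, Optional, Tuple

PERCENT_RANGE = (1, 100)          # documented range for both thresholds
INTERVAL_RANGE = (1, 2147483647)  # documented range for the interval, in seconds


def _check(name: str, value: int, bounds: Tuple[int, int]) -> None:
    """Raise ValueError if value is outside the inclusive documented range."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name} must be from {low} through {high}, got {value}")


def render_health_monitor(
    rising: Optional[int] = None,
    falling: Optional[int] = None,
    interval: Optional[int] = None,
) -> str:
    """Render a health-monitor stanza; omitted options keep the router defaults."""
    statements: List[str] = []
    if falling is not None:
        _check("falling-threshold", falling, PERCENT_RANGE)
        statements.append(f"falling-threshold {falling};")
    if interval is not None:
        _check("interval", interval, INTERVAL_RANGE)
        statements.append(f"interval {interval};")
    if rising is not None:
        _check("rising-threshold", rising, PERCENT_RANGE)
        statements.append(f"rising-threshold {rising};")

    if not statements:
        # Minimum configuration: enable health monitoring with defaults.
        return "health-monitor;"
    body = "\n".join(f"    {statement}" for statement in statements)
    return "health-monitor {\n" + body + "\n}"


if __name__ == "__main__":
    print(render_health_monitor(rising=85, falling=75, interval=600))
```

Calling the function with no arguments returns the minimum configuration, `health-monitor;`. The sketch checks only the ranges given in this topic and does not compare the two thresholds with each other.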
The system log entries generated for any health monitor events (thresholds crossed, errors, and so on) have a corresponding HEALTHMONITOR tag rather than a generic SNMPD_RMON_EVENTLOG tag. However, the health monitor sends generic RMON risingThreshold and fallingThreshold traps.