health-monitor (KHMS)

Syntax

Hierarchy Level

Description

Configure the kernel health monitoring system (KHMS), which detects and acts on stuck ifstate clients. Stuck ifstate clients can degrade system performance. With this configuration statement you can set the time interval the system uses to detect a stuck ifstate client, and you can also set the action the system takes when it finds one.

The ifstate clients receive states from the kernel. There are two kinds of ifstate clients:

non-peer clients (for example, some daemons, or processes, on the Routing Engine)—ifstate non-peer clients open connections between programs and read states from or write states to the kernel.
peer clients (for example, FPCs)—ifstate peer clients read peer messages and send updates to the peers.

An ifstate client is stuck if the kernel sends it a message and the client does not send back an ACK. An rt_pfe_veto log message indicates that states were sent but no ACK came back. However, in case the ACK arrives late, the system does not take the configured action until the configured time interval has timed out.
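
The late-ACK grace period described above can be modeled in a few lines. This is a minimal Python sketch of the decision only, not the kernel's implementation. The function name `should_act` and its parameters are hypothetical, and treating the exact moment the interval elapses as "timed out" is an assumption.

```python
def should_act(elapsed_s: float, ack_received: bool, interval_s: float) -> bool:
    """Decide whether the configured action is due (hypothetical helper).

    elapsed_s    -- seconds since the kernel sent the message
    ack_received -- whether the client has sent back an ACK
    interval_s   -- configured detection interval, in seconds
    """
    if ack_received:
        # An ACK arrived, even if late: the client is not stuck.
        return False
    # No ACK yet: wait until the full interval has elapsed before acting.
    # Assumption: reaching the interval exactly counts as timed out.
    return elapsed_s >= interval_s
```

With a 360-second interval, a client that has been silent for 100 seconds is not yet acted on, while one that is still silent at 360 seconds is.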

Options

ifstate-clients

Configure which ifstate clients you want to monitor and manage. There are three options:

peer-stuck—Monitor and manage stuck peers.
non-peer-stuck—Monitor and manage stuck processes.
all-clients-stuck—Monitor and manage both stuck peers and stuck processes.
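
The relationship between the three options and the two kinds of ifstate clients can be written out as a lookup. This is an illustrative Python sketch only. The option names come from this page, while the `"peer"` and `"non-peer"` labels, `MONITORED_KINDS`, and `is_monitored` are hypothetical names introduced for the example.

```python
# Which kinds of ifstate client each option covers.
# The kind labels "peer" and "non-peer" are hypothetical names for this sketch.
MONITORED_KINDS = {
    "peer-stuck": {"peer"},
    "non-peer-stuck": {"non-peer"},
    "all-clients-stuck": {"peer", "non-peer"},
}


def is_monitored(option: str, client_kind: str) -> bool:
    """Return True if the given option covers the given client kind."""
    return client_kind in MONITORED_KINDS[option]
```

For example, `is_monitored("peer-stuck", "non-peer")` is false, because that option covers only peers such as FPCs.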

threshold-level

Configure the time interval the system uses to detect whether a given ifstate client is stuck:

high—540 seconds
medium—360 seconds; this is the default.
low—180 seconds
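
The levels and their intervals can be summarized as a small lookup table. The Python sketch below is for illustration only. The level names and second values are taken from the list above, while `THRESHOLD_SECONDS`, `DEFAULT_LEVEL`, and `threshold_seconds` are hypothetical names.

```python
from typing import Optional

# Detection interval, in seconds, for each threshold level.
THRESHOLD_SECONDS = {"low": 180, "medium": 360, "high": 540}

# Level that applies when none is configured.
DEFAULT_LEVEL = "medium"


def threshold_seconds(level: Optional[str] = None) -> int:
    """Return the interval for a level, falling back to the default level."""
    return THRESHOLD_SECONDS[level if level is not None else DEFAULT_LEVEL]
```

In minutes, the three levels correspond to 3 (low), 6 (medium), and 9 (high).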

action

Configure the action to be taken on the stuck ifstate client once the configured time interval times out.

alarm—Only an alarm will be raised about the stuck ifstate client; this is the default.
alarm-with-cores—An alarm will be raised about the stuck ifstate client after collecting live cores from the primary Routing Engine kernel and the stuck peer.

CAUTION:
In the case of a stuck peer, collecting live cores might result in the component being restarted or rebooted.
restart—The stuck ifstate client will be disconnected after collecting live cores from the primary Routing Engine kernel and the stuck peer (depending on supportability).

CAUTION:
When choosing this action, be aware of the implications of restarting an ifstate client. For example, some FPCs don’t simply restart; they reboot.
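
The order of operations for each action can be laid out side by side. This Python sketch only restates the descriptions above as data and is not how the system implements them. The step strings and the `steps_for` helper are hypothetical, and core collection for `restart` is subject to the supportability note above.

```python
from typing import List

# Steps each action performs, in order, as described on this page.
# The step strings are illustrative labels, not system output.
ACTION_STEPS = {
    "alarm": ["raise alarm"],
    "alarm-with-cores": ["collect live cores", "raise alarm"],
    # Core collection here depends on supportability.
    "restart": ["collect live cores", "disconnect stuck client"],
}


def steps_for(action: str = "alarm") -> List[str]:
    """Return the ordered steps for an action; "alarm" is the default."""
    return list(ACTION_STEPS[action])
```

Both `alarm-with-cores` and `restart` collect live cores first, which is why the cautions above apply to each of them when the stuck client is a peer.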

Required Privilege Level

admin

Release Information

Statement introduced in Junos OS Release 16.1R1.
