Error Handling by Fabric OAM

Fabric Operation, Administration, and Maintenance (OAM) helps detect failures in fabric paths. Whenever a new fabric path is brought up for a Packet Forwarding Engine (PFE), fabric OAM validates connectivity on that fabric plane before traffic is sent over it. If a failure is detected, the software reports the fault and avoids using that fabric plane for that PFE. The feature works by sending self-destined OAM traffic at a very low packets-per-second (PPS) rate over each available fabric plane and detecting any loss of that traffic at the endpoints (the fabric self-ping check).

Note:
  • In Junos OS Evolved Release 20.4R1, the fabric OAM feature is enabled by default. You can disable the feature by using the CLI command set chassis fabric oam detection-disable.
  • In Junos OS Evolved Releases 20.4R2 and 21.1R1, the fabric OAM feature is disabled by default.
  • In Junos OS Evolved Release 22.1R1, the runtime fabric OAM feature is enabled by default. You can disable the feature by using the CLI command edit chassis fabric oam runtime-disable. The runtime fabric OAM feature is supported on PTX10004, PTX10008, and PTX10016 routers.
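For illustration, the first case above (Release 20.4R1) might look like the following in configuration mode. This is a minimal sketch that assumes the standard Junos commit workflow. The delete form used to re-enable the feature is the usual way to remove a configuration statement and is an assumption here, not a documented step.

[edit]
user@host# set chassis fabric oam detection-disable

[edit]
user@host# commit

[edit]
user@host# delete chassis fabric oam detection-disable

[edit]
user@host# commit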

Fabric OAM checks are performed at boot time, and any paths that fail are disabled. The system takes no recovery action on its own. However, you can try to recover the affected fabric planes by restarting the SIBs; the exact recovery steps depend on the nature of the failure.

A fabric plane represents an independent bidirectional path between a PFE and a fabric ASIC. Runtime fabric OAM periodically checks fabric connectivity and helps detect and report failures in fabric planes during system runtime. Runtime fabric OAM detects the fabric reachability of each PFE.

When the same fabric planes fail on one or more FPCs, restart the SIB that contains the failed planes by using the following commands:

user@host> request chassis sib slot slot-number offline

user@host> request chassis sib slot slot-number online

When random fabric planes fail on multiple FPCs, the fault cannot be isolated to a specific FPC or SIB. However, you can try to recover the planes by restarting, one at a time, the SIBs that contain the affected planes.
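As an illustrative sketch of that sequential approach, assume the affected planes are on SIB slots 0 and 1 (hypothetical slot numbers). You would restart one SIB, check the fabric OAM self-ping state with show chassis fabric fpcs, and only then move on to the next SIB:

user@host> request chassis sib slot 0 offline

user@host> request chassis sib slot 0 online

user@host> show chassis fabric fpcs

user@host> request chassis sib slot 1 offline

user@host> request chassis sib slot 1 online

user@host> show chassis fabric fpcs

Checking the state between restarts is a suggestion rather than a documented requirement. It helps show which SIB restart, if any, cleared the fault.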

For each error detected by the fabric OAM feature, a syslog message is generated. The following is an example:

The following syslog message indicates that a fabric OAM-related error was cleared.

You can also use the CLI commands show system errors active detail and show system alarms to view fabric OAM-related errors.

The following output shows details for both a single fabric plane failure (on Packet Forwarding Engine 0) and a failure of all fabric planes (on Packet Forwarding Engine 1).

You can use the CLI command show chassis fabric fpcs to view the fabric OAM self-ping state of each fabric plane.

The show chassis fabric fpcs command displays the following output when the fabric OAM feature is disabled: