Detection and Corrective Actions of FPCs with Degraded Fabric on MX Series Routers
On an MX960, MX480, or MX240 router, you can configure an FPC that has degraded fabric to be moved to the offline state. Enabling this feature does not disrupt the system, and you do not need to restart the FPC or the system for it to take effect.
The following scenarios can occur when you configure the feature to disable FPCs with degraded fabric:
- If an FPC has already been operating with degraded fabric bandwidth for some time when you configure this capability, the corrective action is still taken and the FPC is turned offline.
- If an FPC has been brought offline because of fabric errors and this functionality is then disabled, or is configured only for some other FPC, the FPC that was turned offline is automatically returned to the online state.
- All FPCs that were brought offline because of degraded fabric as a result of this setting are brought back online when you commit any configuration change under the [edit chassis] hierarchy level. Similarly, restarting the chassis daemon or performing a graceful Routing Engine switchover (GRES) returns an FPC that was disabled because of degraded fabric to the online state.
Degraded fabric means that an FPC is operating with fewer than the required number of active fabric planes. An FPC that is operating with fewer than four planes is considered to be degraded. This rule applies to all types of FPCs and fabric. In the degraded condition, fabric traffic still passes correctly, but at reduced bandwidth.
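The four-plane rule above can be expressed as a short check. This is only an illustrative sketch: the function and constant names are hypothetical and are not part of Junos OS. It follows the rule literally and does not treat zero active planes as a separate case.

```python
# Illustrative sketch only; these names are hypothetical, not Junos OS APIs.
REQUIRED_ACTIVE_PLANES = 4  # threshold stated in the text


def is_fabric_degraded(active_planes: int) -> bool:
    """Return True when an FPC has fewer active fabric planes than required."""
    if active_planes < 0:
        raise ValueError("active_planes cannot be negative")
    return active_planes < REQUIRED_ACTIVE_PLANES
```

For example, an FPC with three active planes would be reported as degraded, while an FPC with four would not.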
The following conditions can result in degradation of fabric:
- The fabric control boards go offline as a result of an unintentional, abrupt power shutdown.
- An application-specific integrated circuit (ASIC) error causes a plane of a control board to be automatically turned offline.
- The fabric plane or the control board is manually brought to the offline state.
- The control board is removed.
- A self-ping failure occurs on any plane.
- An HSL2 training failure occurs for an active plane.
- If a spare fabric plane has CRC errors and that spare plane is brought online, the link with the CRC error is disabled. This mechanism might cause a degradation in fabric in one direction and a traffic black hole in the other direction.
- When a self-ping or HSL2 training failure occurs, the fabric plane is disabled for the affected FPC but remains online for other FPCs. This condition can also cause a traffic black hole.
If you need to remove the control board or move a fabric plane to the offline state during system maintenance, you must enable the functionality that turns FPCs with degraded bandwidth offline (by using the offline-on-fabric-bandwidth-reduction statement at the [edit chassis fpc slot-number] hierarchy level).
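As a sketch of what this looks like in the configuration, the following fragment enables the statement for the FPC in slot 3. The slot number is an arbitrary value chosen for illustration; substitute the slot of the FPC you want to protect.

```
[edit chassis]
fpc 3 {
    offline-on-fabric-bandwidth-reduction;
}
```

As noted earlier, committing this statement does not require restarting the FPC or the system.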
The following corrective actions are performed when a traffic black hole or fabric degradation occurs:
- Regardless of whether a spare control board is available, the self-ping state of each FPC is monitored at the Routing Engine at intervals of 5 seconds. The fabric manager uses the following rules to determine whether a spare control board is present:
  - MX960 routers that contain three control boards and have either I-chip-based FPCs only, or both I-chip-based and Trio-chip-based FPCs, are considered to contain a spare control board.
  - MX240 or MX480 routers that contain two control boards and have either I-chip-based FPCs only, or both I-chip-based and Trio-chip-based FPCs, are considered to contain a spare control board.
  - MX960, MX480, or MX240 routers that contain only Trio-based FPCs are not considered to contain a spare control board.

  If, during any 5-second interval, two FPCs indicate a failure for the same plane, a switchover to the spare control board is performed. In this case, the control board that reported errors is turned offline and the spare control board is turned online.
- If a spare control board is available and you have configured the functionality to disable FPCs with degraded fabric, the self-ping state of each FPC is monitored at the Routing Engine at intervals of 5 seconds. The following conditions can occur:
  - During any 5-second interval, if only one FPC indicates a failure for a plane, the fabric manager waits for the next interval. During the subsequent interval, if no other FPC indicates a failure for the same plane, a switchover of the control board is performed.
  - During any 5-second interval, if multiple FPCs show failures for multiple control boards, the fabric manager waits for the next interval. During the subsequent interval, if the same condition remains, all the failing FPCs are turned offline even if the spare control board is present.
  - During any 5-second interval, if any FPC shows a failure for multiple planes on multiple control boards, the fabric manager waits for the next interval. During the subsequent interval, if the same condition persists, the FPC is turned offline even if the spare control board is present.
- If spare planes are not available, the FPC is turned offline when it displays a failure for a single plane or multiple planes. The FPC is brought offline only if you previously configured the offline-on-fabric-bandwidth-reduction statement at the [edit chassis fpc slot-number] hierarchy level.
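The corrective actions above can be sketched as a small decision function. This is one possible reading of the rules, not the fabric manager's actual implementation. All names, the (control_board, plane) encoding, the action labels, and the precedence chosen when several conditions overlap are assumptions made for illustration. Cases that the text does not describe are returned as "not_described".

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encodings, chosen only for this sketch.
Failure = Tuple[int, int]         # (control_board, plane)
Report = Dict[int, Set[Failure]]  # FPC slot -> failures seen in one 5-second interval


def has_spare_control_board(model: str, control_boards: int, trio_only: bool) -> bool:
    """Apply the spare-control-board rule described in the text."""
    if trio_only:
        return False  # Trio-only routers are not considered to have a spare
    if model == "MX960":
        return control_boards == 3
    if model in ("MX240", "MX480"):
        return control_boards == 2
    return False


def shares_plane(report: Report) -> bool:
    """True if at least two FPCs report a failure for the same plane."""
    slots_per_plane: Dict[Failure, Set[int]] = {}
    for slot, failures in report.items():
        for failure in failures:
            slots_per_plane.setdefault(failure, set()).add(slot)
    return any(len(slots) >= 2 for slots in slots_per_plane.values())


def classify(report: Report) -> str:
    """Reduce one interval's failures to a coarse pattern (assumed precedence)."""
    failing = {slot: f for slot, f in report.items() if f}
    if not failing:
        return "none"
    boards = {board for f in failing.values() for board, _ in f}
    if len(boards) > 1:
        return "multi_board"
    if shares_plane(failing):
        return "shared_plane"
    return "isolated"


def decide(prev: Report, curr: Report, spare: bool,
           offline_configured: bool) -> Tuple[str, List[int]]:
    """Return (action, affected FPC slots) for the current interval."""
    kind = classify(curr)
    failing = sorted(slot for slot, f in curr.items() if f)
    if kind == "none":
        return ("none", [])
    if not spare:
        # Without a spare, FPCs go offline only if the statement is configured.
        return ("offline_fpcs", failing) if offline_configured else ("none", [])
    if not offline_configured:
        # Only the "two FPCs, same plane" rule applies regardless of configuration.
        return ("switchover", []) if shares_plane(curr) else ("not_described", [])
    if kind == "shared_plane":
        return ("switchover", [])
    if classify(prev) != kind:
        return ("wait", [])  # first sighting: wait for the next 5-second interval
    if kind == "multi_board":
        return ("offline_fpcs", failing)  # condition persisted across intervals
    return ("switchover", [])  # isolated single-plane failure persisted
```

A caller would keep the previous interval's report and pass it together with the current one, for example `decide(prev, curr, has_spare_control_board("MX960", 3, False), True)`. The persistence check here compares only the coarse pattern between intervals, which is a simplification.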

