Detection and Recovery of Fabric-Related Failures Caused by Loss of Connectivity on MX Series Routers

Connectivity loss in a router occurs when the router is unable to transmit data packets to other neighboring routers, although the interfaces on that router continue to be in the active state. As a result, the other neighboring routers continue to forward traffic to the impacted router, which drops the arriving packets without sending a notification to the other routers.

When a Packet Forwarding Engine in a router is unable to send traffic to other Packet Forwarding Engines over the data plane within the same router, the router is unable to transmit any packets to a neighboring router, although the interfaces are advertised as active on the control plane. Fabric failure can be one of the reasons for the loss of connectivity.

The following fabric failure scenarios can occur:

  • Removal of the control board

  • High-speed link 2 (HSL2) training failures

  • Single link failure on a line card

  • Multiple link failures on the same line card or the same fabric plane

  • Multiple link failures randomly on a line card or a fabric plane

  • Intermittent cyclic redundancy check (CRC) errors

  • Complete loss of connectivity to only one destination while other destinations remain reachable

When a line card stops forwarding traffic to other line cards within the device, the control protocol on the Routing Engine is unable to detect this condition. Traffic is not diverted to the functional, active line cards; instead, packets continue to be sent to the affected line card and are dropped there. A line card might be unable to forward traffic for the following reasons:

  • All the planes in the system are in the Offline or Fault state.

  • All the Packet Forwarding Engines on the line card have disabled the fabric streams because of destination errors.

If all the Switch Control Boards (SCBs) lose connectivity to the line cards, then all the interfaces are brought down. If a Packet Forwarding Engine of a line card loses complete connectivity to or from the fabric, then that line card is brought down.
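The two rules above can be sketched as a small decision function. This is only an illustrative model, not router software: the function name, the input dictionaries, and the returned structure are all hypothetical.

```python
def fabric_loss_response(scb_connected, pfe_connected):
    """Illustrative model of the two rules in the text (all names hypothetical).

    scb_connected: dict mapping SCB id -> True if that SCB still has
        connectivity to the line cards.
    pfe_connected: dict mapping line card slot -> list of booleans, one per
        Packet Forwarding Engine, True if that PFE still has fabric connectivity.
    """
    # Rule 1: every SCB has lost connectivity, so all interfaces are brought down.
    if scb_connected and not any(scb_connected.values()):
        return {"all_interfaces_down": True, "line_cards_down": []}

    # Rule 2: a line card is brought down if any of its PFEs has completely
    # lost connectivity to or from the fabric.
    down = sorted(slot for slot, pfes in pfe_connected.items() if not all(pfes))
    return {"all_interfaces_down": False, "line_cards_down": down}
```

For example, with one SCB still connected and a PFE on the line card in slot 3 cut off from the fabric, the function reports only slot 3 as brought down.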

System hardware failures can be of the following types:

  • A single occurrence or a rare failure for a brief period (such as environmental spikes). This failure is healed without manual intervention by restarting the fabric plane and, if necessary, restarting both the line cards and the fabric plane.

  • Repeated failures that occur frequently.

  • A permanent failure.

Recovery is not attempted for cases of reduced throughput, such as multiple Packet Forwarding Engine destination timeouts on multiple planes. Restoration of connectivity is attempted only when all the planes are in the Offline or Fault state or when the destinations are unreachable on all active planes.
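The condition for attempting restoration can be expressed as a short predicate. The sketch below is a simplified model under stated assumptions: the `PlaneState` enum, the function name, and the per-plane reachability flag are hypothetical and do not correspond to any Junos OS interface.

```python
from enum import Enum


class PlaneState(Enum):
    # Hypothetical state names; the text only mentions Offline and Fault.
    ONLINE = "Online"
    OFFLINE = "Offline"
    FAULT = "Fault"


def should_attempt_recovery(plane_states, reachable_on_plane):
    """Return True only for a complete loss of fabric connectivity.

    plane_states: dict mapping plane id -> PlaneState.
    reachable_on_plane: dict mapping plane id -> True if destinations are
        still reachable over that plane (assumed simplification).
    """
    active = [p for p, s in plane_states.items() if s is PlaneState.ONLINE]

    # Condition 1: all planes are Offline or Fault, so no plane is active.
    if not active:
        return True

    # Condition 2: destinations are unreachable on every active plane.
    # If even one active plane still reaches its destinations, this is
    # reduced throughput and no recovery is attempted.
    return all(not reachable_on_plane.get(p, False) for p in active)
```

In this model, destination timeouts on some planes while another active plane still carries traffic return False, which matches the reduced-throughput case described above.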

If connectivity loss is caused by a particular line card that is either the common source or the common destination of the destination timeouts, and you have configured the action-fpc-restart-disable statement at the [edit chassis fabric degraded] hierarchy level, no recovery action is taken. You can use the show chassis fabric reachability command to verify the status of the fabric and the line card. An alarm is triggered to indicate that the particular line card is causing the connectivity loss.
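The effect of the action-fpc-restart-disable statement can be pictured with the following sketch. The function and its return values are hypothetical. The branch taken when the statement is not configured is an assumption inferred from the statement name (that the line card would otherwise be restarted) and is not stated in the text above.

```python
def respond_to_common_line_card(fpc_slot, action_fpc_restart_disable):
    """Model the response when one line card is the common source or
    destination of the destination timeouts (hypothetical helper)."""
    if action_fpc_restart_disable:
        # From the text: no recovery action is taken, and an alarm
        # identifies the line card causing the connectivity loss.
        return [f"alarm: FPC {fpc_slot} is causing connectivity loss"]

    # Assumption, inferred from the statement name only: without the
    # statement, the affected line card is restarted as the recovery action.
    return [f"restart FPC {fpc_slot}"]
```

Calling `respond_to_common_line_card(2, True)` returns only the alarm entry for slot 2, reflecting that the operator must then investigate the line card manually.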