Fabric Hardening and Recovery on PTX10K Devices
-
PTX10001-36MR, PTX10004, PTX10008, and PTX10016 routers with PTX10K-LC1202-36MR line card
-
PTX10008 router with PTX10K-LC1301-36DD line card.
Fabric hardening is a resiliency feature that detects fabric blackholing and attempts automatic recovery to restore the Packet Forwarding Engines from the blackhole condition.
We’ve enabled fabric hardening by default. When the system detects any unreachable Packet Forwarding Engine destination, this feature attempts automatic fabric connectivity restoration.
If restoration fails, the system turns off the interfaces to limit the blackholing and triggers an alarm to indicate the unreachable Packet Forwarding Engine destinations. However, instead of turning off the interfaces, you can configure the system to take the Packet Forwarding Engine offline by using the set chassis fabric event reachability-fault actions recovery-failure pfe-offline statement at the [edit chassis fabric event] hierarchy level.
Packet Forwarding Engine destinations can become unreachable for the following reasons:
-
Complete self-blackhole: Complete connectivity loss occurs on all fabric planes.
-
Complete peer-blackhole: Two Packet Forwarding Engines can reach the fabric but not each other.
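The two conditions above can be told apart with a small illustrative model. This is a minimal sketch, not the router's detection logic: the function name, its inputs (a count of usable fabric planes for each Packet Forwarding Engine and a pairwise reachability flag), and the returned labels are all assumptions made for illustration.

```python
def classify_blackhole(src_planes_up: int, dst_planes_up: int, pair_reachable: bool) -> str:
    """Label a source/destination PFE pair (hypothetical model, not Junos code)."""
    # Self-blackhole: a PFE has lost connectivity on all fabric planes.
    if src_planes_up == 0 or dst_planes_up == 0:
        return "complete-self-blackhole"
    # Peer-blackhole: both PFEs can reach the fabric, but not each other.
    if not pair_reachable:
        return "complete-peer-blackhole"
    return "reachable"
```

For example, a pair in which both Packet Forwarding Engines still have working planes but cannot reach each other is labeled a complete peer-blackhole.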
You can configure a router to trigger fabric recovery when it detects degradation in fabric bandwidth by using the degraded statement at the [edit chassis fabric event reachability-fault] hierarchy level. The degraded statement is configured with a percentage value from 1 through 99. The percentage value represents the error threshold for fabric bandwidth degradation, and the router starts recovery once the threshold is reached.
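The threshold rule can be sketched as a simple comparison. This is an illustrative model only: the function and parameter names are hypothetical, and how the router actually measures fabric bandwidth degradation is not described here, so the observed percentage is taken as a given input.

```python
def recovery_triggered(observed_degradation_pct: float, threshold_pct: int) -> bool:
    """Return True once observed degradation reaches the configured threshold."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(threshold_pct, bool) or not isinstance(threshold_pct, int):
        raise TypeError("threshold must be an integer percentage")
    # The text gives 1 through 99 as the valid range for the percentage value.
    if not 1 <= threshold_pct <= 99:
        raise ValueError("threshold must be in the range 1 through 99")
    # "Once the threshold is reached" is modeled as greater than or equal.
    return observed_degradation_pct >= threshold_pct
```

With a threshold of 25, an observed degradation of 25 percent or more would start recovery in this model, while 24.9 percent would not.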
When the degraded error threshold is configured, the router can also attempt fabric recovery for the following reasons:
-
Self degradation: Degraded fabric condition in a Packet Forwarding Engine destination.
-
Peer degradation: Degraded fabric condition between two Packet Forwarding Engines.
The fabric recovery process involves one or more of the following phases:
-
SIB restart phase: If Packet Forwarding Engine destinations across multiple line cards have fabric connectivity failures on planes, then the router attempts to resolve the issue by restarting the SIBs. If multiple SIBs require a restart, the router restarts the SIBs one by one.
-
FPC restart phase: The router attempts automatic recovery by restarting the FPCs in the following scenarios:
-
All Packet Forwarding Engine destinations that have complete or partial blackhole conditions are in a single FPC.
-
Packet Forwarding Engine destinations with complete or partial blackhole conditions occur across different FPCs, but none of the Packet Forwarding Engines share a common plane of failure.
-
The SIB restart phase failed to recover the Packet Forwarding Engines.
You can disable restarting of FPCs to limit recovery actions from a degraded fabric condition. To disable restarting of FPCs, use the set chassis fabric event reachability-fault actions fpc-restart-disable statement at the [edit chassis fabric event] hierarchy level.
-
Packet Forwarding Engine offline phase: If the previous recovery phases failed or a recovery action is disabled in the configuration, the router turns off the interfaces by default to limit the blackholing. However, instead of turning off the interfaces, you can configure the router to take the Packet Forwarding Engine offline by using the set chassis fabric event reachability-fault actions recovery-failure pfe-offline statement at the [edit chassis fabric event] hierarchy level.
If the only affected Packet Forwarding Engines are those with peer blackhole or peer degradation conditions, the router attempts recovery through link autoheal by restarting fabric links on the planes.
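The phase ordering described above can be sketched as a small decision function. This is a simplified model under stated assumptions, not the router's implementation: the Fault record, the phase labels, and the rule that "a common plane of failure" means two faults on different FPCs list at least one of the same failed planes are all hypothetical. The two keyword flags stand in for the fpc-restart-disable and recovery-failure pfe-offline configuration options.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import combinations


class Condition(Enum):
    SELF_BLACKHOLE = "self-blackhole"
    PEER_BLACKHOLE = "peer-blackhole"
    SELF_DEGRADATION = "self-degradation"
    PEER_DEGRADATION = "peer-degradation"


PEER_ONLY = {Condition.PEER_BLACKHOLE, Condition.PEER_DEGRADATION}


@dataclass(frozen=True)
class Fault:
    """One faulty PFE destination (hypothetical record for this sketch)."""
    fpc: int
    pfe: int
    condition: Condition
    failed_planes: frozenset = frozenset()


def recovery_plan(faults, fpc_restart_disabled=False, pfe_offline=False):
    """List the phases to try in order; each runs only if the previous one fails."""
    if not faults:
        return []
    # Only peer conditions present: recovery goes through link autoheal.
    # The text does not say what follows autoheal, so the plan stops here.
    if all(f.condition in PEER_ONLY for f in faults):
        return ["link-autoheal"]

    plan = []
    fpcs = {f.fpc for f in faults}
    # Assumption: a shared failed plane across different FPCs points at a SIB.
    shares_plane = any(
        a.fpc != b.fpc and a.failed_planes & b.failed_planes
        for a, b in combinations(faults, 2)
    )
    if len(fpcs) > 1 and shares_plane:
        plan.append("sib-restart")
    # FPC restart covers single-FPC faults, no common plane, or a failed SIB restart.
    if not fpc_restart_disabled:
        plan.append("fpc-restart")
    # Final action: interfaces are turned off by default, PFE offline if configured.
    plan.append("pfe-offline" if pfe_offline else "interfaces-off")
    return plan
```

In this model, faults confined to one FPC skip the SIB restart and go straight to an FPC restart, while faults on different FPCs that share a failed plane begin with a SIB restart. Setting fpc_restart_disabled removes the FPC restart from the plan, so a failed earlier phase leads directly to the final action.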
Benefits
-
Attempts automatic recovery of the Packet Forwarding Engines from degraded fabric conditions to minimize traffic loss.
-
Raises alarms that provide fault information to indicate the unreachable Packet Forwarding Engine destinations if recovery fails.