Understanding Fabric Fault Handling on PTX5000 Packet Transport Router

Starting with Junos OS Release 14.1, the PTX5000 Packet Transport Router supports nine Switch Interface Boards (SIBs). Each FPC2-PTX-P1A FPC supports 1 terabit per second (Tbps) of capacity per slot, which results in a fabric bandwidth of 16 Tbps full-duplex (8 Tbps of any-to-any, nonblocking, half-duplex switching).
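
The capacity figures above follow from simple arithmetic. The short Python sketch below reproduces them under one assumption that this topic does not state: a chassis with eight FPC slots. It treats the full-duplex figure as twice the half-duplex figure.

```python
# Assumption (not stated in this topic): the chassis has eight FPC slots.
FPC_SLOTS = 8
# Per-slot capacity of an FPC2-PTX-P1A FPC, taken from the text above.
PER_SLOT_TBPS = 1

# Half-duplex capacity counts each slot once; full-duplex counts both directions.
half_duplex_tbps = FPC_SLOTS * PER_SLOT_TBPS
full_duplex_tbps = 2 * half_duplex_tbps

print(f"half-duplex: {half_duplex_tbps} Tbps, full-duplex: {full_duplex_tbps} Tbps")
```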

The fabric fault management functionality involves monitoring all high-speed links connected to the fabric and the ones within the fabric core for link failures and link errors.

The faults that occur in a PTX5000 can be broadly categorized into:

  • Board faults: Faults that arise in a SIB or in a Flexible Port Concentrator (FPC) during initialization or during runtime, including issues that arise when a router component accesses the SIB or FPC and issues that arise from midplane failures.

  • Link faults: Faults that occur on high-level links in a router during initialization or during runtime.

  • Faults due to environmental conditions: Faults that occur because of overvoltage or over-temperature, or because an operator mishandles a SIB or an FPC.

The router takes action on the basis of the fault category and the fault location. The actions include:

  • Reporting link errors in system log files and sending this information to the Routing Engine.

  • Displaying the link errors when you run one of the operational commands listed in Table 1:

    Table 1: List of Operational Mode Commands

    Operational mode command                      Description
    show chassis sibs                             Displays Switch Interface Boards (SIBs) status information.
    show chassis fabric fpcs <slot number>        Displays the fabric state of the specified FPC slot. If no slot number is provided, it displays the status of all FPCs.
    show chassis fabric sibs <slot number>        Displays the state of the electrical switch fabric link between the SIBs and the FPCs.
    show chassis fabric reachability <detail>     Displays the current state of fabric destination reachability.
    show chassis fabric unreachable-destinations  Displays the list of destinations that have transitioned from a reachable state to an unreachable state.
    show pfe statistics error                     Displays Packet Forwarding Engine error statistics.
    show chassis fabric topology <sib_slot>       Displays the input-output link topology.
    show chassis fabric summary                   Displays the state of all fabric planes and the elapsed uptime.

  • Reporting link failures at the FPC level or at the SIB level and sending this information to the Routing Engine.

  • Reporting link error information in the output of the show chassis alarms operational mode command.

  • Moving a SIB into fault state.
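
When you check for faults from a script, it can help to keep the Table 1 commands in one place. The following Python sketch only builds command strings; it does not connect to a router. The helper name `build_command` is hypothetical, and the argument placement simply follows Table 1 as written, so verify the exact syntax against the CLI reference for your release.

```python
from typing import Optional

# Commands as listed in Table 1. The boolean records whether the table shows
# an optional trailing argument (for example, a slot number) for the command.
TABLE_1_COMMANDS = {
    "show chassis sibs": False,
    "show chassis fabric fpcs": True,
    "show chassis fabric sibs": True,
    "show chassis fabric reachability": True,
    "show chassis fabric unreachable-destinations": False,
    "show pfe statistics error": False,
    "show chassis fabric topology": True,
    "show chassis fabric summary": False,
}


def build_command(base: str, argument: Optional[str] = None) -> str:
    """Hypothetical helper: build the CLI string for a Table 1 command."""
    if base not in TABLE_1_COMMANDS:
        raise ValueError(f"not a Table 1 command: {base!r}")
    if argument is None:
        return base
    if not TABLE_1_COMMANDS[base]:
        raise ValueError(f"{base!r} takes no argument in Table 1")
    # Assumption: the optional argument is appended after the base command.
    return f"{base} {argument}"


print(build_command("show chassis fabric fpcs", "3"))
print(build_command("show chassis fabric reachability", "detail"))
```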

The following sections explain fabric fault handling functionality on the PTX5000:

SIB-Level Faults

The following sections give a brief overview of the types of faults that occur on a SIB and how the router handles them:

Types of Faults That Occur on a SIB

Board faults and link faults occur on a SIB during initialization and during runtime. Some faults occur because of environmental conditions such as overvoltage or over-temperature, or when an operator mishandles the SIB.

Note:

Run the operational mode commands listed in Table 1 to detect faults.

During SIB initialization and runtime, the following faults might occur:

  • Board faults, such as failure of a SIB to power up, failure of ASICs to come out of reset, Switch Processor Mezzanine Board (SPMB) polled I/O access failures to ASICs, board component failures such as PIC failures, or router component access failures.

  • Link faults such as high-level link errors that occur during link training.

  • Faults that occur because of environmental conditions or because of mishandling of the SIB by the operator.

Handling SIB-Level Faults

The following list illustrates how the router handles a fault that occurs on a SIB during initialization, during runtime, because of environmental conditions, and because of mishandling of the SIB by the operator:

  • To handle a board fault on a SIB during initialization, the chassis process (chassisd) marks the SIB as faulty. After the SIB is marked as faulty, no operation occurs on this SIB.

  • To handle a board fault on a SIB during runtime, chassisd logs an error in the system log file, raises an alarm indication error type, and marks the SIB as faulty. After the SIB is marked as faulty, no operation occurs on this SIB.

  • To handle a link fault on a SIB during runtime, when a link error comes up during link training, chassisd informs the FPC corresponding to the link on which the error occurred to disable its links to the affected SIB. chassisd then sends an error message to all the other FPCs in the router to stop using the failed SIB link, and a link error alarm is generated. Note that when more than one FPC reports errors for a given SIB, the SIB is disabled for all FPCs and the Packet Forwarding Engines send no traffic through the affected SIB.

  • To handle a link fault on a SIB during runtime, chassisd marks the SIB as faulty and specifies a reason for the error, and the SIB is disabled.

  • In case of an environmental fault (overvoltage or over-temperature), the SIB is taken offline as soon as the voltage or temperature crosses a certain threshold. Before that point, an error is logged periodically as the temperature or voltage rises.

  • When a SIB is abruptly removed or dislodged, all the affected Packet Forwarding Engines stop using that plane to reach other Packet Forwarding Engines in the router.
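
The SIB handling rules in the list above can be summarized as a small decision function. The following Python sketch is a simplified model for illustration only, not chassisd code: the `SibFault` names, the `Sib` fields, the state labels, and the action strings are all hypothetical. It does show the one stateful rule in the list, namely that a SIB is disabled for all FPCs once more than one FPC has reported link errors against it.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional, Set


class SibFault(Enum):
    # Hypothetical labels for the cases described in the list above.
    BOARD_INIT = auto()
    BOARD_RUNTIME = auto()
    LINK_ERROR_IN_TRAINING = auto()
    LINK_FAULT = auto()
    ENVIRONMENTAL = auto()
    REMOVED = auto()


@dataclass
class Sib:
    slot: int
    state: str = "online"  # illustrative labels: online, fault, offline, absent
    disabled_for_all: bool = False
    reporting_fpcs: Set[int] = field(default_factory=set)


def handle_sib_fault(sib: Sib, fault: SibFault,
                     fpc_slot: Optional[int] = None) -> List[str]:
    """Update the SIB model and return the actions the text describes."""
    if fault is SibFault.BOARD_INIT:
        sib.state = "fault"
        return ["mark SIB faulty"]
    if fault is SibFault.BOARD_RUNTIME:
        sib.state = "fault"
        return ["log error", "raise alarm", "mark SIB faulty"]
    if fault is SibFault.LINK_ERROR_IN_TRAINING:
        if fpc_slot is None:
            raise ValueError("fpc_slot is required for a link error")
        sib.reporting_fpcs.add(fpc_slot)
        actions = [f"FPC {fpc_slot}: disable links to SIB {sib.slot}",
                   "tell other FPCs to stop using the failed link",
                   "raise link error alarm"]
        # More than one reporting FPC disables the SIB for every FPC.
        if len(sib.reporting_fpcs) > 1:
            sib.disabled_for_all = True
            actions.append("disable SIB for all FPCs")
        return actions
    if fault is SibFault.LINK_FAULT:
        sib.state = "fault"
        return ["mark SIB faulty with reason", "disable SIB"]
    if fault is SibFault.ENVIRONMENTAL:
        sib.state = "offline"
        return ["take SIB offline"]
    # SibFault.REMOVED: the SIB was abruptly removed or dislodged.
    sib.state = "absent"
    return ["affected Packet Forwarding Engines stop using this plane"]


sib = Sib(slot=4)
print(handle_sib_fault(sib, SibFault.LINK_ERROR_IN_TRAINING, fpc_slot=1))
print(handle_sib_fault(sib, SibFault.LINK_ERROR_IN_TRAINING, fpc_slot=2))
```

In the usage lines, the first report affects only the reporting FPC's links, and the second report from a different FPC sets `disabled_for_all`.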

FPC-Level Faults

The following sections give a brief overview of the types of faults that occur on an FPC and how to handle them:

Types of Faults That Occur on an FPC

Board faults and link faults occur on an FPC during initialization and during runtime. Some faults also occur because of environmental conditions such as overvoltage or over-temperature, or when the operator mishandles the FPC.

Note:

Run the operational commands listed in Table 1 to detect faults.

During FPC initialization and runtime, the following faults might occur:

  • Board faults such as failure of FPCs to power up, failure of ASICs to come out of reset phase, PMB polled I/O access failure to ASICs, board component failures such as PIC failure, or router component access failures.

  • Link faults such as high-level link errors that occur during link training.

  • Faults that occur because of environmental conditions or because of mishandling of an FPC by the operator.

Handling FPC-Level Faults

The following list illustrates how the router handles a fault that occurs on an FPC during initialization, during runtime, because of environmental conditions, and because of mishandling of the FPC by the operator:

  • To handle a board fault on an FPC during initialization, chassisd marks the FPC as faulty. After the FPC is marked as faulty, no operation occurs on this FPC.

  • To handle a board fault on an FPC during runtime, chassisd logs an error in the system log file, raises an alarm indication error type, and marks the FPC as faulty. After the FPC is marked as faulty, no operation occurs on this FPC.

  • To handle onboard link errors on an FPC during initialization or during runtime, the FPC is taken down and all the affected Packet Forwarding Engines stop using that plane to reach other Packet Forwarding Engines in the router.

    Note:

    No planes are taken down during initialization because the link training process for the fabric is not yet complete.

    Onboard link errors during runtime are resolved on the basis of current configuration; either the FPC is rebooted or the error is logged and the FPC continues with initialization.

  • In case of an environmental fault (overvoltage or over-temperature), the FPC is taken offline as soon as the voltage or temperature crosses a certain threshold. Before that point, an error is logged periodically as the temperature or voltage rises.

  • When an FPC is abruptly removed or dislodged, all the other Packet Forwarding Engines stop sending traffic to the Packet Forwarding Engines in this FPC.
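
The environmental case, which the router treats the same way for SIBs and FPCs, is a two-stage check: log while the reading is rising, then take the board offline once it crosses the limit. The following Python sketch models that check under stated assumptions. The function name, the separate logging threshold, and the example temperatures are hypothetical; this topic does not give the actual threshold values or say at what reading logging begins.

```python
def environmental_action(reading: float, log_threshold: float,
                         offline_threshold: float) -> str:
    """Return the action for one temperature or voltage reading."""
    if log_threshold >= offline_threshold:
        raise ValueError("log_threshold must be below offline_threshold")
    # Crossing the upper threshold takes the board offline.
    if reading >= offline_threshold:
        return "take offline"
    # Below that, a rising reading is only logged.
    if reading >= log_threshold:
        return "log error"
    return "no action"


# Hypothetical thresholds in degrees Celsius, chosen only for illustration.
LOG_AT, OFFLINE_AT = 75.0, 95.0
for temperature in (60.0, 80.0, 90.0, 96.0):
    print(temperature, environmental_action(temperature, LOG_AT, OFFLINE_AT))
```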

Change History Table

Feature support is determined by the platform and release you are using. Use Feature Explorer to determine if a feature is supported on your platform.

Release  Description
14.1     Starting with Junos OS Release 14.1, the PTX5000 Packet Transport Router supports nine Switch Interface Boards (SIBs).