    Detection of Corruption in the Statistics FPGA and System Operations on Detecting Corruption

    When a bit flip occurs in the DRAM or static RAM (SRAM) of the statistics FPGA on a router that functions as a Session and Resource Control (SRC) client or RADIUS client, the router might send corrupted accounting statistics to the SRC or RADIUS server. This corrupts the accounting information computed for subscriber sessions. To resolve incorrect accounting computation caused by corrupted statistics retrieved from the SRC or RADIUS client, you must replace the hardware.

    A parity error check mechanism in the statistics FPGA of an ES2 4G LM checks whether the statistics are corrupted. The mechanism checks the parity of all statistics entries in both the DRAM and the SRAM, and sets the parity error status bit of a statistics entry to indicate a parity check failure. Statistics entries that fail the parity check are not updated after the error is detected.
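    The following Python sketch models the behavior described above for a single statistics entry: a parity check over the stored counter, a status bit that is set when the check fails, and updates that are skipped once the bit is set. The even-parity scheme, the `StatsEntry` class, and its field names are illustrative assumptions. The document does not describe the FPGA's actual parity layout.

```python
from dataclasses import dataclass


def even_parity(value: int) -> int:
    """Return the bit that makes the total count of 1 bits even (assumed scheme)."""
    return bin(value).count("1") & 1


@dataclass
class StatsEntry:
    """Hypothetical model of one statistics entry stored in DRAM or SRAM."""
    counter: int
    parity: int
    parity_error: bool = False  # models the parity error status bit

    @classmethod
    def new(cls, counter: int = 0) -> "StatsEntry":
        return cls(counter=counter, parity=even_parity(counter))

    def check(self) -> bool:
        """Recompute parity and set the status bit on a mismatch; True means good."""
        if even_parity(self.counter) != self.parity:
            self.parity_error = True
        return not self.parity_error

    def add(self, amount: int) -> None:
        """Update the counter unless the entry has already failed the parity check."""
        if self.parity_error:
            return  # entries that failed the check are no longer updated
        self.counter += amount
        self.parity = even_parity(self.counter)


entry = StatsEntry.new(100)
entry.counter ^= 1 << 3  # simulate a single bit flip in memory
ok = entry.check()       # parity mismatch, so the status bit is set
entry.add(5)             # ignored because the entry failed the check
```

    A single parity bit detects any odd number of flipped bits in the protected word but cannot detect an even number; that is a general property of parity, not a statement about this FPGA.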

    Note: The parity error check mechanism is supported only on ES2 4G LMs (with any IOA combination) and applies to both PPP and L2TP subscribers. The mechanism reduces the usable width of 64-bit packet and byte counters to 60 bits, and of 32-bit packet counters to 30 bits.
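    Reserving bits for parity lowers the largest value each counter can hold before it wraps. The sketch below computes those limits and models wrap-around by masking to the usable bits; treating the reserved bits simply as unavailable to the count, without modeling where the parity bits actually sit, is an assumption for illustration.

```python
# Usable bits per counter type, as stated in the note above.
COUNTER_WIDTHS = {"64-bit packet/byte counter": 60, "32-bit packet counter": 30}


def max_value(usable_bits: int) -> int:
    """Largest count representable in the given number of usable bits."""
    return (1 << usable_bits) - 1


def wrap(value: int, usable_bits: int) -> int:
    """Model counter wrap-around by keeping only the low-order usable bits."""
    return value & max_value(usable_bits)


for name, bits in COUNTER_WIDTHS.items():
    print(f"{name}: {bits} usable bits, max {max_value(bits):,}")
```

    For example, a packet counter limited to 30 bits tops out at 1,073,741,823 instead of the 4,294,967,295 available with all 32 bits, and a 60-bit counter tops out at 1,152,921,504,606,846,975.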

    The parity error check mechanism checks for parity errors when any of the following events occurs:

    • Receipt of a decision (DEC) message from the SRC server to attach the service policy to an interface.
    • Receipt of a DEC message from the SRC server to retrieve interim accounting statistics.
    • Sending of the final accounting report to the SRC server when the subscriber is terminated.
    • Execution of the show ip interface or show ipv6 interface command to display the current status of a specific interface.

      Note: When the parity error check is triggered by the show ip interface or show ipv6 interface command, the mechanism only detects the corruption; it does not carry out subsequent actions such as subscriber termination. Therefore, the subscriber's slot is not added to the defective slots.

    • Sending of an accounting start request to the RADIUS server.
    • Sending of an interim accounting request to the RADIUS server.
    • Sending of an accounting stop request to the RADIUS server.

    You can use the fpga-stats-monitoring-enable command to prevent the router from reporting user and policy accounting statistics that failed the parity check to the RADIUS or SRC server.

    Note: The router performs actions such as subscriber termination only during an interim update or subscriber logout, not during subscriber login.
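    The triggers listed above, combined with the two notes that limit when the router acts, can be summarized as a lookup: every trigger runs the parity check, but only interim and logout events can lead to actions such as subscriber termination. The trigger labels, the `Phase` enum, and `actions_allowed` are hypothetical names for this sketch. The DEC message that attaches a service policy is deliberately left out, because this section does not say in which phase it occurs.

```python
from enum import Enum


class Phase(Enum):
    """Hypothetical phases in which a parity check can be triggered."""
    LOGIN = "login"
    INTERIM = "interim"
    LOGOUT = "logout"
    SHOW_COMMAND = "show"


# Hypothetical labels for the documented triggers, mapped to their phase.
TRIGGER_PHASE = {
    "radius-acct-start": Phase.LOGIN,
    "radius-acct-interim": Phase.INTERIM,
    "radius-acct-stop": Phase.LOGOUT,
    "src-dec-interim-request": Phase.INTERIM,
    "src-final-report": Phase.LOGOUT,
    "show-ip-interface": Phase.SHOW_COMMAND,
    "show-ipv6-interface": Phase.SHOW_COMMAND,
}


def actions_allowed(trigger: str, monitoring_enabled: bool) -> bool:
    """True if a detected parity error on this trigger can lead to actions.

    Detection happens for every trigger; actions require the
    fpga-stats-monitoring-enable command and an interim or logout event.
    """
    phase = TRIGGER_PHASE[trigger]
    return monitoring_enabled and phase in (Phase.INTERIM, Phase.LOGOUT)
```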

    Actions Performed on Detecting Parity Error

    If you have executed the fpga-stats-monitoring-enable command and a parity error is detected, the router performs the following actions for user and policy accounting statistics:

    • Terminates the subscriber:
      • In a single-stack environment, terminates the corresponding IPv4 or IPv6 subscriber and blocks other subscribers (both IPv4 and IPv6) from logging in to the line module. For the AAA accounting model, sends an Acct-Stop message to the RADIUS server with the older, uncorrupted user and policy accounting statistics. For the SRC accounting model, sends a Common Open Policy Service Report State (COPS RPT) message with an error code to the SRC server in response to interim requests, and sends a COPS Delete Request (DRQ) message to the SRC server during subscriber login or logout.
      • In a dual-stack environment, terminates both the IPv4 and IPv6 sessions on an interface if the statistics of either session (IPv4 or IPv6) are corrupted. For the AAA accounting model, sends an Acct-Stop message to the RADIUS server with the older, uncorrupted user and policy accounting statistics of both the PPP and IPv6 interfaces. For the SRC accounting model, sends a COPS RPT message with an error code to the SRC server in response to interim requests, and sends a COPS DRQ message to the SRC server during subscriber login or logout.
      • If a Tunnel Service line module (TSM) slot is affected, terminates the corrupted tunneled subscriber and sets the state of the corresponding server port to “draining” by setting the maximum number of interfaces on the TSM to zero. New subscribers can still log in if any uncorrupted TSM slots remain.

      Note: The corrupted slot information is not retained after a unified in-service software upgrade (ISSU), a router reload, or line module reinsertion. However, it is retained after a line module reload.

    • Generates an SNMP trap indicating the user or policy accounting failure, if you have enabled SNMP trap generation by using the fpga-stats-monitoring trap enable command.
    • Generates a syslog message indicating the statistics corruption.
    • Periodically (every 60 seconds) monitors the Parity Error Register of the FPGA for all DRAM and SRAM banks in which a parity error is identified.
    • Supports the following recovery mechanisms:
      • For LCR usage models, if the parity error is detected on the active line module, the standby line module takes over as the active line module.
      • For LCHA usage models, if the parity error is detected on the active line module, the router unconfigures the LCHA group. The active and standby line modules then operate as separate standalone line modules, and the maximum number of interfaces on the affected TSM is set to zero to prevent new subscribers from logging in.
      • If the affected line module is removed and then reinserted, subscribers are again allowed to log in.
      • If the affected line module is reloaded, all subscribers remain blocked from logging in.
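    The termination and reporting rules in the list above can be condensed into a small decision function. This is a sketch of the documented behavior when fpga-stats-monitoring-enable is configured, not the router's implementation; `Outcome`, `on_parity_error`, and the string values for stack type, accounting model, and phase are hypothetical labels, and the SNMP, syslog, and recovery actions are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Hypothetical summary of the router's response for one corrupted interface."""
    terminated: list[str]
    message: str
    tsm_draining: bool = False


def on_parity_error(stack: str, corrupted: str, accounting: str,
                    phase: str, on_tsm: bool = False) -> Outcome:
    """Model the documented actions when fpga-stats-monitoring-enable is set.

    stack: "single" or "dual"; corrupted: "ipv4" or "ipv6";
    accounting: "aaa" or "src"; phase: "interim" or "logout".
    These are illustrative labels, not router configuration values.
    """
    # Dual stack: both sessions on the interface go down if either is corrupted.
    terminated = ["ipv4", "ipv6"] if stack == "dual" else [corrupted]

    if accounting == "aaa":
        # AAA model: Acct-Stop carries the older, uncorrupted statistics.
        message = "Acct-Stop"
    elif phase == "interim":
        # SRC model: an error code in the COPS RPT that answers an interim request.
        message = "COPS RPT (error code)"
    else:
        # SRC model at logout: COPS DRQ.
        message = "COPS DRQ"

    # TSM slot: maximum interfaces set to zero, so the server port drains.
    return Outcome(terminated, message, tsm_draining=on_tsm)
```

    For instance, `on_parity_error("dual", "ipv6", "src", "interim")` terminates both sessions and answers the interim request with a COPS RPT carrying an error code.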

    If you have executed the fpga-stats-monitoring-enable command and a parity error is detected, the router performs the following actions for statistics other than user and policy accounting statistics:

    • Periodically (every 60 seconds) monitors the Parity Error Register of the FPGA for all DRAM or SRAM banks in which a parity error is identified.
    • For each DRAM or SRAM bank, stores the index of the last statistics entry whose parity error status bit is set.
    • Stops the statistics counter.
    • Allows you to retrieve corrupted statistics details through SNMP and CLI commands.
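    The periodic scan described above can be sketched as a polling loop that, for each bank, records the index of the last entry whose parity error status bit is set. Representing a bank as a list of status bits, and the `read_banks` and `on_result` callbacks, are assumptions made for illustration; the real Parity Error Register format is not described here.

```python
import threading
from typing import Callable

POLL_INTERVAL_S = 60.0  # documented polling period


def scan_banks(banks: dict[str, list[bool]]) -> dict[str, int]:
    """Map each bank to the index of its last entry with the status bit set.

    `banks` maps a hypothetical bank name to the status bits of its entries;
    banks without any set bit are omitted from the result.
    """
    last_bad: dict[str, int] = {}
    for bank, status_bits in banks.items():
        for index, bad in enumerate(status_bits):
            if bad:
                last_bad[bank] = index
    return last_bad


def monitor(read_banks: Callable[[], dict[str, list[bool]]],
            on_result: Callable[[dict[str, int]], None],
            stop: threading.Event,
            interval: float = POLL_INTERVAL_S) -> None:
    """Scan every `interval` seconds until `stop` is set (hypothetical hooks)."""
    # Event.wait returns False on timeout, so each timeout triggers one scan.
    while not stop.wait(interval):
        on_result(scan_banks(read_banks()))
```

    Using `threading.Event.wait` as the timer lets a caller end the loop promptly by setting the event instead of waiting out a full 60-second interval.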

    Note: When a parity error is detected in the multicast statistics on an interface carrying multicast traffic, the unicast packet statistics do not reflect the actual number of unicast packets received, because the unicast count is derived from the multicast statistics counter (that is, unicast count = inReceived packets - inMulticast packets).
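    A short worked example shows why a corrupted multicast counter also distorts the unicast figure. The packet counts and the flipped bit are hypothetical numbers chosen for illustration.

```python
def unicast_packets(in_received: int, in_multicast: int) -> int:
    """Unicast count as derived in the note: inReceived - inMulticast."""
    return in_received - in_multicast


# Hypothetical counts: 1,000 packets received, 200 of them multicast.
true_unicast = unicast_packets(1000, 200)

# Suppose bit 6 of the multicast counter flips, so it reads 200 ^ 64 = 136.
# The unicast packets were counted correctly, yet the derived value is wrong.
reported_unicast = unicast_packets(1000, 200 ^ 64)
```

    Here the derived unicast count is 864 instead of 800, purely because of the corrupted multicast counter.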

    If you have executed the fpga-stats-monitoring-enable command and a parity error is detected, the show ip interface and show ipv6 interface commands display an error message instead of the policy accounting statistics details for the policies whose statistics are corrupted.

    If you have not executed the fpga-stats-monitoring-enable command and a parity error is detected, subscribers are not terminated. Instead, the router sends the older, uncorrupted user and policy accounting statistics to the RADIUS or SRC server in all subsequent interim records and in the final accounting record at subscriber logout. In addition, the show ppp interface command displays an error message instead of the corrupted user accounting statistics details.
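    Without fpga-stats-monitoring-enable, the reporting rule in the paragraph above amounts to "last known good value": once an entry fails the parity check, every later interim or final record carries the value from before the failure. The `AccountingReporter` class below is a hypothetical model of that rule only; it assumes a failed entry stays failed, because failed entries are not updated, and it returns `None` if the entry fails before any good value was recorded.

```python
from typing import Optional


class AccountingReporter:
    """Hypothetical model of which counter value reaches the RADIUS or SRC server."""

    def __init__(self) -> None:
        self.last_good: Optional[int] = None
        self.failed = False  # once set, the entry is treated as corrupted

    def report(self, current: int, parity_ok: bool) -> Optional[int]:
        """Return the value sent in an interim or final accounting record."""
        if not parity_ok:
            self.failed = True
        if not self.failed:
            self.last_good = current  # remember the latest uncorrupted value
        return self.last_good


reporter = AccountingReporter()
first = reporter.report(500, parity_ok=True)    # sent as-is
second = reporter.report(900, parity_ok=False)  # corrupted: older value sent
final = reporter.report(950, parity_ok=True)    # still the older value
```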

    Published: 2014-08-14