PFC Watchdog

PFC Watchdog Overview

Priority-based flow control (PFC) pause frames are used in lossless Ethernet to make the link partner temporarily stop sending packets on a given priority. These PFC pause frames can propagate through the whole network and halt the traffic on the PFC streams. Use the PFC watchdog to detect and resolve PFC pause storms.

The PFC watchdog monitors PFC-enabled ports for PFC pause storms. When a PFC-enabled port receives PFC pause frames for an extended period of time and the PFC watchdog does not detect flow control frames on that port, the PFC watchdog mitigates the situation. It does this by disabling the queue where it detected the PFC pause storm for a configurable length of time called the recovery time. After the recovery time passes, the PFC watchdog re-enables the affected queue.

Understanding PFC Watchdog

The PFC watchdog has three functions: detection, mitigation, and restoration.

The PFC watchdog checks the status of PFC queues at regular intervals called polling intervals. If the PFC watchdog finds a PFC queue with a non-zero pause timer, it compares the queue's current transmit counter register to the last recorded value. If the PFC queue has not transmitted any packets since the last polling interval, the PFC watchdog checks whether the queue holds any packets. If packets are waiting in the queue but are not being transmitted, and there are no flow control frames on that port, the PFC watchdog detects a stall condition.
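The per-poll check described above can be sketched in Python. This is a minimal model under stated assumptions, not the device implementation: `QueueSample`, its fields, and `poll_indicates_stall` are hypothetical names, and the "no flow control frames on the port" condition is reduced to a single boolean.

```python
from dataclasses import dataclass


@dataclass
class QueueSample:
    """Hypothetical per-poll snapshot of one PFC queue (illustrative field names)."""
    pause_timer: int       # remaining PFC pause time reported for the queue
    tx_packets: int        # transmit counter register value at this poll
    queued_packets: int    # packets currently waiting in the queue
    fc_frames_seen: bool   # assumption: whether flow control frames were seen on the port


def poll_indicates_stall(sample: QueueSample, last_tx_packets: int) -> bool:
    """Apply the per-poll checks in the order the text describes."""
    if sample.pause_timer == 0:
        return False  # queue is not paused, so there is nothing to check
    if sample.tx_packets != last_tx_packets:
        return False  # queue transmitted since the last poll, so it is draining
    if sample.queued_packets == 0:
        return False  # nothing is waiting, so the pause is not blocking traffic
    # Packets are stuck in a paused queue; stalled only if no flow control frames were seen.
    return not sample.fc_frames_seen
```

A single stalled poll does not trigger mitigation on its own; the Detection section explains how many consecutive stalled polls are required.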

After the PFC watchdog detects a stall condition, it disables the queue where it detected the PFC pause storm for a period of time called the recovery time. During that time, it flushes all packets in the queue and prevents new packets from being added to the queue. The system monitors all packet drops on the PFC queue during the recovery time.

When the recovery time ends, the PFC watchdog collects the ingress drop counters and any other drop counters associated with disabling the PFC queue. The PFC watchdog maintains a count of the packets lost during the last recovery and the total number of lost packets due to PFC mitigation since the device was started. The PFC watchdog then restores the queue and re-enables PFC.
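The two counts the PFC watchdog keeps (packets lost in the last recovery, and total packets lost since startup) can be modeled with a small tracker. This is a hypothetical sketch: `PfcDropStats`, `finish_recovery`, and the counter names passed to it are illustrative, not device APIs.

```python
from __future__ import annotations


class PfcDropStats:
    """Hypothetical tracker for the per-recovery and lifetime drop counts."""

    def __init__(self) -> None:
        self.last_recovery_drops = 0  # packets lost during the most recent recovery
        self.total_drops = 0          # packets lost to PFC mitigation since startup

    def finish_recovery(self, drop_counters: dict[str, int]) -> None:
        """Fold in the drop counters collected at the end of one recovery period."""
        dropped = sum(drop_counters.values())
        self.last_recovery_drops = dropped  # replaced on every recovery
        self.total_drops += dropped         # accumulates across recoveries


stats = PfcDropStats()
stats.finish_recovery({"ingress": 40, "queue_flush": 12})  # hypothetical counter names
stats.finish_recovery({"ingress": 5})
```

After these two recoveries, `last_recovery_drops` reflects only the second one (5), while `total_drops` covers both (57).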

How to Configure PFC Watchdog

Enable PFC Watchdog

The PFC watchdog works only on PFC queues. To designate a queue as a PFC queue, use the flow-control-queue statement with the queue number:

Enable PFC watchdog using the pfc-watchdog statement at the [edit class-of-service congestion-notification-profile profile-name] hierarchy level:

Enabling PFC watchdog on the congestion notification profile without configuring other options enables the PFC watchdog with the default values. By default, the polling interval is 100 ms, the detection period is set to 2 (that is, two polling intervals, or 200 ms), and the recovery time is 200 ms. To learn how to configure non-default values, read the following sections.

Detection

The PFC watchdog periodically monitors the PFC-enabled queues for continuous PFC pause assertion by the downstream device. If a paused queue holds packets that it is not transmitting, the PFC watchdog detects a stall condition. The system must detect this stall condition within a specified amount of time, which is determined by two statements: poll-interval and detection.

The PFC watchdog checks the status of the PFC queues once per polling interval. Configure this interval in milliseconds using the poll-interval statement. The default and minimum interval is 100 ms, and the maximum is 1000 ms.

The PFC watchdog must detect stall conditions for at least two consecutive polling intervals before it determines that a PFC queue has stalled. Use the detection statement to control how many consecutive polling intervals the PFC watchdog waits before it mitigates the stalled traffic. The default and minimum is two polling intervals, and the maximum is 10.
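The consecutive-interval rule can be sketched as a counter that resets on any healthy poll. This is an illustrative model only; `StallDetector` and `on_poll` are hypothetical names, and the 2 to 10 range is taken from the text above.

```python
class StallDetector:
    """Hypothetical model: count consecutive stalled polls for one PFC queue."""

    def __init__(self, detection: int = 2) -> None:
        # The detection value is limited to 2..10 polling intervals.
        if not 2 <= detection <= 10:
            raise ValueError("detection must be between 2 and 10 polling intervals")
        self.detection = detection
        self.consecutive = 0

    def on_poll(self, stalled: bool) -> bool:
        """Record one poll result; return True once mitigation should start."""
        # A healthy poll resets the run, so only an unbroken run triggers mitigation.
        self.consecutive = self.consecutive + 1 if stalled else 0
        return self.consecutive >= self.detection


detector = StallDetector(detection=3)
results = [detector.on_poll(s) for s in [True, True, False, True, True, True]]
```

With a detection value of 3, the healthy third poll resets the count, so mitigation is signaled only on the sixth poll.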

The total detection time is the polling interval multiplied by the detection value (the number of polling intervals).
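As a worked example, write t_poll for the poll-interval value and n_detect for the detection value (symbols introduced here for illustration). The default values are also the minimums, so the shortest and default detection time are the same:

```latex
\begin{aligned}
T_{\mathrm{detect}} &= t_{\mathrm{poll}} \times n_{\mathrm{detect}} \\
\text{default and minimum: } T_{\mathrm{detect}} &= 100\ \mathrm{ms} \times 2 = 200\ \mathrm{ms} \\
\text{maximum: } T_{\mathrm{detect}} &= 1000\ \mathrm{ms} \times 10 = 10{,}000\ \mathrm{ms}
\end{aligned}
```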

Mitigation

When the PFC watchdog detects that a PFC queue has stalled, it moves the queue to the mitigation state. Configure the pfc-watchdog-action statement to specify the action that the PFC watchdog takes to mitigate the traffic congestion. The only option is the drop action: the PFC watchdog drops all queued packets and all newly arriving packets for the stalled PFC queue.
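The drop action can be modeled as a queue that flushes its contents on entering mitigation and then rejects every new arrival. This is a minimal sketch; `PfcQueue`, `start_mitigation`, and `enqueue` are hypothetical names, not device interfaces.

```python
from __future__ import annotations

from collections import deque


class PfcQueue:
    """Hypothetical model of one PFC queue that can enter the mitigation state."""

    def __init__(self) -> None:
        self.packets: deque[str] = deque()
        self.mitigating = False
        self.dropped = 0  # packets discarded because of mitigation

    def start_mitigation(self) -> None:
        """Drop action, part 1: flush every packet already in the queue."""
        self.mitigating = True
        self.dropped += len(self.packets)
        self.packets.clear()

    def enqueue(self, packet: str) -> bool:
        """Return True if the packet was queued, False if it was dropped."""
        if self.mitigating:
            self.dropped += 1  # drop action, part 2: newly arriving packets are dropped
            return False
        self.packets.append(packet)
        return True


q = PfcQueue()
q.enqueue("pkt-1")
q.enqueue("pkt-2")
q.start_mitigation()
accepted = q.enqueue("pkt-3")
```

Here the two queued packets are flushed and the third arrival is rejected, so `dropped` ends at 3.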

Restoration

Use the recovery statement to configure how long the PFC watchdog disables the affected queue before it restores PFC. The default recovery time is 200 ms. The minimum recovery time is 200 ms and the maximum is 10,000 ms.

After the recovery time passes, the PFC watchdog re-enables PFC on the affected queues.
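The three configurable values and their documented ranges can be collected in one place and validated together. This is a hypothetical sketch: `PfcWatchdogSettings` and its field names mirror the poll-interval, detection, and recovery statements but are not a device or library API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PfcWatchdogSettings:
    """Hypothetical mirror of the PFC watchdog statements and their defaults."""
    poll_interval_ms: int = 100  # poll-interval: 100 to 1000 ms
    detection: int = 2           # detection: 2 to 10 polling intervals
    recovery_ms: int = 200       # recovery: 200 to 10,000 ms

    def __post_init__(self) -> None:
        limits = {
            "poll_interval_ms": (self.poll_interval_ms, 100, 1000),
            "detection": (self.detection, 2, 10),
            "recovery_ms": (self.recovery_ms, 200, 10_000),
        }
        for name, (value, low, high) in limits.items():
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}..{high}")

    @property
    def detection_time_ms(self) -> int:
        """Total detection time: polling interval multiplied by the detection value."""
        return self.poll_interval_ms * self.detection


defaults = PfcWatchdogSettings()
slowest = PfcWatchdogSettings(poll_interval_ms=1000, detection=10, recovery_ms=10_000)
```

With the defaults, a stalled queue is detected after 200 ms and stays disabled for another 200 ms before PFC is restored.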

Verification

Use the following command to verify that you have configured the PFC watchdog correctly:

The detection time shown is the polling interval multiplied by the detection period. In this case, the polling interval is 100 ms and the detection time is 200 ms, so the configured detection period is 2.
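Working backward from the displayed detection time to the configured detection period is a single division. The helper below is a hypothetical illustration, not part of any CLI; it rejects values that are not a whole number of polling intervals.

```python
def detection_period(detection_time_ms: int, poll_interval_ms: int) -> int:
    """Recover the configured detection value from the displayed detection time."""
    if poll_interval_ms <= 0:
        raise ValueError("poll interval must be positive")
    if detection_time_ms % poll_interval_ms:
        raise ValueError("detection time should be a whole number of polling intervals")
    return detection_time_ms // poll_interval_ms


configured = detection_period(200, 100)
```

For the values in this example, 200 ms divided by 100 ms gives a detection period of 2.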

Monitoring PFC Watchdog

You can view the number of PFC pause storms that have been detected and recovered, as well as the number of packets that have been dropped, on the PFC queues on an interface. Use the following command to view the PFC watchdog statistics on a particular interface.

You can view the actions that the PFC watchdog takes in the system log.

  • When the PFC watchdog is enabled on a new port, the system log displays this message: CDA PfcWd: PFC Watchdog detection enabled on ifd: et-0/0/16 Poll Interval:100ms Detection Period:200ms Recovery Interval:200ms
  • When the PFC watchdog detects a stall condition, the system log displays this message: CDA PfcWd: PFC Storm Detected! on ifd:et-0/0/16 Queue: 3 Priority: 3 BLOCKED for AutoRecovery Recovery Time: 200ms
  • When the queue recovers from the PFC pause storm, the system log displays this message: CDA PfcWd: PFC Storm Recovered on Port ifd:et-0/0/16 Queue: 3 Priority: 3 UNBLOCKED after AutoRecovery Recovery Time: 200ms
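When correlating storms across interfaces, it can help to extract fields from these messages. The parser below is a sketch built only from the three sample messages above; `parse_storm_event` and `STORM_RE` are hypothetical names, and the exact log text may differ between software releases, so treat the pattern as an assumption to verify against your own logs.

```python
import re
from typing import Optional

# Assumption: storm messages follow the "Detected!" / "Recovered" samples shown above.
STORM_RE = re.compile(
    r"CDA PfcWd: PFC Storm (?P<event>Detected|Recovered)\b.*?"
    r"ifd:\s*(?P<ifd>\S+)\s+Queue:\s*(?P<queue>\d+)\s+Priority:\s*(?P<priority>\d+)"
    r".*?Recovery Time:\s*(?P<recovery_ms>\d+)ms"
)


def parse_storm_event(line: str) -> Optional[dict]:
    """Return the storm fields from one log line, or None for other messages."""
    m = STORM_RE.search(line)
    if m is None:
        return None  # e.g. the "detection enabled" message is not a storm event
    return {
        "event": m["event"].lower(),
        "interface": m["ifd"],
        "queue": int(m["queue"]),
        "priority": int(m["priority"]),
        "recovery_ms": int(m["recovery_ms"]),
    }


detected = parse_storm_event(
    "CDA PfcWd: PFC Storm Detected! on ifd:et-0/0/16 Queue: 3 Priority: 3 "
    "BLOCKED for AutoRecovery Recovery Time: 200ms"
)
```

For the detection message above, the parser returns the interface et-0/0/16, queue 3, priority 3, and a 200 ms recovery time.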