
Understanding PFC Using DSCP at Layer 3 for Untagged Traffic

Protocols such as Remote Direct Memory Access (RDMA) over Converged Ethernet version 2 (RoCEv2) require lossless behavior for traffic that crosses Layer 3 connections to Layer 2 Ethernet subnetworks. Traditionally, priority-based flow control (PFC) prevents traffic loss during congestion on Layer 2 or Layer 3 interfaces for VLAN-tagged traffic. It does this by selectively pausing traffic on any of eight priorities, which correspond to the IEEE 802.1p code points in the VLAN headers of traffic arriving on an interface. Untagged traffic, however, has no VLAN header, so it carries no IEEE 802.1p code points on which to pause traffic.
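The difference between tagged and untagged traffic can be sketched in Python. This is a minimal, hypothetical helper (not part of Junos OS) that assumes the frame bytes start at the destination MAC address and only recognizes the IEEE 802.1Q TPID 0x8100. It returns the 3-bit 802.1p priority of a tagged frame and `None` for an untagged frame, which is exactly the case where 802.1p-based PFC has nothing to act on.

```python
from typing import Optional

# EtherType/TPID value that marks an IEEE 802.1Q VLAN tag. Other tag types
# (for example, stacked S-tags) are ignored in this sketch.
TPID_8021Q = 0x8100


def pcp_from_frame(frame: bytes) -> Optional[int]:
    """Return the 3-bit IEEE 802.1p priority of an Ethernet frame.

    Returns None for untagged frames, which carry no 802.1p code point.
    `frame` is assumed to start at the destination MAC address.
    """
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    # Bytes 12-13 hold the TPID on tagged frames, or the EtherType on untagged ones.
    tpid = int.from_bytes(frame[12:14], "big")
    if tpid != TPID_8021Q:
        return None  # untagged: nothing for 802.1p-based PFC to act on
    if len(frame) < 16:
        raise ValueError("truncated VLAN tag")
    tci = int.from_bytes(frame[14:16], "big")
    return tci >> 13  # PCP is the top 3 bits of the Tag Control Information


# A frame tagged with PCP 3 / VLAN 100, and an untagged IPv4 frame.
tagged = (bytes(12) + TPID_8021Q.to_bytes(2, "big")
          + (0x6064).to_bytes(2, "big") + (0x0800).to_bytes(2, "big"))
untagged = bytes(12) + (0x0800).to_bytes(2, "big") + bytes(20)
print(pcp_from_frame(tagged), pcp_from_frame(untagged))  # 3 None
```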

To support lossless traffic flow at Layer 3 for untagged traffic, you can enable PFC on Layer 3 interfaces and Layer 2 access interfaces using Differentiated Services code point (DSCP) values in the Layer 3 IP header of incoming traffic, rather than IEEE 802.1p code point values in a Layer 2 VLAN header.

Overview of DSCP-based PFC

PFC is a data center bridging technology that operates at Layer 2, whereas DSCP information is carried in IP headers at Layer 3. Nevertheless, you can configure DSCP-based PFC, which preserves lossless behavior for untagged traffic across Layer 3 network connections.

PFC works by sending pause frames that tell the peer to stop transmitting traffic on the configured code points while the link is congested. With DSCP-based PFC enabled, pause frames are triggered based on a configured 6-bit DSCP value (decimal 0 through 63) in the Layer 3 IP header of incoming traffic.
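Where the 6-bit DSCP value comes from can be shown with a short Python sketch. It assumes an IPv4 header (IPv6 carries DSCP in the Traffic Class field instead, which is not handled here), and the function names are hypothetical. The DSCP value occupies the upper six bits of the second header byte; the lower two bits are ECN. The second helper formats the value as a 6-bit binary string such as `011010`.

```python
def dscp_from_ipv4(header: bytes) -> int:
    """Extract the 6-bit DSCP value (0-63) from an IPv4 header."""
    if len(header) < 20 or header[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    # Byte 1 is the DS field: DSCP in the upper 6 bits, ECN in the lower 2.
    return header[1] >> 2


def dscp_bits(dscp: int) -> str:
    """Format a DSCP value as a 6-bit binary string, e.g. 26 -> '011010'."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return format(dscp, "06b")


header = bytes([0x45, 0x68]) + bytes(18)  # version 4, IHL 5, DSCP 26, ECN 0
print(dscp_from_ipv4(header), dscp_bits(dscp_from_ipv4(header)))  # 26 011010
```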

However, PFC pause frames can carry only a 3-bit PFC priority (one of eight code points, decimal 0 through 7). For VLAN-tagged traffic, this priority usually corresponds to the IEEE 802.1p code point in the VLAN header of incoming traffic. Untagged traffic provides no IEEE 802.1p code point, so to trigger PFC on a DSCP value, you must explicitly map that DSCP value in the configuration to a PFC priority. That PFC priority is then used in the pause frames sent to the peer when congestion occurs for that code point. You map traffic on a DSCP value to a PFC priority when you define the no-loss forwarding class into which you want to classify DSCP-based PFC traffic. The forwarding class must also be mapped to an output queue with no-loss behavior.
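The chain of mappings described above (DSCP value to forwarding class, forwarding class to no-loss queue and PFC priority) can be modeled in a few lines of Python. This is an illustrative data model, not Junos configuration syntax: the class name `roce`, DSCP 26, queue 3, and PFC priority 3 are hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ForwardingClass:
    queue: int          # output queue the forwarding class is mapped to
    no_loss: bool       # that queue must have no-loss behavior
    pfc_priority: int   # 3-bit priority (0-7) carried in PFC pause frames


def pause_priority(dscp: int,
                   dscp_classifier: Dict[int, str],
                   classes: Dict[str, ForwardingClass]) -> int:
    """Resolve the PFC priority to send in pause frames for a DSCP value."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    # DSCP value -> forwarding class (the classifier step).
    fc_name = dscp_classifier.get(dscp)
    if fc_name is None:
        raise KeyError(f"DSCP {dscp} is not classified")
    fc = classes[fc_name]
    # Forwarding class -> PFC priority, valid only for a no-loss class.
    if not fc.no_loss:
        raise ValueError(f"forwarding class {fc_name!r} is not no-loss")
    if not 0 <= fc.pfc_priority <= 7:
        raise ValueError("PFC priority must be in the range 0-7")
    return fc.pfc_priority


classes = {"roce": ForwardingClass(queue=3, no_loss=True, pfc_priority=3)}
print(pause_priority(26, {26: "roce"}, classes))  # 3
```

The lookup fails loudly rather than falling back to a default, because a DSCP value without an explicit mapping has no PFC priority to put in a pause frame.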

Note:

You cannot assign the same PFC priority to more than one forwarding class because the mapped PFC priority value is used as the forwarding class ID when DSCP-based PFC is configured.
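Because the PFC priority doubles as the forwarding class ID, a configuration check only has to look for repeated priorities. A minimal Python sketch of that check follows; the function name and the class names in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def duplicate_pfc_priorities(class_priorities: Dict[str, int]) -> List[int]:
    """Return PFC priorities that are assigned to more than one forwarding class.

    `class_priorities` maps a forwarding class name to its PFC priority.
    """
    for name, priority in class_priorities.items():
        if not 0 <= priority <= 7:
            raise ValueError(f"{name}: PFC priority {priority} is outside 0-7")
    counts = Counter(class_priorities.values())
    return sorted(p for p, n in counts.items() if n > 1)


print(duplicate_pfc_priorities({"roce": 3, "storage": 4}))  # []
print(duplicate_pfc_priorities({"roce": 3, "storage": 3}))  # [3]
```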

A DSCP classifier (instead of an IEEE 802.1p classifier) is also required to classify incoming traffic with the configured DSCP value into the no-loss forwarding class. Any DSCP value for which DSCP-based PFC is enabled on an interface must be specified in either the default DSCP classifier or a user-defined DSCP classifier associated with the interface.

To enable DSCP-based PFC on an interface, define an input congestion notification profile with the same DSCP value (and desired buffering parameters), and associate it with the interface.
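The consistency requirement between the input congestion notification profile and the interface's DSCP classifier can be expressed as a set check. In this hypothetical Python sketch, the classifier is passed in as a plain dictionary of DSCP value to forwarding class name; the contents of the Junos default DSCP classifier are not modeled.

```python
from typing import Dict, Set


def unclassified_pfc_dscps(cnp_dscps: Set[int],
                           classifier: Dict[int, str],
                           no_loss_classes: Set[str]) -> Set[int]:
    """Return DSCP values enabled for PFC in the input CNP that the interface's
    DSCP classifier does not place into a no-loss forwarding class."""
    return {d for d in cnp_dscps if classifier.get(d) not in no_loss_classes}


cnp_dscps = {26, 24}                 # DSCP values with PFC in the input CNP
classifier = {26: "roce", 0: "best-effort"}
print(unclassified_pfc_dscps(cnp_dscps, classifier, {"roce"}))  # {24}
```

An empty result means every PFC-enabled DSCP value on the interface lands in a no-loss class.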

The peer device should have a matching PFC configuration for the mapped PFC priority code points.

Limitations of DSCP-based PFC

The following are limitations of DSCP-based PFC:

  • You cannot configure both DSCP-based PFC and IEEE 802.1p PFC under the same congestion notification profile, or associate both a DSCP-based congestion notification profile and an IEEE 802.1p congestion notification profile with the same interface.

  • DSCP-based PFC is supported on Layer 3 interfaces and Layer 2 access interfaces for untagged traffic only. PFC behavior is unpredictable if VLAN-tagged packets are received on an interface with DSCP-based PFC enabled.

  • Each no-loss forwarding class can only be associated with a unique 3-bit PFC priority value from 0 through 7.
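The first two limitations can be turned into a simple pre-deployment check. The Python model below is hypothetical: it represents each congestion notification profile by the DSCP values and IEEE 802.1p code points it enables PFC on, and it takes a flag saying whether the interface may receive VLAN-tagged traffic. Profile names are assumed to be unique.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CongestionNotificationProfile:
    name: str
    dscp_pfc: Set[int] = field(default_factory=set)        # DSCP values (0-63)
    ieee_8021p_pfc: Set[int] = field(default_factory=set)  # 802.1p code points (0-7)


def interface_pfc_problems(profiles: List[CongestionNotificationProfile],
                           receives_tagged_traffic: bool) -> List[str]:
    """List violations of the DSCP-based PFC limitations for one interface."""
    problems = []
    # A single profile must not mix DSCP-based and 802.1p PFC.
    for p in profiles:
        if p.dscp_pfc and p.ieee_8021p_pfc:
            problems.append(f"{p.name}: mixes DSCP-based and IEEE 802.1p PFC")
    # The interface must not combine a DSCP-based profile with a different 802.1p profile.
    dscp_profiles = [p.name for p in profiles if p.dscp_pfc]
    ieee_profiles = [p.name for p in profiles if p.ieee_8021p_pfc]
    if any(a != b for a in dscp_profiles for b in ieee_profiles):
        problems.append("interface has both DSCP-based and IEEE 802.1p profiles")
    # DSCP-based PFC is meant for untagged traffic only.
    if dscp_profiles and receives_tagged_traffic:
        problems.append("VLAN-tagged traffic on a DSCP-based PFC interface")
    return problems


dscp_cnp = CongestionNotificationProfile("dscp-cnp", dscp_pfc={26})
dot1p_cnp = CongestionNotificationProfile("dot1p-cnp", ieee_8021p_pfc={3})
print(interface_pfc_problems([dscp_cnp], receives_tagged_traffic=False))  # []
print(len(interface_pfc_problems([dscp_cnp, dot1p_cnp], receives_tagged_traffic=True)))  # 2
```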

Configurable PFC Accounting Thresholds

On supported platforms, you define virtual PFC pause buffers, called PFC accounts, within a congestion notification profile (CNP). Each ingress port can have up to two PFC accounts. For each PFC account, you can independently set the PFC priority used in transmitted pause frames and the XOFF and XON thresholds.

Consider Figure 1, which shows a typical pause buffer. In this diagram, the buffer fills from the bottom up because of congestion on the egress port. When the buffer fill reaches XOFF, a PFC pause frame is sent upstream to pause traffic associated with the PFC class. The headroom space absorbs in-flight packets and processing delays, so the upstream device can pause traffic before the buffer fills completely and begins dropping packets. The system uses the cable length and the maximum receive unit (MRU) to calculate the amount of buffer headroom reserved to support PFC. The shorter the cable length and the lower the MRU, the less headroom buffer space PFC requires.

Figure 1: Typical Pause Buffer (conceptual diagram of flow-control buffer management showing headroom, hysteresis, XON, XOFF, and the buffer fill direction)

When congestion eases and the buffer fill falls below the XON threshold, a resume frame is sent upstream to restart the data traffic.
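The XOFF/XON behavior is a hysteresis loop, which a small Python state machine can illustrate. This is a conceptual sketch, not device code: the class name and the threshold values (in bytes) are hypothetical, and the periodic refresh of pause frames while traffic stays paused is not modeled.

```python
from typing import Optional


class PauseBuffer:
    """Hysteresis between the XOFF and XON thresholds of one PFC pause buffer."""

    def __init__(self, xoff: int, xon: int) -> None:
        if not 0 <= xon < xoff:
            raise ValueError("XON must be below XOFF")
        self.xoff = xoff
        self.xon = xon
        self.paused = False

    def update(self, fill: int) -> Optional[str]:
        """Record the current buffer fill; return 'pause', 'resume', or None."""
        if not self.paused and fill >= self.xoff:
            self.paused = True
            return "pause"    # fill reached XOFF: send a PFC pause frame upstream
        if self.paused and fill < self.xon:
            self.paused = False
            return "resume"   # fill fell below XON: send a resume frame upstream
        return None           # no state change


buf = PauseBuffer(xoff=800, xon=500)
print([buf.update(fill) for fill in (100, 900, 700, 400)])
# [None, 'pause', None, 'resume']
```

The gap between XOFF and XON keeps the buffer from toggling between pause and resume on every small change in fill level.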

For PFC to work effectively, you must correctly set XOFF, XON, and the headroom buffer for each PFC account. Junos OS calculates the headroom space based on the configured cable length and other internally calculated factors.
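To see why cable length and MRU drive headroom, here is a rough Python estimate. This is a generic, simplified approximation and **not** the formula Junos OS uses (its internal factors are not documented here). It assumes a propagation speed of about 2 x 10^8 m/s in fiber, takes the link rate as an extra input, and ignores internal processing and response delays.

```python
# Assumed signal propagation speed in optical fiber: about 2 x 10^8 m/s.
PROPAGATION_M_PER_S = 200_000_000


def estimate_headroom_bytes(cable_length_m: int, link_rate_bps: int, mru_bytes: int) -> int:
    """Rough PFC headroom estimate (a generic approximation, not the Junos formula).

    Counts the data that can still arrive after XOFF is crossed:
      * bytes in flight over one round trip of the cable (the pause frame travels
        upstream while data already sent keeps arriving downstream), and
      * two maximum-size frames: one that may delay sending the pause frame, and
        one the peer may have just started transmitting when the pause arrives.
    """
    round_trip_bits = 2 * cable_length_m * link_rate_bps // PROPAGATION_M_PER_S
    in_flight_bytes = -(-round_trip_bits // 8)  # round up to whole bytes
    return in_flight_bytes + 2 * mru_bytes


for length_m in (3, 100, 10_000):
    print(length_m, estimate_headroom_bytes(length_m, 100_000_000_000, 9216))
# 3 18807
# 100 30932
# 10000 1268432
```

At 100 Gbps, the MRU term dominates on short cables, while the round-trip term dominates at longer distances.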

To define PFC accounts for input traffic in a CNP:

  1. Define one or two PFC accounts. Set a PFC priority for each account and, if necessary, set XOFF and XON for each account.

  2. Set the code points that you are using for PFC, and assign a PFC account to each code point.

  3. Set the correct cable length for the CNP. The cable length is the distance, in meters, between the interface and its peer interface.
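The three steps map onto a small data model with a validation pass. The Python classes below are hypothetical and do not reflect Junos configuration statements; the account name `acct-a`, the threshold values, and the use of 6-bit DSCP strings as code points are assumptions for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

MAX_ACCOUNTS_PER_PORT = 2  # each ingress port can have up to two PFC accounts


@dataclass
class PfcAccount:
    pfc_priority: int            # priority sent in this account's pause frames
    xoff: Optional[int] = None   # None keeps the platform default
    xon: Optional[int] = None


@dataclass
class InputCnp:
    cable_length_m: int
    accounts: Dict[str, PfcAccount] = field(default_factory=dict)
    code_point_account: Dict[str, str] = field(default_factory=dict)  # code point -> account

    def validate(self) -> None:
        # Step 1: one or two accounts, each with a valid priority and thresholds.
        if not 1 <= len(self.accounts) <= MAX_ACCOUNTS_PER_PORT:
            raise ValueError("define one or two PFC accounts")
        for name, acct in self.accounts.items():
            if not 0 <= acct.pfc_priority <= 7:
                raise ValueError(f"{name}: PFC priority must be 0-7")
            if acct.xoff is not None and acct.xon is not None and acct.xon >= acct.xoff:
                raise ValueError(f"{name}: XON must be below XOFF")
        # Step 2: every PFC code point is assigned to a defined account.
        for code_point, acct_name in self.code_point_account.items():
            if acct_name not in self.accounts:
                raise ValueError(f"code point {code_point}: unknown account {acct_name!r}")
        # Step 3: a positive cable length, in meters, to the peer interface.
        if self.cable_length_m <= 0:
            raise ValueError("cable length must be positive")


cnp = InputCnp(
    cable_length_m=100,
    accounts={"acct-a": PfcAccount(pfc_priority=3, xoff=800_000, xon=500_000)},
    code_point_account={"011010": "acct-a"},
)
cnp.validate()  # passes; a third account or an unknown account name raises ValueError
```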

Platform-Specific PFC Behavior

Use Feature Explorer to confirm platform and release support for specific features.

Use the following table to review platform-specific behaviors for your platform.

Platform Difference

PTX10000 Series

  • You can configure up to two queues as no-loss when defining forwarding classes.

  • PTX10000 Series routers support cable lengths of up to 100 km.

  • PTX10000 Series routers have virtual PFC pause buffers called PFC accounts.

  • All PFC pause buffer accounting happens with respect to ingress ports and not egress ports.

  • If both PFC and ECN are enabled, when the occupancy of a PFC account is above XON, by default ECN-capable packets are marked as congestion experienced (CE).
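The PFC and ECN interaction in the last item can be sketched in Python. This assumes both PFC and ECN are enabled and uses the RFC 3168 ECN code points in the low two bits of the IP DS byte; the function name and the occupancy and XON values are hypothetical.

```python
ECN_MASK = 0b11   # low 2 bits of the IP DS byte
NOT_ECT = 0b00    # transport is not ECN-capable
CE = 0b11         # Congestion Experienced


def mark_if_pfc_congested(ds_byte: int, account_fill: int, xon: int) -> int:
    """Return the DS byte after the PFC-account ECN check.

    Assumes both PFC and ECN are enabled: ECN-capable packets are marked CE
    while the PFC account's occupancy is above XON.
    """
    if (ds_byte & ECN_MASK) != NOT_ECT and account_fill > xon:
        return ds_byte | CE
    return ds_byte


ds = (26 << 2) | 0b10  # DSCP 26 with ECT(0)
print(hex(mark_if_pfc_congested(ds, account_fill=600, xon=500)))  # 0x6b
print(hex(mark_if_pfc_congested(ds, account_fill=400, xon=500)))  # 0x6a
```

Packets that are not ECN-capable (ECN bits `00`) pass through unchanged regardless of occupancy.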

Change History Table

Feature support is determined by the platform and release you are using. Use Feature Explorer to determine if a feature is supported on your platform.

Release
Description
17.4R1
Starting in Junos OS Release 17.4R1, to support lossless traffic flow at Layer 3 for untagged traffic, we support enabling PFC on Layer 3 interfaces and Layer 2 access interfaces using Differentiated Services code point (DSCP) values in the Layer 3 IP header of incoming traffic, rather than IEEE 802.1p code point values in a Layer 2 VLAN header.