DSCP-based PFC for Layer 3 Untagged Traffic

You can configure DSCP-based PFC to support lossless behavior for untagged traffic across Layer 3 connections to Layer 2 subnetworks for protocols such as Remote Direct Memory Access (RDMA) over Converged Ethernet version 2 (RoCEv2).

Overview

With DSCP-based PFC, the device generates pause frames to notify the peer that the link is congested based on a configured 6-bit Differentiated Services code point (DSCP) value in the Layer 3 IP header of incoming traffic, rather than a 3-bit IEEE 802.1p code point in the Layer 2 VLAN header.
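As a rough illustration of where the two code points live, the following Python sketch extracts the 6-bit DSCP from the IP header's ToS/Traffic Class byte and the 3-bit IEEE 802.1p priority from a VLAN tag control information (TCI) field. The function names are hypothetical and are not part of any Junos interface; the bit positions follow the standard IP and IEEE 802.1Q header layouts.

```python
def dscp_from_tos(tos_byte: int) -> int:
    """Return the 6-bit DSCP held in the upper six bits of the
    IPv4 ToS (or IPv6 Traffic Class) byte."""
    return (tos_byte >> 2) & 0x3F


def pcp_from_vlan_tci(tci: int) -> int:
    """Return the 3-bit IEEE 802.1p priority held in the top three
    bits of the 16-bit VLAN TCI field."""
    return (tci >> 13) & 0x7


if __name__ == "__main__":
    tos = 0b11000000             # DSCP 110000, ECN bits 00
    tci = (3 << 13) | 100        # 802.1p priority 3, VLAN ID 100
    print(format(dscp_from_tos(tos), "06b"))
    print(pcp_from_vlan_tci(tci))
```

Untagged traffic has no VLAN TCI to read, which is why the DSCP field is the only priority signal available in that case.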

Because PFC can send pause frames only for PFC priority code points, the configured 6-bit DSCP value must be mapped to a 3-bit PFC priority to use in pause frames when DSCP-based PFC is triggered. Configuring the mapping involves three steps: mapping the PFC priority value to a no-loss forwarding class when you map the forwarding class to a queue, defining a congestion notification profile to enable PFC on traffic with the desired DSCP value, and configuring a DSCP classifier to associate the PFC priority-mapped forwarding class (along with the loss priority) with the configured DSCP value on which to trigger PFC pause frames.
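The chain of lookups described above can be sketched in Python as three small tables. This is a conceptual model only, not how the device implements the feature; the forwarding class name `fc1`, the table names, and the example values are assumptions chosen for illustration.

```python
from typing import Optional

# Hypothetical forwarding class table: queue, no-loss flag, PFC priority.
FORWARDING_CLASSES = {
    "fc1": {"queue": 1, "no_loss": True, "pfc_priority": 3},
}

# Hypothetical DSCP classifier: DSCP -> (forwarding class, loss priority).
DSCP_CLASSIFIER = {
    0b110000: ("fc1", "low"),
}

# Hypothetical congestion notification profile: DSCP values with PFC enabled.
PFC_ENABLED_DSCP = {0b110000}


def pfc_priority_for(dscp: int) -> Optional[int]:
    """Return the PFC priority to use in pause frames for this DSCP,
    or None if the DSCP does not trigger lossless PFC behavior."""
    if dscp not in PFC_ENABLED_DSCP:
        return None
    entry = DSCP_CLASSIFIER.get(dscp)
    if entry is None:
        return None
    fc_name, _loss_priority = entry
    fc = FORWARDING_CLASSES[fc_name]
    return fc["pfc_priority"] if fc["no_loss"] else None


if __name__ == "__main__":
    print(pfc_priority_for(0b110000))
    print(pfc_priority_for(0b000000))
```

All three tables must agree on the same DSCP value and forwarding class; if any link in the chain is missing, no PFC priority is resolved.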

The peer device should have output PFC and a corresponding flow control queue configured to match the PFC priority configuration on this device.

Use Feature Explorer to confirm platform and release support for specific features.

DSCP-based PFC for Layer 3 Untagged Traffic in AI-ML Data Centers

AI and ML applications are rapidly expanding in data centers. When dealing with AI and ML workloads and large data sets, one critical challenge is handling the size of the data. Offloading the computation to graphics processing units (GPUs) can significantly speed up this task. However, the data size and the model, especially with large language models (LLMs), often exceed the memory capacity of a single GPU. As a result, you commonly require multiple GPUs to achieve reasonable job completion times, especially for training.

The performance of an AI data center depends on the number of GPUs that are used and the efficiency of the network that connects them. Slowdowns in the network can lead to underutilization of GPUs and longer job completion times. Ethernet-based networks are becoming more popular as an alternative to InfiniBand for AI data center networking. One solution is the Remote Direct Memory Access (RDMA) over Converged Ethernet version 2 (RoCEv2) network.

RoCEv2 involves encapsulating RDMA protocol packets within UDP packets for transport over Ethernet networks. The RoCEv2 protocol utilizes priority-based flow control (PFC) to establish a drop-free network, while data center quantized congestion notification (DCQCN) provides end-to-end congestion control for RoCEv2. Junos OS Evolved supports DCQCN by combining explicit congestion notification (ECN) and PFC to enable end-to-end lossless AI Ethernet networking.
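To make the ECN half of that combination more concrete, here is a minimal Python sketch of identifying RoCEv2 traffic and marking congestion in the ECN field. It relies on two facts that do not come from this page: UDP destination port 4791 is the IANA-assigned RoCEv2 port, and the ECN field occupies the low two bits of the ToS/Traffic Class byte (RFC 3168). The function names are hypothetical.

```python
ROCEV2_UDP_PORT = 4791   # IANA-assigned UDP destination port for RoCEv2

ECN_MASK = 0b11
ECN_NOT_ECT = 0b00       # sender is not ECN-capable
ECN_CE = 0b11            # congestion experienced


def is_rocev2(udp_dst_port: int) -> bool:
    """Return True if the UDP destination port identifies RoCEv2."""
    return udp_dst_port == ROCEV2_UDP_PORT


def mark_congestion(tos_byte: int) -> int:
    """Set the ECN field to CE on ECN-capable packets and leave
    non-ECN-capable packets unchanged. DSCP bits are preserved."""
    if tos_byte & ECN_MASK == ECN_NOT_ECT:
        return tos_byte
    return tos_byte | ECN_CE


if __name__ == "__main__":
    print(is_rocev2(4791))
    print(format(mark_congestion(0b11000010), "08b"))
```

ECN marking signals congestion to the endpoints, while PFC pauses the directly attached peer; the sketch covers only the marking step.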

To support lossless IPv6 traffic across Layer 3 (L3) connections to Layer 2 (L2) subnetworks, you can configure PFC to operate using 6-bit Differentiated Services code point (DSCP) values from L3 headers of untagged VLAN traffic. You can use PFC with DSCP as an alternative to IEEE 802.1p priority values in L2 VLAN-tagged packet headers. You need DSCP-based PFC to support RoCEv2.

Benefits
  • Utilize Ethernet-based networks for AI-ML data center networking.

  • Improve network efficiency for large data sets.

  • Enable end-to-end lossless AI-ML Ethernet networking.

Configuration

To configure DSCP-based PFC:

  1. Map a lossless forwarding class to a PFC priority—a 3-bit value represented in decimal form (0-7)—to use in the PFC pause frames.

    You must also assign an output queue to the forwarding class with the queue-num option. The no-loss option is required in this case to support lossless behavior for DSCP-based PFC, and the pfc-priority statement specifies the priority value mapping, as follows:

  2. Define an input congestion notification profile to enable PFC on traffic specified by the desired 6-bit DSCP value. Optionally configure the maximum receive unit (MRU) and cable length (used to determine PFC buffer headroom space reserved for the link):

    Note:

    You cannot configure both DSCP-based PFC and IEEE 802.1p PFC under the same congestion notification profile.

  3. Set up a DSCP classifier for the configured DSCP value and no-loss forwarding class mapped in the previous steps:

  4. Assign the classifier and congestion notification profile set up in the previous steps to an interface on which you are enabling DSCP-based PFC:

  5. Review your configuration.

    For example, with the following sample commands configuring DSCP-based PFC for interface xe-0/0/1, PFC pause frames will be generated with PFC priority 3 when incoming traffic with DSCP value 110000 becomes congested:
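To illustrate what that example implies on the wire, the following Python sketch builds a PFC pause frame for priority 3. It is not device configuration and does not reflect Junos internals. The frame layout (destination MAC 01:80:C2:00:00:01, EtherType 0x8808, opcode 0x0101, a class-enable vector, and eight 16-bit pause timers) follows the IEEE 802.1Qbb PFC format rather than anything stated on this page, and `build_pfc_frame` with its parameters is a hypothetical helper.

```python
import struct

PFC_DEST_MAC = bytes.fromhex("0180c2000001")  # reserved MAC control address
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101


def build_pfc_frame(src_mac: bytes, priority: int, quanta: int = 0xFFFF) -> bytes:
    """Build a PFC frame (without FCS) that pauses one priority.

    quanta is the pause time in 512-bit-time units; 0 asks the peer to resume.
    """
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    if not 0 <= priority <= 7:
        raise ValueError("priority must be in 0..7")
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("quanta must fit in 16 bits")

    class_enable = 1 << priority        # bit n enables the timer for priority n
    timers = [0] * 8
    timers[priority] = quanta
    payload = struct.pack("!HH8H", PFC_OPCODE, class_enable, *timers)
    header = PFC_DEST_MAC + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    return (header + payload).ljust(60, b"\x00")  # pad to minimum frame size


if __name__ == "__main__":
    frame = build_pfc_frame(bytes.fromhex("020000000001"), priority=3)
    print(frame.hex())
```

Only the timer for priority 3 is enabled, so traffic in other priorities on the same link keeps flowing while the congested class is paused.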

Configuration for PTX10000 Series Routers

  1. PTX10000 Series routers have separate buffer spaces for lossy and lossless queues, with 10 percent of the total buffer space reserved for lossless queues by default. If necessary, adjust the amount of buffer space reserved for lossless queues.
    You adjust the percentage of buffer space reserved for lossless queues on a per-FPC basis:
  2. Map a lossless forwarding class to a PFC priority—a 3-bit value represented in decimal form (0-7)—to use in the PFC pause frames.

    You must also assign an output queue to the forwarding class with the queue-num option. The no-loss option is required in this case to support lossless behavior for DSCP-based PFC, and the pfc-priority statement specifies the priority value mapping, as follows:

  3. Define an input congestion notification profile to enable PFC on traffic specified by the desired 6-bit DSCP value. Optionally configure the maximum receive unit (MRU) and cable length (used to determine PFC buffer headroom space reserved for the link):
    Note:

    You cannot configure both DSCP-based PFC and IEEE 802.1p PFC under the same congestion notification profile.

    Include the PFC account(s) and assign a PFC account to each code point.

  4. Set up a DSCP classifier for the configured DSCP value and no-loss forwarding class mapped in the previous steps:
  5. Assign the classifier and congestion notification profile set up in the previous steps to an interface on which you are enabling DSCP-based PFC:
  6. Review your configuration.

    For example, with the following sample commands configuring DSCP-based PFC for interface xe-0/0/1, PFC pause frames will be generated with PFC priority 3 when incoming traffic with DSCP value 110000 reaches a delay equal to XOFF, which is set to 5000 microseconds, and a resume frame is sent when the delay falls back below XON, which is set to 2500 microseconds:
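The XOFF/XON behavior in that example is a hysteresis loop, which the following Python sketch models. It is a simplified illustration under stated assumptions, not the router's implementation: the class name `PfcDelayMonitor` is hypothetical, and the sketch reports only state transitions, whereas a real device may also refresh pause frames while congestion persists.

```python
from typing import Optional

XOFF_US = 5000   # pause when delay reaches this value (microseconds)
XON_US = 2500    # resume when delay falls below this value (microseconds)


class PfcDelayMonitor:
    """Track pause state for one PFC priority using delay thresholds."""

    def __init__(self, xoff_us: float = XOFF_US, xon_us: float = XON_US) -> None:
        if xon_us >= xoff_us:
            raise ValueError("XON must be lower than XOFF")
        self.xoff_us = xoff_us
        self.xon_us = xon_us
        self.paused = False

    def update(self, delay_us: float) -> Optional[str]:
        """Return 'pause' or 'resume' on a state change, else None."""
        if not self.paused and delay_us >= self.xoff_us:
            self.paused = True
            return "pause"
        if self.paused and delay_us < self.xon_us:
            self.paused = False
            return "resume"
        return None


if __name__ == "__main__":
    monitor = PfcDelayMonitor()
    for delay in (1000, 5000, 4000, 2500, 2499, 3000):
        print(delay, monitor.update(delay))
```

Keeping XON well below XOFF prevents the link from toggling rapidly between paused and resumed states when the delay hovers near a single threshold.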

Verify the configuration.

  1. Check the ingress port.

  2. Display the DSCP-based input congestion notification profile.

  3. Display which forwarding classes are mapped to each PFC priority.