Selective Dynamic Load Balancing (DLB)

Overview

In AI-ML workloads, the majority of the application traffic uses Remote Direct Memory Access (RDMA) over Converged Ethernet version 2 (RoCEv2) for transport. Dynamic load balancing (DLB) is ideal for achieving efficient load balancing and preventing congestion in RoCEv2 networks. However, static load balancing (SLB) can be more effective for some types of traffic. With selective DLB, you no longer have to choose between DLB and SLB for all traffic traversing your device. You configure your preferred DLB mode at the global level, set a default load balancing type, and then selectively enable or disable DLB for certain kinds of traffic.

You can load-balance traffic in two ways: per flow or per packet. Per-flow load balancing has been the most widely used method because it handles the largest number of packets at a time. The device classifies packets that have the same 5-tuple of packet header fields as a single flow and gives all packets in the flow the same load balancing treatment. Flow-based load balancing works well for general TCP and UDP traffic because the traffic utilizes all links fairly equally. Per-packet load balancing, in contrast, can reorder some packets, which can impact performance.

Many AI clusters connect the application to the network through smart network interface cards (SmartNICs) that can handle out-of-order packets. To improve performance, configure per-packet DLB on your network, and then enable DLB only for the endpoint servers that are capable of handling out-of-order packets. Your device inspects the RDMA operation codes (opcodes) in the BTH+ headers of these packets in real time. Using any firewall filter match condition, you can selectively enable or disable DLB based on these opcodes. Other flows continue to use the default hash-based load balancing, also known as SLB.

Selective DLB is also useful when an elephant flow encounters links that are too small for the entire data flow. In this scenario, selective DLB can calculate the optimal use of the links' available bandwidth in the data center fabric. When you enable selective per-packet DLB for the elephant flow, the algorithm directs the packets to the best-quality link first. As the link quality changes, the algorithm directs subsequent packets to the next best-quality link.

Benefits

  • Improve how your network handles large data flows.

  • Use per-packet and per-flow load balancing in the same traffic stream to improve performance.

  • Customize load balancing based on any firewall filter match condition.

Configuration

Configuration Overview

You can deploy selective DLB in two ways: disable DLB by default and enable it selectively on certain flows, or enable DLB globally and disable it selectively. In either case, you first configure DLB in per-packet mode; wherever DLB is enabled, it operates in this mode. You cannot configure DLB in per-flow and per-packet mode on the same device at the same time.

Selective DLB is also compatible with flowlet mode: you can optionally enable this feature when DLB is configured in flowlet mode.
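
For example, you set the global DLB mode under the forwarding-options enhanced-hash-key hierarchy. The following is a minimal sketch using QFX-style syntax; confirm the exact statements for your platform and release in the CLI reference:

  # Configure DLB in per-packet mode (the mode used wherever DLB is enabled).
  set forwarding-options enhanced-hash-key ecmp-dlb per-packet

  # Alternatively, selective DLB can also be used when DLB runs in flowlet mode.
  set forwarding-options enhanced-hash-key ecmp-dlb flowlet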

Topology

In the topology shown in Figure 1, DLB is disabled by default. We have enabled DLB selectively on Flow2 in per-packet mode. Table 1 summarizes the load balancing configuration on the two flows shown and the result for each flow:

Table 1: Flow Behaviors

  Flow  | DLB Enabled? | Result
  Flow1 | No           | The device uses the default load balancing configuration, which is per-flow mode. The flow is directed to a single device.
  Flow2 | Yes          | The device uses the DLB configuration, which is per-packet mode. The device splits this flow into packets. DLB assigns each packet to a path based on the RDMA opcode in the packet header and the corresponding filter.

Figure 1: Per-Flow and Per-Packet Load Balancing

Disable DLB Globally and Selectively Enable DLB

In cases where very few packets will require DLB, you can disable DLB at the global level and selectively enable it per flow.

  1. Enable DLB per-packet mode. Whenever DLB is enabled on a flow, DLB uses this mode to direct traffic.
  2. Disable DLB globally by turning it off for all Ethernet types. By default, all packets then get hash-based load balancing (SLB).
  3. Configure a firewall filter to match a specific RDMA opcode within the BTH+ header.

    This example matches based on rdma-opcode 10 (see the configuration sketch after these steps).

  4. Enable per-packet DLB within that firewall filter so that DLB applies only to packets with the chosen RDMA opcode in the BTH+ header.
  5. All other packets get the default load balancing method, which is SLB.
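
The following sketch consolidates these steps. The ecmp-dlb per-packet statement and the firewall filter structure are standard Junos constructs, and rdma-opcode is the match condition named in this example; the filter name, term name, and the dynamic-load-balance action keyword are hypothetical placeholders, so confirm the exact statements for your release in the CLI reference.

  # Step 1: Put DLB in per-packet mode (used wherever DLB is enabled).
  set forwarding-options enhanced-hash-key ecmp-dlb per-packet

  # Step 2: Disable DLB globally for all Ethernet types so that SLB is the
  # default. The exact statement is platform specific and is omitted here.

  # Steps 3 and 4: Match RDMA opcode 10 in the BTH+ header and enable
  # per-packet DLB for matching packets. The filter name (sDLB-enable), term
  # name (opcode-10), and the dynamic-load-balance action are hypothetical.
  set firewall family inet filter sDLB-enable term opcode-10 from rdma-opcode 10
  set firewall family inet filter sDLB-enable term opcode-10 then dynamic-load-balance

  # Apply the filter on the ingress interface (et-0/0/1 is an example).
  set interfaces et-0/0/1 unit 0 family inet filter input sDLB-enable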

Enable DLB Globally and Selectively Disable DLB

In cases where most packets will benefit from DLB, enable DLB at the global level for all packets and selectively disable it for certain packets.

  1. Configure DLB at the global level in per-packet mode for all flows.
  2. Configure a firewall filter to match a specific RDMA opcode within the BTH+ header.

    This example matches based on rdma-opcode 10 (see the configuration sketch after these steps).

  3. Disable per-packet DLB within that firewall filter for packets with the chosen RDMA opcode in the BTH+ header.
  4. All other packets get the default load balancing method, which is DLB.
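
A sketch of this variant follows, under the same assumptions as the previous sketch; the no-dynamic-load-balance action keyword used here to disable DLB in the filter is likewise a hypothetical placeholder.

  # Step 1: Enable DLB globally in per-packet mode for all flows.
  set forwarding-options enhanced-hash-key ecmp-dlb per-packet

  # Steps 2 and 3: Match RDMA opcode 10 and disable DLB for matching packets.
  # The filter and term names and the no-dynamic-load-balance action are
  # hypothetical placeholders.
  set firewall family inet filter sDLB-disable term opcode-10 from rdma-opcode 10
  set firewall family inet filter sDLB-disable term opcode-10 then no-dynamic-load-balance

  # Apply the filter on the ingress interface (et-0/0/1 is an example).
  set interfaces et-0/0/1 unit 0 family inet filter input sDLB-disable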

Verification

Verify that DLB is enabled as you expected by checking the DLB configuration and your firewall filter counters.
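
For example, the following commands display the configured hash and DLB settings and the firewall filter counters. The filter name sDLB is the example name used later in this document; output varies by platform and release.

  # Display the configured load balancing and DLB settings.
  show configuration forwarding-options enhanced-hash-key

  # Display firewall filter counters to confirm that the filter matches traffic.
  show firewall filter sDLB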

Example: Selectively Enable DLB with a Firewall Filter Match Condition

One of the benefits of selective DLB is that you can customize load balancing based on any firewall filter match condition. This example shows how to enable DLB based on a firewall filter that matches on RDMA queue pairs. Use this example to enable per-packet DLB only for flows terminating on a network interface card (NIC) that supports packet reordering.

In a network that uses RoCEv2 to transport application traffic, an RDMA connection sends traffic on a send queue and receives traffic on a receive queue. Together, the send queue and receive queue are referred to as a queue pair. Each queue pair has an identifiable prefix. In this example, we use queue pair prefixes to control when DLB is enabled.

This example is configured on a QFX5240-64QD switch.

  1. Create a user-defined field in a firewall filter for matching packets that are destined for a specific RDMA destination queue pair. Select a queue pair that you know terminates on a NIC that is capable of reordering packets.
    We named our firewall filter sDLB. The term QP-match matches on incoming packets whose destination queue pair has the chosen prefix (see the sketch after these steps).
  2. Configure the firewall filter to enable per-packet DLB on the queue pairs that match the filter.
    If the queue pair is not a match, the device uses the default load balancing type of SLB for that packet.
  3. Configure a counter that increments each time there is a match.
    The counter QP-match-count tracks how many packets were load balanced with DLB. You can use this information when troubleshooting.
  4. Enable your firewall filter on the relevant interface.
  5. Verify that your firewall filter term is matching packets coming through the device.
    The QP-match-count counter shows the number of bytes and packets that the firewall filter has redirected for load balancing with DLB.
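
The sketch below illustrates these steps with the names used in this example (sDLB, QP-match, QP-match-count). The flexible-match construct is an existing Junos firewall feature for user-defined fields, but the field name QP-field, the byte offsets, the example prefix value, and the dynamic-load-balance action are assumptions to validate against your platform and release.

  # Step 1: Define a user-defined field over the BTH+ destination queue pair.
  # In RoCEv2, the BTH follows the 8-byte UDP header, and the 3-byte
  # destination QP sits at BTH bytes 5-7; the offsets below assume that layout.
  set firewall flexible-match QP-field match-start layer-4 byte-offset 13 bit-length 24

  # Steps 1-3: Term QP-match matches queue pairs with an example prefix,
  # enables per-packet DLB (hypothetical action keyword), and counts matches.
  set firewall family inet filter sDLB term QP-match from flexible-match-mask flexible-mask-name QP-field mask-in-hex 0xffff00 prefix 0x123400
  set firewall family inet filter sDLB term QP-match then dynamic-load-balance
  set firewall family inet filter sDLB term QP-match then count QP-match-count

  # Non-matching packets fall through and use the default SLB.
  set firewall family inet filter sDLB term default then accept

  # Step 4: Apply the filter on the ingress interface (et-0/0/1 is an example).
  set interfaces et-0/0/1 unit 0 family inet filter input sDLB

  # Step 5: Confirm that the QP-match term is matching packets.
  show firewall filter sDLB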

Platform Support

See Feature Explorer for platform and release support.