Selective Dynamic Load Balancing (DLB)
Overview
In AI-ML workloads, the majority of application traffic uses Remote Direct Memory Access (RDMA) over Converged Ethernet version 2 (RoCEv2) for transport. Dynamic load balancing (DLB) is ideal for achieving efficient load balancing and preventing congestion in RoCEv2 networks. However, static load balancing (SLB) can be more effective for some types of traffic. With selective DLB, you no longer have to choose between DLB and SLB for all traffic traversing your device. Instead, you configure your preferred DLB mode and a default load-balancing behavior at the global level, and then selectively enable or disable DLB for specific kinds of traffic.
You can enable load balancing in two ways: per flow or per packet. Per-flow load balancing is the most widely used approach because it applies one decision to the largest number of packets at a time. The device classifies packets that share the same 5-tuple of packet header fields as a single flow and gives all packets in the flow the same load-balancing treatment. Flow-based load balancing works well for general TCP and UDP traffic because the flows tend to use all links fairly equally. Per-packet load balancing, by contrast, can reorder some packets, which can impact performance.
Many AI clusters connect the application to the network through smart network interface cards (SmartNICs) that can handle out-of-order packets. To improve performance, enable per-packet DLB on your network, but only for flows destined to endpoint servers that can handle out-of-order packets. Your device inspects the RDMA operation codes (opcodes) in the BTH+ headers of these packets in real time. Using any firewall filter match condition, you can selectively enable or disable DLB based on these opcodes. Other flows continue to use the default hash-based load balancing, also known as SLB.
Selective DLB is also useful when an elephant flow encounters links that are too small for the entire data flow. In this scenario, selective DLB can calculate the optimal use of the links' available bandwidth in the data center fabric. When you enable selective per-packet DLB for the elephant flow, the algorithm directs the packets to the best-quality link first. As the link quality changes, the algorithm directs subsequent packets to the next best-quality link.
Benefits
- Improve your network handling of large data flows.
- Use per-packet and per-flow load balancing in the same traffic stream to improve performance.
- Customize load balancing based on any firewall filter match condition.
Configuration
- Configuration Overview
- Topology
- Disable DLB Globally and Selectively Enable DLB
- Enable DLB Globally and Selectively Disable DLB
- Verification
Configuration Overview
You can selectively enable DLB in two ways: disable DLB by default and selectively enable DLB on certain flows, or enable DLB globally and selectively disable DLB. In either case, you must first configure DLB in per-packet mode; per-packet is the DLB mode used wherever DLB is enabled. You cannot configure DLB in per-flow and per-packet mode on the same device at the same time.
This feature is compatible with flowlet mode. You can optionally enable this feature when DLB is configured in flowlet mode.
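As a configuration sketch, per-packet DLB is set globally under the `forwarding-options enhanced-hash-key` hierarchy (Junos-style CLI; verify the exact statements for your platform and release):

```
# Configure DLB in per-packet mode; this is the DLB mode used
# wherever DLB is enabled on the device.
set forwarding-options enhanced-hash-key ecmp-dlb per-packet
```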
Topology
In the topology shown in Figure 1, DLB is disabled by default, and we have selectively enabled DLB on Flow2 in per-packet mode. Table 1 summarizes the load-balancing configuration for the two flows shown and the resulting load-balancing behavior:
Flow | DLB Enabled? | Result
---|---|---
Flow1 | No | The device uses the default load balancing configuration, which is per-flow mode. The flow is directed to a single device.
Flow2 | Yes | The device uses the DLB configuration, which is per-packet mode. The device splits this flow into packets. DLB assigns each packet to a path based on the RDMA opcode in the packet header and the corresponding filter.
Disable DLB Globally and Selectively Enable DLB
In cases where very few packets will require DLB, you can disable DLB at the global level and selectively enable it per flow.
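A minimal sketch of this approach follows: the DLB mode is configured but not applied by default, and a firewall filter enables DLB only for RoCEv2 traffic (UDP destination port 4791). The filter name, term names, and the `dlb` action keyword are illustrative assumptions, not verified statements; consult your platform documentation for the exact selective-DLB syntax.

```
# Configure the DLB mode (per-packet); DLB remains off by default.
set forwarding-options enhanced-hash-key ecmp-dlb per-packet

# Illustrative filter: enable DLB only for RoCEv2 (UDP port 4791) traffic.
# The "dlb" action keyword below is an assumption.
set firewall family inet filter SELECT-DLB term roce from protocol udp
set firewall family inet filter SELECT-DLB term roce from destination-port 4791
set firewall family inet filter SELECT-DLB term roce then dlb
set firewall family inet filter SELECT-DLB term other then accept

# Apply the filter to the ingress interface carrying the traffic.
set interfaces et-0/0/0 unit 0 family inet filter input SELECT-DLB
```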
Enable DLB Globally and Selectively Disable DLB
In cases where most packets will benefit from DLB, enable DLB at the global level for all packets and selectively disable it per packet.
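A sketch of the inverse approach: DLB runs globally in per-packet mode, and a firewall filter reverts selected traffic to static hash-based load balancing. As before, the filter names and the disable action keyword are illustrative assumptions to be checked against your platform documentation.

```
# Enable per-packet DLB globally.
set forwarding-options enhanced-hash-key ecmp-dlb per-packet

# Illustrative filter: exempt ordinary TCP traffic from DLB so it keeps
# hash-based (SLB) treatment. The "no-dlb" action keyword is an assumption.
set firewall family inet filter EXEMPT-DLB term tcp from protocol tcp
set firewall family inet filter EXEMPT-DLB term tcp then no-dlb
set firewall family inet filter EXEMPT-DLB term other then accept
set interfaces et-0/0/0 unit 0 family inet filter input EXEMPT-DLB
```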
Verification
Verify that DLB is enabled as expected by using the following commands:

```
show forwarding-options enhanced-hash-key
show pfe filter hw profile-info
```
Example: Selectively Enable DLB with a Firewall Filter Match Condition
One of the benefits of selective DLB is that you can customize load balancing based on any firewall filter match condition. This example shows how to enable DLB based on a firewall filter that matches on RDMA queue pairs. Use this example to enable per-packet DLB only for flows that terminate on a network interface card (NIC) that supports packet reordering.
In a network that uses RoCEv2 for application traffic transport, an RDMA connection sends traffic on a send queue and receives traffic on a receive queue. Together, the send queue and receive queue are referred to as a queue pair. Each queue pair has an identifiable prefix, and in this example we use queue pair prefixes to control when DLB is enabled.
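As a sketch, a queue pair prefix can be matched with a flexible match on the Base Transport Header (BTH), which begins immediately after the UDP header in RoCEv2; the destination queue pair number occupies bytes 5 through 7 of the BTH. The byte offset, mask, prefix value, and the `dlb` action keyword below are all illustrative assumptions to be verified against your platform documentation.

```
# Match RoCEv2 traffic (UDP destination port 4791).
set firewall family inet filter QP-DLB term qp1 from protocol udp
set firewall family inet filter QP-DLB term qp1 from destination-port 4791

# Illustrative flexible match on the destination queue pair field in the
# BTH (24 bits starting at byte offset 5 of the payload). Offsets, mask,
# and prefix values are assumptions for this example.
set firewall family inet filter QP-DLB term qp1 from flexible-match-mask match-start payload
set firewall family inet filter QP-DLB term qp1 from flexible-match-mask byte-offset 5
set firewall family inet filter QP-DLB term qp1 from flexible-match-mask bit-length 24
set firewall family inet filter QP-DLB term qp1 from flexible-match-mask mask-in-hex 0xffff00
set firewall family inet filter QP-DLB term qp1 from flexible-match-mask prefix 0x12ab00

# Enable DLB for matching packets; the action keyword is an assumption.
set firewall family inet filter QP-DLB term qp1 then dlb
set firewall family inet filter QP-DLB term other then accept
```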
This example is configured on a QFX5240-64QD switch.
Platform Support
See Feature Explorer for platform and release support.