Reactive Path Rebalancing
Overview
Dynamic load balancing (DLB) is an important tool for handling the large data flows (also known as elephant flows) inherent in AI-ML data center fabrics. Reactive path rebalancing is an enhancement to existing DLB features.
In the flowlet mode of DLB, you (the network administrator) configure an inactivity interval. Traffic uses its assigned outgoing (egress) interface until the flow pauses for longer than the inactivity interval. If the outgoing link quality deteriorates gradually, the pauses within the flow might never exceed the configured interval. In this case, classic flowlet mode does not reassign the traffic to a different link, so the traffic cannot use a better-quality link. Reactive path rebalancing addresses this limitation by letting you move the traffic to a better-quality link even when flowlet mode is enabled.
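The limitation described above can be illustrated with a minimal Python sketch. It is not how the device implements flowlet mode. The function names, the microsecond units, and the `best_link` quality model are all hypothetical and exist only to show why a flow without long pauses never leaves its original link.

```python
def assign_links(packet_times_us, inactivity_us, pick_link):
    """Return the link used for each packet under flowlet-style switching.

    pick_link(t) is a hypothetical callback that returns the best link at time t.
    """
    links = []
    current = None
    last = None
    for t in packet_times_us:
        # A link is chosen on the first packet, and again only after a pause
        # that is longer than the inactivity interval.
        if last is None or t - last > inactivity_us:
            current = pick_link(t)
        links.append(current)
        last = t
    return links


def best_link(t):
    # Hypothetical quality model: link A is best until t=100, link B afterwards.
    return "A" if t < 100 else "B"


# Packets every 10 us never pause longer than the 16 us interval,
# so the flow stays on link A even after link B becomes the better choice.
steady_links = assign_links(list(range(0, 300, 10)), 16, best_link)

# A flow with a long pause before t=200 is reassigned to link B at that point.
bursty_links = assign_links([0, 10, 20, 200, 210], 16, best_link)
```

In this sketch, `steady_links` contains only link A, while `bursty_links` switches to link B after the long pause. Reactive path rebalancing targets the first case.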
The device assigns a quality band to each equal-cost multipath (ECMP) egress member link that is based on the traffic flowing through the link. The quality band depends on the port load and the queue buffer. The port load is the number of egress bytes transmitted. The queue buffer is the number of bytes waiting to be transmitted from the egress port. You can customize these attributes based on the traffic pattern flowing through the ECMP.
Benefits
- Scalable solution to link degradation
- Optimal use of bandwidth for large data flows
- Avoidance of load balancing inefficiencies due to long-lived flows
Platform Support
See Feature Explorer for platform and release support. Starting in Junos OS Evolved Release 23.4R2, this feature is supported on these platforms:
- QFX5240-64OD
- QFX5240-64QD
Topology
In this topology, the device has three ingress ports and two egress ports. Two of the ingress streams are Layer 2 (L2) traffic and one is Layer 3 (L3) traffic. The figure shows the table entries forwarding the traffic to each of the egress ports. All the ingress and egress ports are of the same speed.
In this topology, reactive path rebalancing works as follows:
- A quality delta of 2 is configured.
- L2 stream 1 (mac 0x123) enters ingress port et-0/0/0 with a rate of 10 percent. It exits through et-0/0/10. The egress link utilization of et-0/0/10 is 10 percent and the quality band value is 6.
- The L3 stream enters port et-0/0/1 with a rate of 50 percent. It exits through et-0/0/11 and selects the optimal link from the ECMP member list. The egress link utilization of et-0/0/11 is 50 percent with a quality band value of 5.
- L2 stream 2 (mac 0x223) enters port et-0/0/2 with a rate of 40 percent. It also exits through et-0/0/11. This further degrades the et-0/0/11 link quality band value to 4. Now the difference in the quality band values of the two ECMP member links is 2.
- The reactive path rebalancing algorithm now becomes operational because the difference in quality band values for ports et-0/0/10 and et-0/0/11 is equal to or higher than the configured delta of 2. The algorithm moves the L3 stream from et-0/0/11 to a better-quality member link, which in this case is et-0/0/10.
- After the L3 stream moves to et-0/0/10, the et-0/0/10 link utilization increases to 60 percent and its quality band value decreases to 5. L2 stream 2 continues to exit through et-0/0/11. The et-0/0/11 link utilization drops to 40 percent and its quality band value increases to 5.
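The quality delta check in this walk-through can be sketched in Python as follows. The quality band values and the delta of 2 come from the topology above. The function name `find_better_member` and the dictionary layout are hypothetical, and choosing the highest band among eligible members is an assumption, not documented device behavior.

```python
def find_better_member(current, bands, quality_delta):
    """Return a member link that beats `current` by at least quality_delta, else None."""
    if quality_delta == 0:
        # A quality delta of 0 disables reassignment.
        return None
    eligible = [
        member
        for member in bands
        if member != current and bands[member] >= bands[current] + quality_delta
    ]
    if not eligible:
        return None
    # Assumption: prefer the eligible member with the highest quality band.
    return max(eligible, key=lambda member: bands[member])


# Before L2 stream 2 arrives: bands 6 and 5 differ by 1, below the delta of 2.
before = find_better_member("et-0/0/11", {"et-0/0/10": 6, "et-0/0/11": 5}, 2)

# After L2 stream 2 arrives: bands 6 and 4 differ by 2, so the L3 stream can move.
during = find_better_member("et-0/0/11", {"et-0/0/10": 6, "et-0/0/11": 4}, 2)

# After the move: both links are at band 5, so no further reassignment occurs.
after = find_better_member("et-0/0/10", {"et-0/0/10": 5, "et-0/0/11": 5}, 2)
```

Here `before` and `after` are `None`, and `during` is `et-0/0/10`, which matches the sequence described in the topology.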
Configuration
set forwarding-options enhanced-hash-key ecmp-dlb flowlet reassignment quality-delta reassign-quality-delta
set forwarding-options enhanced-hash-key ecmp-dlb flowlet reassignment prob-threshold reassign-prob-threshold
Implementation Notes
Configure DLB in flowlet mode before enabling reactive path rebalancing.
Quality bands are numbered from 0 through 7, where 0 is the lowest quality and 7 is the highest quality. Based on the member port load and queue size, DLB assigns a quality band value to the member port. The port-to-quality band mapping changes based on instantaneous port load and queue size.
When both of the following conditions are met, reactive path rebalancing reassigns a flow to a higher-quality member link:
- A better-quality member link is available whose quality is equal to or greater than the current member's quality plus the configured reassignment quality delta value.
  Configure the quality-delta option to set the difference in quality between the current stream member and the member available for reassignment. The range is 0 through 8. Set it to 0 to disable reassignment of the flows.
- The packet random value that the system generates is lower than the reassignment probability threshold value.
  Configure the prob-threshold option to set the probability threshold that reactive path rebalancing uses to reassign the existing flow to a better available member. The range is 0 through 255. Set it to 0 to disable reassignment of the flows.
When the quality-delta option is configured, the probability threshold defaults to 100. When you configure a lower probability threshold value, flows move to a higher-quality member link at a slower rate. For example, flows move to a higher-quality link more quickly with a probability threshold value of 200 than with a probability threshold value of 50.
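The effect of the probability threshold can be approximated with a short Python sketch. It assumes that the packet random value is uniformly distributed over 0 through 255, matching the threshold range. The source does not state the range or distribution of that value, so treat this as an illustration only. The function names are hypothetical.

```python
import random


def should_reassign(current_band, best_other_band, quality_delta, prob_threshold, rng):
    """Apply both reassignment conditions to a single packet."""
    # Setting either option to 0 disables reassignment.
    if quality_delta == 0 or prob_threshold == 0:
        return False
    # Condition 1: another member is better by at least the quality delta.
    if best_other_band < current_band + quality_delta:
        return False
    # Condition 2 (assumption: the random value is uniform over 0..255).
    return rng.randrange(256) < prob_threshold


def reassign_rate(prob_threshold, trials=100_000, seed=1):
    """Estimate the fraction of packets that would trigger reassignment."""
    rng = random.Random(seed)
    hits = sum(
        should_reassign(4, 6, 2, prob_threshold, rng) for _ in range(trials)
    )
    return hits / trials


rate_50 = reassign_rate(50)    # roughly 50/256 under the stated assumption
rate_200 = reassign_rate(200)  # roughly 200/256 under the stated assumption
```

Under this assumption, a threshold of 200 triggers reassignment on about four times as many packets as a threshold of 50, which is consistent with flows moving more quickly at the higher value.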
Be aware of the following when using this feature:
- Reactive path rebalancing is a global configuration and applies to all ECMP DLB configurations in the system.
- Choose the quality delta carefully. An incorrect delta can result in continuous reassignment of a flow from one link to another.
- You can configure egress quantization in addition to reactive path rebalancing to control the flow reassignment.
- Packet reordering can occur momentarily when reactive path rebalancing reassigns a flow from one port to another.
Verification and Troubleshooting
show forwarding-options enhanced-hash-key