Global Load Balancing for AI Fabrics
Introduction
Load balancing ensures that network traffic is distributed as evenly as possible across all members in an equal-cost multipath (ECMP) or LAG group. Generally, load balancing is categorized as either static or dynamic.
Static Load Balancing (SLB) distributes traffic by using hashing based on packet contents, such as source and destination IP addresses. One advantage of SLB is that it guarantees packet order because all packets assigned to the same flow travel along the same path. However, because SLB doesn't consider real-time path or link load, it can lead to issues like poor bandwidth utilization, larger (elephant) flows disrupting smaller flows (mice flows), and traffic losses when a path goes down.
Dynamic Load Balancing (DLB) improves on SLB by selecting paths based on both the bandwidth utilization of member links and packet contents. This approach makes DLB better suited to changing network conditions. DLB continuously monitors the load and queue size of each member port in an aggregate group. These metrics are processed by the DLB algorithm, which assigns each port a quality band from 0 to 7. A quality band of 7 indicates the best quality, and 0 indicates the lowest. This quality band assignment adjusts based on real-time port load and queue conditions.
The following key concepts are important to understanding DLB:
- Micro-Flow: A micro-flow refers to packets traversing between a source and a destination device that are associated with an individual application, and are part of the same communication session.
- Macro-Flow: A macro-flow comprises multiple micro-flows that hash to the same value at a network device. In other words, a macro-flow is an aggregation of traffic flows that share common characteristics.
- Port Load Metric: This metric represents the amount of traffic (measured in bytes) transmitted per interval over each Equal-Cost Multi-Path (ECMP) link. By monitoring this, the system can assess and distribute network load effectively.
- Port Queue Metric: This metric indicates the number of memory cells occupied while queuing at each ECMP link. It provides data about potential bottlenecks and helps in optimizing traffic flows by preventing congestion.
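To make the relationship between the two metrics and the quality band concrete, the following Python sketch shows one plausible way a DLB-style algorithm could map a port's load and queue metrics to a band from 0 to 7. The thresholds, the equal weighting of the two metrics, and the function name are illustrative assumptions; the actual ASIC algorithm is vendor-specific and not described in this document.

```python
# Illustrative sketch only: maps a DLB-style port load metric (bytes per
# interval) and port queue metric (occupied memory cells) to a quality
# band 0-7. All thresholds and the equal weighting are assumptions.

def quality_band(port_load_bytes: int, queue_cells: int,
                 max_load: int = 1_000_000, max_queue: int = 4096) -> int:
    """Return a quality band from 0 (worst) to 7 (best)."""
    # Normalize each metric to [0, 1]; higher means more congested.
    load_ratio = min(port_load_bytes / max_load, 1.0)
    queue_ratio = min(queue_cells / max_queue, 1.0)
    # Combine the two metrics (equal weighting assumed here).
    congestion = (load_ratio + queue_ratio) / 2
    # Invert and scale: an idle port gets band 7, a saturated port band 0.
    return round((1.0 - congestion) * 7)

print(quality_band(0, 0))             # idle port -> 7 (best)
print(quality_band(1_000_000, 4096))  # saturated port -> 0 (worst)
```

The key point the sketch illustrates is that the band is recomputed continuously from live measurements, so a port's ranking changes as its load and queue depth change.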
For more information, see Load Balancing in Data Center.
Global Load Balancing Overview
Global Load Balancing (GLB) builds on Dynamic Load Balancing (DLB) by factoring in downstream path quality when making load balancing decisions. GLB gives switches the ability to detect the quality of next-next-hop (NNH) links and downstream paths. This means that upstream switches can avoid congestion by choosing optimal end-to-end paths, instead of simply picking the least-loaded local link without regard to downstream link quality. GLB is supported on Juniper QFX5240 switches with Broadcom Tomahawk5 (TH5) ASICs.
The key features of GLB in Apstra 6.0 are:
- DLB flowlet mode: This mode assigns links based on flowlets instead of flows. Flowlets are bursts of packets from the same flow separated by periods of inactivity. This period of inactivity is referred to as the inactivity interval.
- DLB per-packet mode: In this mode, DLB is initiated for each packet in the flow. This mode ensures that each packet is always assigned to the best-quality member port. However, in this mode, DLB may experience packet reordering problems due to latency skews across paths. For more information about per-packet mode, see Dynamic Load Balancing.
GLB also supports DLB with reactive load balancing.
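The flowlet concept above can be sketched in a few lines of Python: a flow's packet stream is split into flowlets whenever the gap between consecutive packets exceeds the inactivity interval, and only the first packet of a new flowlet is eligible for reassignment to a different member link. The class name, the 500-microsecond interval value, and the API shape are hypothetical illustrations, not a vendor API.

```python
# Sketch of flowlet detection. The inactivity interval value and all names
# here are illustrative assumptions, not actual switch configuration.

INACTIVITY_INTERVAL = 0.0005  # 500 microseconds, an assumed example value

class FlowletTracker:
    def __init__(self, inactivity_interval: float = INACTIVITY_INTERVAL):
        self.inactivity_interval = inactivity_interval
        self.last_seen = {}  # flow id -> timestamp of the last packet

    def packet(self, flow_id: str, timestamp: float) -> bool:
        """Record a packet; return True if it starts a new flowlet.

        A new flowlet can be moved to the current best-quality link without
        risking reordering, because the inactivity gap suggests earlier
        packets have already drained from the previous path.
        """
        last = self.last_seen.get(flow_id)
        new_flowlet = last is None or (timestamp - last) > self.inactivity_interval
        self.last_seen[flow_id] = timestamp
        return new_flowlet

t = FlowletTracker()
print(t.packet("flowA", 0.0000))  # True: first packet starts a flowlet
print(t.packet("flowA", 0.0001))  # False: gap is within the interval
print(t.packet("flowA", 0.0010))  # True: gap exceeds the interval
```

This is why flowlet mode preserves packet order within a burst while still reacting to congestion between bursts.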
For more information about DLB, see Dynamic Load Balancing (DLB).
For more information about GLB, see Global Load Balancing (GLB).
Prerequisites for Global Load Balancing
- Dependency on DLB:
- GLB cannot function without DLB enabled. At least one DLB mode (per-packet, reactive-path, or flowlet) must be activated.
- DLB configuration must be consistent across all routers in the fabric to ensure proper GLB operation. Inconsistent DLB setups can result in unpredictable GLB behavior.
- Fabric-wide configuration:
- GLB must be enabled on all nodes in the fabric (spines and leafs). Partial implementation or "GLB islands" are not supported because BGP's NLRI Next-Next-Hop (NNH) capability cannot propagate DLB metrics effectively between nodes.
- While GLB policies can be tailored to individual nodes (e.g., helper-only or load-balancer-only modes), the key requirement is that GLB must be associated with the BGP protocol stack on every device.
- Maximum of one interface between spine and leaf:
- ECMP between the same pair of devices is not supported. Apstra raises a blueprint warning if this occurs.
- Supported hardware and modes:
- Broadcom TH5 devices: DLB reactive-path mode is exclusive to TH5 (for example, QFX5240 switches).
- Flowlet and per-packet modes: These modes are supported on all compatible devices.
We recommend that you assign only one load-balancing policy per system node. Combining multiple policies or applying dynamic policies might lead to conflicts and operational challenges.
Global Load Balancing Configuration Constraints
Before configuring GLB, note the following constraints:
- GLB is only supported on Juniper devices with TH5 ASICs (Juniper QFX5240).
- When GLB is configured, all spines and leafs in the fabric must also have a GLB policy. Apstra raises validation warnings when a GLB policy is applied only partially.
- GLB configuration is only applied to fabric BGP peers, and will not be rendered towards external BGP peers from connectivity points.
GLB can be configured in two modes:
- global-load-balancing helper-only:
- Configured only for devices with the "spine" role. Apstra raises validation errors if you attempt to configure this mode on leaf or access devices.
- The node monitors local link qualities and floods this data to neighbors. It does not have visibility into downstream link qualities and does not make GLB decisions.
- BGP advertises the next-next-hop (NNH) capability.
- Usually configured on L3 Clos spines.
- global-load-balancing load-balancer-only:
- Configured only for devices with the "leaf" role (in any Clos topology); an error is raised if the policy is assigned to any other role.
- The node does not monitor or advertise local link qualities.
- The node does not advertise the NNH capability.
- The node receives downstream link quality metrics and makes load-balancing decisions based on the combined quality of next hops and next-next hops.
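The load-balancer-only decision described above can be sketched as follows: the leaf combines the locally measured next-hop (NH) quality band with the next-next-hop (NNH) band learned from the spine, and selects the member link with the best combined quality. The interface names, the data shapes, and the use of min() as the combining function are illustrative assumptions; the document does not specify how the two bands are actually combined in hardware.

```python
# Sketch of a GLB leaf decision. The combining rule (min of the two bands,
# i.e. a path is only as good as its worst segment) is an assumption made
# for illustration; interface names are hypothetical.

def pick_member(members: dict) -> str:
    """members maps link name -> (nh_band, nnh_band), each 0 (worst) to 7 (best)."""
    return max(members, key=lambda m: min(members[m]))

links = {
    "et-0/0/1": (7, 2),  # excellent local link, congested downstream
    "et-0/0/2": (5, 6),  # moderate local link, healthy downstream
}

# A DLB-only switch would pick et-0/0/1 (best local band, 7); GLB instead
# prefers et-0/0/2 because its end-to-end quality (5) is higher than the
# downstream-constrained quality of et-0/0/1 (2).
print(pick_member(links))  # et-0/0/2
```

This contrast is the core of GLB: the locally best link is not chosen when its downstream segment is congested.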
To configure GLB for your AI Fabric, use the following instructions: Configure Global Load Balancing for your AI Fabric.
After you configure GLB, you can bulk-assign your load balancing policy to devices. For more information, see Bulk-Assign a Load Balancing Policy.