Global Load Balancing (GLB)
Learn about GLB and how to configure it.
GLB Overview
Classic load balancing mechanisms use a hashing algorithm to decide the egress interface through which to send traffic. These algorithms apply a hash function to the 5-tuple of the received packet (source and destination IP addresses, source and destination ports, and protocol). However, they do not consider the real-time utilization of the links through which they send packets. Dynamic load balancing (DLB) does consider link utilization, but its decision is completely local: the algorithm cannot detect link utilization elsewhere in the fabric. If a node farther along the path is congested, that node might drop the packet. Global load balancing (GLB) is an enhancement to DLB that has visibility into congestion at the next-to-next-hop (NNH) level.
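The hash-based behavior described above can be illustrated with a minimal Python sketch. It is a conceptual model only: SHA-256 stands in for whatever hash function the forwarding hardware actually uses, and the interface names and the flow values are hypothetical.

```python
import hashlib

def pick_egress_by_hash(five_tuple, egress_links):
    """Static hash-based selection: hash the 5-tuple, then take it modulo the link count.

    Link utilization is never consulted, so a given flow always maps to the
    same link, whether or not that link is congested.
    """
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = hashlib.sha256(key).digest()  # stand-in for a hardware hash
    index = int.from_bytes(digest[:4], "big") % len(egress_links)
    return egress_links[index]

# Hypothetical flow: source IP, destination IP, protocol, source port, destination port
flow = ("10.0.0.1", "10.0.1.9", 17, 49152, 4791)
links = ["et-0/0/0", "et-0/0/1", "et-0/0/2", "et-0/0/3"]

first = pick_egress_by_hash(flow, links)
second = pick_egress_by_hash(flow, links)
```

Because the result depends only on the packet headers, `first` and `second` are always the same link. A small number of large flows can therefore land on one link while other links sit idle.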
GLB takes into account the link utilization of remote links before deciding on the egress interface. Like DLB, when one multipath leg experiences congestion, GLB can offload traffic to alternative legs to mitigate the congestion. Unlike DLB, GLB can reroute traffic flows on leaf devices to avoid traffic congestion at the spine level.
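The difference between a local-only decision and one that also considers remote links can be sketched as follows. This is an illustrative model, not the hardware algorithm: it assumes each candidate path is summarized by a local link utilization and a remote (next-to-next-hop) link utilization, and that a path is only as good as its most congested link. The field names and utilization values are hypothetical.

```python
def pick_local_only(paths):
    """DLB-like choice: consider only the utilization of the local egress link."""
    return min(paths, key=lambda p: p["local_util"])["egress"]

def pick_global(paths):
    """GLB-like choice: score each path by its most congested link (local or remote)."""
    return min(paths, key=lambda p: max(p["local_util"], p["remote_util"]))["egress"]

# Hypothetical leaf with two uplinks. The link beyond spine1 is heavily loaded.
paths = [
    {"egress": "to-spine1", "local_util": 0.20, "remote_util": 0.95},
    {"egress": "to-spine2", "local_util": 0.40, "remote_util": 0.30},
]

dlb_choice = pick_local_only(paths)  # sees only 0.20 vs 0.40
glb_choice = pick_global(paths)      # sees 0.95 vs 0.40 as the path bottlenecks
```

In this example the local-only choice is `to-spine1`, which sends traffic toward the congested remote link, while the global choice is `to-spine2`.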
GLB is designed for Clos-based IP fabric topologies commonly used in data center deployments. Initial implementations supported three-stage Clos (leaf–spine–leaf) topologies. Recent enhancements extend GLB support to larger multi-stage Clos topologies, including five-stage architectures that introduce a super spine layer.
In large-scale AI/ML deployments, intermediate nodes such as super spines can have a high number of next-to-next-hop (NNH) paths. On platforms based on the QFX5240 chipset, the number of hardware path quality profiles is limited to 64. To address this limitation, GLB supports profile sharing under specific conditions, allowing reuse of path quality profiles across multiple paths.
This enhancement enables GLB to scale beyond hardware profile limits and support larger Clos networks with increased numbers of leaves, spines, and GPUs.
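As a rough planning aid, the 64-profile limit can be turned into a simple arithmetic check. The sketch below assumes that, without sharing, each NNH path would need its own path quality profile. It does not model the specific conditions under which profiles can actually be shared, and the function names are hypothetical.

```python
import math

HW_PROFILE_LIMIT = 64  # hardware path quality profiles, as stated above

def needs_profile_sharing(nnh_path_count, limit=HW_PROFILE_LIMIT):
    """Return True when the NNH path count exceeds the hardware profile limit."""
    return nnh_path_count > limit

def min_paths_on_busiest_profile(nnh_path_count, limit=HW_PROFILE_LIMIT):
    """Lower bound (pigeonhole principle) on how many paths must share one profile."""
    return math.ceil(nnh_path_count / limit)

# Hypothetical super spine with 96 NNH paths
sharing_required = needs_profile_sharing(96)
busiest_profile = min_paths_on_busiest_profile(96)
```

With 96 NNH paths, sharing is required and at least one profile must cover two or more paths. With 64 or fewer paths, every path can have a dedicated profile.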
Use Feature Explorer to confirm platform and release support for specific features.
Benefits
- Reduces packet loss due to congestion and remote link failures
- Effectively load-balances large data flows in Clos topologies end-to-end to avoid congestion
- Is particularly useful in deployments where large data flows increase the likelihood of traffic congestion
GLB in AI/ML Data Centers
Traffic in AI/ML data centers has less entropy and larger data flows than traffic in other networks. Because hash-based load balancing does not always distribute large, low-entropy flows effectively, dynamic load balancing (DLB) is often used instead. However, DLB takes into account only the local link bandwidth utilization. For this reason, DLB can effectively mitigate traffic congestion only on the immediate next hop. GLB load-balances large data flows more effectively by taking traffic congestion on remote links into account.
In large-scale AI/ML data center deployments, GLB is used in multi-stage Clos topologies to support increasing numbers of devices and GPUs. These topologies introduce additional path diversity, allowing GLB to make more effective load-balancing decisions across multiple network layers.
Configure GLB
Considerations
Keep the following in mind when configuring GLB:
- GLB is supported in Clos-based topologies, including three-stage and multi-stage Clos deployments. Multi-stage topologies can include additional layers such as super spines, which increase the number of available paths.
- All devices participating in the GLB-enabled Clos topology must support GLB before you configure the feature.
- On platforms based on the QFX5240 chipset, the number of hardware path quality profiles is limited to 64. In larger Clos topologies, such as five-stage deployments, nodes such as super spines can have more than 64 next-to-next-hop (NNH) paths. GLB supports profile sharing under specific conditions to enable scaling beyond this limit.
- GLB supports only one link between the same pair of devices (for example, a spine device and a leaf device).
- In large-scale AI/ML deployments, consider the size of the Clos topology and the number of available paths when designing GLB-enabled fabrics. Profile sharing enables efficient scaling but depends on topology characteristics and path distribution.
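The single-link consideration above can be checked against a planned topology before deployment. The following sketch assumes the fabric is described as a list of device-name pairs, one pair per physical link. This data format and the device names are hypothetical and are not part of any GLB configuration.

```python
from collections import Counter

def find_parallel_links(links):
    """Return device pairs that are connected by more than one link.

    Each link is a (device_a, device_b) tuple. Direction is ignored, so
    ("leaf1", "spine1") and ("spine1", "leaf1") count as the same pair.
    """
    pair_counts = Counter(frozenset(link) for link in links)
    return sorted(
        tuple(sorted(pair)) for pair, count in pair_counts.items() if count > 1
    )

# Hypothetical fabric: leaf1 and spine1 are connected twice.
fabric_links = [
    ("leaf1", "spine1"),
    ("leaf1", "spine2"),
    ("spine1", "leaf1"),
    ("leaf2", "spine1"),
]

violations = find_parallel_links(fabric_links)
```

Here `violations` contains the pair `("leaf1", "spine1")`, which would need to be reduced to a single link to fit the consideration stated above.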
GLB does not support the following features:
- Integrated routing and bridging (IRB) interfaces between top-of-rack (ToR) and spine devices
- Multihomed servers
- GLB for overlay routes (IPv4 or IPv6)
- GLB for BGP routes learned in routing instances