Global Load Balancing (GLB)

Learn about global load balancing (GLB) and how to configure it.

GLB Overview

Classic load-balancing mechanisms use a hashing algorithm to decide the egress interface through which to send traffic. These algorithms apply a hash function to the 5-tuple of the received packet (source and destination IP addresses, source and destination ports, and protocol). However, they do not consider the real-time utilization of the links through which they send packets. Dynamic load balancing (DLB) does take link utilization into account, but its decision is completely local: the algorithm cannot detect link utilization elsewhere in the fabric. If a node farther along the path is congested, that node might drop the packet. Global load balancing (GLB) is an enhancement to DLB that has visibility into congestion at the next-to-next-hop (NNH) level.
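As a rough illustration of hash-based selection, the following Python sketch hashes a flow's 5-tuple and maps the result to an egress link. The SHA-256 hash, the interface names, and the function name are assumptions made for illustration; switching hardware uses its own hash functions.

```python
import hashlib


def pick_egress_by_hash(five_tuple, egress_links):
    """Static selection: hash the 5-tuple and map it onto one of the links.

    Link utilization is never consulted, so a given flow always lands on
    the same link, congested or not.
    """
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = hashlib.sha256(key).digest()  # stand-in for a hardware hash
    index = int.from_bytes(digest[:4], "big") % len(egress_links)
    return egress_links[index]


# Hypothetical flow: source IP, destination IP, protocol, source port, destination port
flow = ("10.0.0.1", "10.0.1.1", 17, 49152, 4791)
links = ["et-0/0/0", "et-0/0/1", "et-0/0/2", "et-0/0/3"]  # hypothetical interfaces

chosen = pick_egress_by_hash(flow, links)
```

Because the result depends only on the packet header fields, a few large flows can hash to the same link while other links stay idle.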

GLB takes into account the link utilization of remote links before deciding on the egress interface. Like DLB, when one multipath leg experiences congestion, GLB can offload traffic to alternative legs to mitigate the congestion. Unlike DLB, GLB can reroute traffic flows on leaf devices to avoid traffic congestion at the spine level.
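The difference between a local-only decision and one that also considers remote links can be sketched in Python. This is a simplified model, not the device algorithm: the utilization values, the path names, and the choice to score a path by its most heavily used segment are all assumptions for illustration.

```python
def pick_egress_dlb(paths):
    """Local-only view: choose the path whose local link is least utilized."""
    return min(paths, key=lambda p: p["local_util"])["egress"]


def pick_egress_glb(paths):
    """Local plus remote view: score each path by its busiest segment.

    Using max() to combine the two utilizations is an assumption; the
    point is only that the remote (NNH) link influences the choice.
    """
    def combined(path):
        return max(path["local_util"], path["remote_util"])

    return min(paths, key=combined)["egress"]


# Hypothetical leaf with two uplinks; utilization is a fraction from 0.0 to 1.0.
paths = [
    {"egress": "to-spine1", "local_util": 0.20, "remote_util": 0.95},
    {"egress": "to-spine2", "local_util": 0.40, "remote_util": 0.30},
]

dlb_choice = pick_egress_dlb(paths)  # sees only the local links
glb_choice = pick_egress_glb(paths)  # also sees the congested remote link
```

In this example the local-only decision selects the uplink toward spine1, whose downstream link is nearly saturated, while the decision that includes remote utilization selects the uplink toward spine2.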

GLB is designed for Clos-based IP fabric topologies commonly used in data center deployments. Initial implementations supported three-stage Clos (leaf–spine–leaf) topologies. Recent enhancements extend GLB support to larger multi-stage Clos topologies, including five-stage architectures that introduce a super spine layer.

In large-scale AI/ML deployments, intermediate nodes such as super spines can have a high number of NNH paths. On platforms based on the QFX5240 chipset, the number of hardware path quality profiles is limited to 64. To address this limitation, GLB supports profile sharing under specific conditions, allowing reuse of path quality profiles across multiple paths.

This enhancement enables GLB to scale beyond hardware profile limits and support larger Clos networks with increased numbers of leaves, spines, and GPUs.
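The effect of profile sharing on the 64-profile limit can be pictured with a small Python sketch. The sharing rule used here (paths toward the same remote node reuse one profile) is purely hypothetical, since the conditions under which GLB shares profiles are not described in this topic; the path tuples and function name are also assumptions.

```python
MAX_PROFILES = 64  # hardware path quality profile limit stated in the text


def assign_profiles(nnh_paths, share_key):
    """Map each path to a profile index, reusing a profile for equal share keys.

    share_key is a caller-supplied function; it stands in for the
    (unspecified) conditions under which profiles can be shared.
    """
    profile_of_key = {}
    assignment = {}
    for path in nnh_paths:
        key = share_key(path)
        if key not in profile_of_key:
            if len(profile_of_key) >= MAX_PROFILES:
                raise ValueError("more distinct profiles needed than the hardware provides")
            profile_of_key[key] = len(profile_of_key)
        assignment[path] = profile_of_key[key]
    return assignment


# 128 hypothetical NNH paths, each written as (next_hop, remote_node).
paths = [(f"spine{s}", f"leaf{l}") for s in range(4) for l in range(32)]

# Hypothetical rule: paths toward the same remote node share one profile.
assignment = assign_profiles(paths, share_key=lambda p: p[1])
profiles_used = len(set(assignment.values()))
```

With this assumed rule, 128 paths fit into 32 profiles. Giving every path its own profile (for example, `share_key=lambda p: p`) would exceed the limit and raise an error.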

Use Feature Explorer to confirm platform and release support for specific features.

Benefits

- Reduces packet loss due to congestion and remote link failures
- Effectively load-balances large data flows in Clos topologies end-to-end to avoid congestion
- Is particularly useful in deployments where large data flows increase the likelihood of traffic congestion

GLB in AI-ML Data Centers

Traffic in AI-ML data centers has less entropy and larger data flows than traffic in other networks. Because hash-based load balancing does not always effectively load-balance large, low-entropy data flows, dynamic load balancing (DLB) is often used instead. However, DLB takes into account only the local link bandwidth utilization. For this reason, DLB can effectively mitigate traffic congestion only on the immediate next hop. GLB load-balances large data flows more effectively by taking traffic congestion on remote links into account.

In large-scale AI/ML data center deployments, GLB is used in multi-stage Clos topologies to support increasing numbers of devices and GPUs. These topologies introduce additional path diversity, allowing GLB to make more effective load-balancing decisions across multiple network layers.

Considerations

Keep the following in mind when configuring GLB:

- GLB is supported in Clos-based topologies, including three-stage and multi-stage Clos deployments. Multi-stage topologies can include additional layers such as super spines, which increase the number of available paths.
- All devices participating in the GLB-enabled Clos topology must support GLB before you configure the feature.
- On platforms based on the QFX5240 chipset, the number of hardware path quality profiles is limited to 64. In larger Clos topologies, such as five-stage deployments, nodes such as super spines can have more than 64 next-to-next-hop (NNH) paths. GLB supports profile sharing under specific conditions to enable scaling beyond this limit.
- GLB supports only one link between the same pair of devices (for example, a spine device and a leaf device).
- In large-scale AI/ML deployments, consider the size of the Clos topology and the number of available paths when designing GLB-enabled fabrics. Profile sharing enables efficient scaling but depends on topology characteristics and path distribution.
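The single-link consideration above can be checked against a planned cabling list before deployment. The following Python sketch is one possible way to do that; the device names and the list format are assumptions for illustration.

```python
from collections import Counter


def find_parallel_links(links):
    """Return the device pairs that are connected by more than one link.

    links is an iterable of (device_a, device_b) tuples. Direction is
    ignored, so ("leaf1", "spine1") and ("spine1", "leaf1") count as
    the same pair.
    """
    counts = Counter(tuple(sorted(pair)) for pair in links)
    return sorted(pair for pair, count in counts.items() if count > 1)


# Hypothetical cabling plan with one duplicated leaf-to-spine connection.
planned_links = [
    ("leaf1", "spine1"),
    ("leaf1", "spine2"),
    ("spine1", "leaf1"),
    ("leaf2", "spine1"),
]

violations = find_parallel_links(planned_links)
```

Any pair reported in `violations` has parallel links and would need to be reduced to a single link for GLB.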

GLB does not support the following features:

- Integrated routing and bridging (IRB) interfaces between top-of-rack (ToR) and spine devices
- Multihomed servers
- GLB for overlay routes (IPv4 or IPv6)
- GLB for BGP routes learned in routing instances

Configure GLB

1. Configure DLB.
   The DLB configuration on each device in the fabric must be identical. See Dynamic Load Balancing for how to configure DLB.
2. Configure a node ID for each node.
   Each node must have a node ID. Keep the following in mind when configuring the node ID:
   - Configure the node ID at one of these hierarchy levels:
   - If you configure the bgp-identifier statement, you must configure it globally, not at a group or neighbor hierarchy level.
   - The BGP identifier for each node must be unique within the fabric.
3. Configure GLB on spine devices based on the Clos topology.
   - For three-stage Clos (3-Clos) architectures, configure spine devices in helper-only mode.
     In helper-only mode, BGP sends the NNH node (NNHN) capability for the routes it advertises. BGP instructs the GLB application to monitor the link quality of all local links with EBGP sessions and to flood that information to all direct neighbors.
   - For five-stage Clos (5-Clos) architectures, configure spine and super-spine devices without the helper-only or load-balancer-only option.
     In 5-Clos architectures, spine and super-spine devices support both helper and load-balancer modes.
4. On leaf devices, configure GLB in load-balancer-only mode.
   In load-balancer-only mode, BGP does not send the NNHN capability for the routes it advertises. The switch receives link quality information from neighboring nodes and uses the combined link quality of next hops and NNHs to make load-balancing decisions. Configure this option on the leaf devices of any Clos architecture.
5. Selectively disable GLB.
   After you globally configure GLB using the global-load-balancing statement, you can selectively disable it on a particular BGP group or peer. To do so, use the no-global-load-balancing statement at the BGP group or neighbor hierarchy level.
6. Verify that the configuration was successful using the following commands:
   - show bgp global-load-balancing
   - show bgp global-load-balancing path
   - show bgp global-load-balancing path-monitor
   - show bgp global-load-balancing profile
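The mode selection and node ID rules in this procedure can be summarized in a short Python planning sketch. It is not device configuration: the role names, function names, and sample identifiers are assumptions for illustration, and only the helper-only and load-balancer-only option names come from the steps above.

```python
def glb_mode(role, stages):
    """Return the GLB option the procedure describes for a device role.

    None means that neither option is configured, so the device acts as
    both helper and load balancer.
    """
    if role == "leaf":
        return "load-balancer-only"  # leaf devices of any Clos architecture
    if role == "spine" and stages == 3:
        return "helper-only"
    if role in ("spine", "super-spine") and stages == 5:
        return None
    raise ValueError(f"no rule for role {role!r} in a {stages}-stage Clos")


def duplicate_node_ids(node_ids):
    """Return the node IDs that are assigned to more than one device."""
    devices_by_id = {}
    for device, node_id in node_ids.items():
        devices_by_id.setdefault(node_id, []).append(device)
    return {nid: devs for nid, devs in devices_by_id.items() if len(devs) > 1}


# Hypothetical fabric plan in which leaf1 and spine1 share an identifier.
plan = {"leaf1": "10.0.0.1", "leaf2": "10.0.0.2", "spine1": "10.0.0.1"}
conflicts = duplicate_node_ids(plan)
```

A non-empty `conflicts` result means at least two nodes share an identifier, which violates the requirement that each node ID be unique within the fabric.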
