Global Load Balancing (GLB)
Overview
This is an evolving feature for early adopters. More enhancements are planned in a future release.
AI-ML data centers have less entropy and larger data flows than other networks. Because hash-based load balancing does not always effectively load-balance this type of traffic, dynamic load balancing (DLB) is often used instead. However, DLB takes into account only the local link bandwidth utilization. For this reason, DLB can effectively mitigate traffic congestion only on the immediate next hop. Global load balancing (GLB) is an enhancement to DLB that has visibility into congestion at the next-to-next-hop (NNH) level. GLB more effectively load-balances large data flows by taking traffic congestion on remote links into account.
Classic load balancing mechanisms use a hashing algorithm to decide the egress interface through which to send traffic. These algorithms operate the hash function on the five-tuple of the received packet. However, the algorithms do not consider the real-time utilization of the links through which they send packets. Even in DLB, the decision is completely local, and the algorithm cannot detect link utilization beyond the immediate next hop. If a node farther out is congested, that node might drop the packet.
GLB takes into account the link utilization of remote links before deciding on the egress interface. Similarly to DLB, when one multipath leg experiences congestion, GLB can offload traffic to alternative legs to mitigate the congestion. Unlike DLB, GLB can reroute traffic flows on leaf devices to avoid traffic congestion on the spine level.
Benefits
- Reduces packet loss due to congestion and remote link failures
- Effectively load-balances large data flows in Clos topologies end-to-end to avoid congestion
- Is particularly useful in AI-ML deployments where large data flows increase the likelihood of traffic congestion
Platform Support
See Feature Explorer for platform and release support. Starting in Junos OS Evolved Release 23.4R2, this feature is supported on these platforms:
- QFX5240-64OD
- QFX5240-64QD
Configuration
Prerequisites
- You must configure DLB.
- The DLB configuration on each device in the fabric must be identical.
- You must configure a node ID for each node.
  - Configure the node ID at one of these hierarchy levels: [edit routing-options router-id router-id] or [edit protocols bgp bgp-identifier bgp-identifier].
  - If you configure the bgp-identifier statement, you must configure it globally, not at a group or neighbor hierarchy level.
  - The BGP identifier for each node must be unique within the fabric.
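For example, a minimal node-ID configuration on one device might look like the following sketch. The address 10.0.0.11 is a hypothetical value; assign each node a router ID that is unique within the fabric.

```
set routing-options router-id 10.0.0.11
```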
Enable GLB
On spine devices:
set protocols bgp global-load-balancing helper-only
set forwarding-options enhanced-hash-key ecmp-dlb <flowlet | per-packet>
On leaf devices:
set protocols bgp global-load-balancing load-balancer-only
set forwarding-options enhanced-hash-key ecmp-dlb <flowlet | per-packet>
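Putting the DLB prerequisite and the GLB statement together, a minimal leaf-side sketch might look like the following. The flowlet option is shown as one possible choice; use the same ecmp-dlb mode on every device in the fabric, because the DLB configuration must be identical fabric-wide.

```
set forwarding-options enhanced-hash-key ecmp-dlb flowlet
set protocols bgp global-load-balancing load-balancer-only
```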
Selectively Disable GLB
After you globally configure GLB using the global-load-balancing statement, you can selectively disable it on a particular BGP group or peer. To selectively disable GLB, use the no-global-load-balancing statement at either of these hierarchy levels:
[edit protocols bgp group group-name]
[edit protocols bgp group group-name neighbor address]
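For example, to keep GLB enabled globally but disable it for a single peer, you might configure something like the following. The group name underlay and the neighbor address 192.0.2.1 are hypothetical; substitute your own group and peer values.

```
set protocols bgp group underlay neighbor 192.0.2.1 no-global-load-balancing
```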
Implementation Notes
Configure the appropriate option for the global-load-balancing statement at the [edit protocols bgp] hierarchy level:

- helper-only—BGP sends the NNH node (NNHN) capability for the routes it advertises. BGP instructs the GLB application to monitor the link qualities of all local links with EBGP sessions and flood that information to all direct neighbors. Configure this option on the spine devices in a 3-Clos architecture.
- load-balancer-only—BGP does not send the NNHN capability for the routes it advertises. The switch receives link qualities from neighboring nodes. It uses the combined link quality of next hops and NNHs to make load-balancing decisions. Configure this option on the leaf devices of any Clos architecture.
Keep the following in mind when configuring GLB:
- GLB is supported only in a 3-Clos (leaf-spine-leaf) topology.
- All devices in the 3-Clos topology must support GLB before you can configure GLB.
- A 3-Clos topology that supports GLB can have a maximum of 64 leaf devices.
- GLB supports only one link between the same pair of devices (for example, between a spine device and a leaf device).
GLB does not support the following features:
- Integrated routing and bridging (IRB) interfaces between top-of-rack (ToR) and spine devices
- Multihomed servers
- GLB for overlay routes (IPv4 or IPv6)
- GLB for BGP routes learned in routing instances
Verification and Troubleshooting
show bgp global-load-balancing
show bgp global-load-balancing path
show bgp global-load-balancing path-monitor
show bgp global-load-balancing profile