Configure GLB on 3-Clos IP Fabric with Multilinks

In a Clos network, congestion at the first two next hops influences the load-balancing decisions of both the local node and the previous-hop nodes, and triggers global load balancing (GLB). We support GLB on 3-stage Clos topologies with multilinks between spine and top-of-rack switches.

Dynamic load balancing (DLB) helps avoid congested links to mitigate local congestion. However, DLB cannot address congestion experienced by remote devices in the network. In these cases, global load balancing (GLB) extends DLB by modulating local path selection with the path quality perceived at downstream switches. GLB allows an upstream switch to avoid downstream congestion hotspots and select a better end-to-end path. In a Clos network, congestion at the first two next hops influences the load-balancing decisions of the local node and the previous-hop nodes, and triggers GLB. If a route has only one next-next-hop node, GLB creates a single simple path quality profile; if a route has more than one next-next-hop node, GLB creates a simple path quality profile for each next-next-hop node.

Profile Sharing in a Clos Network

When you manage large-scale network fabrics, particularly in hyperscale AI/ML environments with many GPUs, using the limited profile resources efficiently is critical. In Clos networks with five or more stages, some nodes, such as super spines, can have more than 64 next-next-hop nodes. To support more than 64 next-next-hop nodes, GLB can reuse profiles under specific conditions: the profile-sharing feature allows two next-next-hop nodes to use a single profile if their paths do not overlap. Ensuring non-overlapping paths maintains efficient routing without exceeding hardware constraints and maximizes resource utilization.

Profile sharing involves stringent criteria that prevent configuration conflicts and performance degradation. Nodes must operate on distinct next-hop link ranges so that shared profiles are used appropriately and the network remains robust. Understanding these criteria is essential, because misconfiguration can lead to suboptimal routing or instability.

Note:

Even with profile sharing, the total number of simple profiles must not exceed 1024. The maximum number of next hops (paths) that a hardware profile can have is 352.

Transitioning from one profile space allocation to another might result in more than 64 hardware profiles in the Packet Forwarding Engine (PFE) during the transition. We strongly recommend that you deactivate bgp global-load-balancing and wait until all profiles are cleared from the PFE before changing the profile space allocation.
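
For example, a minimal sketch of that workflow, assuming the global-load-balancing statement lives at the [edit protocols bgp] hierarchy (consistent with the show commands later in this topic); verify the exact hierarchy for your platform and release:

  [edit]
  user@leaf1# deactivate protocols bgp global-load-balancing
  user@leaf1# commit

Then, from operational mode, confirm that the profiles are cleared before you change the profile space allocation:

  user@leaf1> show bgp global-load-balancing profile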

Benefits of GLB in 3-Clos Networks with Multilinks

  • Mitigates congestion caused by AI-ML traffic with elephant flows and low entropy in the fabric.

  • Distributes traffic efficiently to ensure optimal link utilization. In a data center fabric, hashing cannot guarantee even load distribution over all ECMP links, which might leave some links underutilized.

  • Reduces packet loss in case of remote link failures.

In Figure 1, S1 and S2 are spine nodes that connect to top-of-rack (ToR) devices T1 and T2 over multiple links a, b, c, d and p, q, r, s. S1 and S2 aggregate the quality of all available paths to a remote device and advertise the overall path quality to the ToR devices. Hosts or routes 1.1 through 1.n and 2.1 through 2.n are behind ToR devices T1 and T2, respectively. If one or more links go down, the spine continues to apply the same aggregation logic to the remaining active links. The spine advertises the remote link state as 'down' only when all links in the multilink group are down.

Figure 1: GLB on 3-Clos IP Fabric with Multilinks
To configure GLB in a network with multiple paths between spine and top-of-rack switches on a 3-Clos IP fabric:
  1. Configure a router ID for each node. Assign a BGP identifier if you prefer not to use the router-id as a GLB node ID.
    Note: If you change the router-id on a node configured with GLB, connected BGP neighbors are not updated with this change until you clear the BGP session on these peers.
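
    For example, a minimal sketch (the address values are placeholders):

      [edit]
      user@leaf1# set routing-options router-id 10.0.0.11
      user@leaf1# commit

    After a router-id change, clear the affected sessions from operational mode:

      user@leaf1> clear bgp neighbor 10.0.1.1
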
  2. Enable DLB on spine and leaf nodes in either flowlet or per-packet mode as per your network requirements.

    Configure DLB on each router in the fabric. For GLB to work effectively, the DLB configuration must be identical on every router in the fabric, as in the sketch below.
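
    A minimal sketch of flowlet-mode DLB as commonly configured on QFX platforms; statement names and supported values vary by platform and release, so verify them against your documentation:

      [edit]
      user@spine1# set forwarding-options enhanced-hash-key ecmp-dlb flowlet inactivity-interval 16
      user@spine1# commit

    For per-packet mode, the flowlet statement is replaced with per-packet under the same ecmp-dlb hierarchy.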

  3. Enable GLB on spine and leaf devices. GLB must be configured on each router in the fabric.
    1. On spine devices, configure GLB in helper-only mode.

      In helper mode, the node monitors link quality and advertises it to neighboring nodes.

    2. On leaf devices, configure GLB in load-balancer-only mode.

      In load-balancer mode, the node only receives link qualities from neighboring nodes and uses the combined link quality of next hops and next-next hops to make load-balancing decisions.

  4. Disable GLB on selected BGP peers or BGP groups.
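
    A minimal sketch of steps 3 and 4. The helper-only and load-balancer-only keywords follow the mode names above, under the bgp global-load-balancing hierarchy referenced earlier in this topic; the per-group disable statement is an assumption, so confirm the exact syntax for your release:

      [edit]
      user@spine1# set protocols bgp global-load-balancing helper-only
      user@spine1# commit

      [edit]
      user@leaf1# set protocols bgp global-load-balancing load-balancer-only
      user@leaf1# set protocols bgp group EXTERNAL-PEERS no-global-load-balancing   # hypothetical per-group knob
      user@leaf1# commit
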
  5. On spine devices, enable the GLB multilink mode. A spine aggregates the link qualities of all its links to a top-of-rack device and advertises the aggregate quality to that device. The aggregation is calculated in one of two ways:
    1. Maximum of the active local links—Use this option when the network has links of different speeds.

    2. Average of the active local links—Use this option when all links have the same speed. By default, the spine advertises the average quality of all the links.

    Note: When spine and top-of-rack devices are connected through multiple links, GLB multilink mode is enabled by default in average mode. If you manually change the GLB multilink mode, you must restart the PFE for the change to take effect.
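
    A sketch of this step; the multilink statement name and the maximum keyword shown here are assumptions based on the behavior described above, so confirm the syntax for your release:

      [edit]
      user@spine1# set protocols bgp global-load-balancing multilink maximum
      user@spine1# commit
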
  6. Verify the configuration using the following commands.
    • show global-load-balancing monitor-links—Displays the details of all monitored links.

    • show bgp global-load-balancing path-monitor—Displays all path monitors that BGP has created and their installation status.

    • show bgp global-load-balancing profile—Displays all GLB profiles and their installation status. Run this command on the leaf switch.
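
Putting the pieces together, a hedged end-to-end sketch for one leaf node (the addresses and the DLB interval are placeholders; the GLB statement names follow the hierarchy implied by the show commands above):

  set routing-options router-id 10.0.0.11
  set forwarding-options enhanced-hash-key ecmp-dlb flowlet inactivity-interval 16
  set protocols bgp global-load-balancing load-balancer-only

After you commit, verify from operational mode:

  user@leaf1> show bgp global-load-balancing profile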