Understanding the Algorithm Used to Load Balance Traffic on MX Series Routers

When a packet is received on the ingress interface of a device, the Packet Forwarding Engine (PFE) performs a lookup to identify the forwarding next hop. If there are multiple equal-cost multipath (ECMP) routes to the same destination, the ingress PFE can be configured to distribute flows across the next hops. Likewise, traffic may need to be distributed across the member links of an aggregated interface such as aggregated Ethernet. The actual forwarding next hop is selected based on a hash computed over selected packet header fields and several internal fields, such as the interface index. You can configure some of the fields that are used by the hashing algorithm.

  • For MX Series routers with Modular Port Concentrators (MPCs) and Type 5 FPCs, configure the hash for the supported traffic types at the forwarding-options enhanced-hash-key hierarchy level. Details on which fields are included by default for each traffic family can be found below.

    In Junos OS Release 18.3R1, the default method for calculating the enhanced hash was changed to provide improved entropy for IP tunnels, IPv6 flows, and PPPoE payloads transmitted as family multiservice. These defaults can be disabled by configuring the corresponding no- options (for example, no-flow-label and no-payload, described below).

  • For MX Series routers with DPCs, configure the hash for the supported traffic types at the forwarding-options hash-key hierarchy level.
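The hash-based selection described above can be sketched in a few lines of Python. This is only an illustration of the general technique: the CRC32 function, the field encoding, and the names FlowKey and select_next_hop are assumptions made for this example and do not describe the actual hash implemented in the PFE.

```python
import zlib
from typing import NamedTuple, Sequence


class FlowKey(NamedTuple):
    """Hypothetical subset of header fields fed into the hash."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


def select_next_hop(key: FlowKey, next_hops: Sequence[str]) -> str:
    """Map a flow onto one of several equal-cost next hops.

    The same key always yields the same next hop, so packets of one
    flow stay on one path and are not reordered.
    """
    data = "|".join(str(field) for field in key).encode()
    # CRC32 is a stand-in (assumption) for the real hardware hash.
    return next_hops[zlib.crc32(data) % len(next_hops)]
```

Because the result depends only on the key, adding or removing fields from the key (which is what the hash-key configuration controls) changes how finely traffic is spread across the next hops.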

Junos OS supports several types of load balancing:

  • Per-prefix load balancing – Each prefix is mapped to only one forwarding next hop.

  • Per-packet load balancing – All next-hop addresses for a destination in the active route are installed in the forwarding table (the term per-packet load balancing in Junos is equivalent to what other vendors may call per-flow load balancing). See Configuring Per-Packet Load Balancing for more information.

  • Random packet load balancing – Next hops are picked randomly for each packet. This method is available on MX routers with MPC line cards for aggregated Ethernet interfaces and ECMP paths. To configure per-packet random spray load balancing, include the per-packet statement at the [edit interfaces aex aggregated-ether-options load-balance] hierarchy level. See Example: Configuring Aggregated Ethernet Load Balancing for more information.

  • Per-packet random spray load balancing – When the adaptive load-balancing option fails, per-packet random spray load balancing serves as a last resort. It ensures that the ECMP members are equally loaded without taking bandwidth into consideration. Because per-packet spraying causes packet reordering, it is recommended only if the applications can tolerate reordering. Per-packet random spray eliminates traffic imbalance that occurs as a result of software errors (except for packet hash).

    Starting in Junos OS Release 20.2R1, you can configure per-packet random spray load balancing on MX240, MX480, and MX960 routers with MPC10E (MPC10E-15C-MRATE and MPC10E-10C-MRATE) line cards and on MX2010 and MX2020 routers with the MX2K-MPC11E line card.

  • Adaptive load balancing – Adaptive load balancing (ALB) corrects a genuine traffic imbalance by using a feedback mechanism to distribute traffic across the links in an aggregated Ethernet (AE) bundle and across equal-cost multipath (ECMP) next hops. ALB optimizes traffic distribution when packet flows have widely varying traffic rates by adjusting the bandwidth and packet streams on links within an AE bundle.

    • ALB on multiple Packet Forwarding Engines for aggregated Ethernet bundles

      Starting in Junos OS Release 20.1R1, on MX Series MPCs, ALB redistributes traffic for aggregated Ethernet bundles evenly across multiple ingress Packet Forwarding Engines (PFEs) on the same line card. In earlier releases, ALB was limited to a single PFE when redistributing traffic in an AE bundle, which limited flexibility and redundancy. ALB is disabled by default.

      You can configure ALB by setting the adaptive statement at the [edit interfaces ae-interface aggregated-ether-options load-balance] hierarchy level.

      See Configuring Adaptive Load Balancing for more information.

    • ALB on multiple PFEs for ECMP next hops

      Starting in Junos OS Release 20.1R1, you can configure ALB for ECMP next hops across multiple ingress PFEs on the same line card for even distribution of the traffic and redundancy. In earlier releases, ALB for ECMP next hops was limited to a single PFE. This limitation impacted flexibility and redundancy. ALB dynamically monitors the traffic load contributed by each flow in relation to overall ECMP link loading levels, and then takes corrective action when the threshold is reached.

      You can configure ALB for ECMP next hops by including the ecmp-alb statement at the [edit chassis] hierarchy level.

      See ecmp-alb for more information.

    Note:

    ALB works across multiple PFEs residing on the same line card. It is not supported across PFEs residing on different line cards.

    For PFEs residing on different line cards, ingress traffic can cause an uneven load on the egress ports even if ALB is enabled.
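To make the trade-off between hash-based (per-flow) distribution and per-packet random spray more concrete, the following Python sketch simulates both on a small set of links. It is a toy model under stated assumptions: the function names, the CRC32 hash, and the seeded random generator are all hypothetical and are not the router's implementation.

```python
import random
import zlib
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def per_flow_link(flow_id: str, n_links: int) -> int:
    """Hash-based choice: every packet of a flow uses the same link."""
    return zlib.crc32(flow_id.encode()) % n_links


def simulate(
    packets: Iterable[Tuple[str, int]],  # (flow_id, size_in_bytes)
    n_links: int,
    spray: bool,
    seed: int = 0,
) -> Tuple[Counter, Dict[str, Set[int]]]:
    """Return bytes per link and the set of links used by each flow."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    load: Counter = Counter()
    links_per_flow: Dict[str, Set[int]] = {}
    for flow_id, size in packets:
        if spray:
            link = rng.randrange(n_links)  # random spray: pick any link
        else:
            link = per_flow_link(flow_id, n_links)
        load[link] += size
        links_per_flow.setdefault(flow_id, set()).add(link)
    return load, links_per_flow
```

With a single fat flow, the hash-based mode places all bytes on one link, while the spray mode spreads them across links. A flow that uses more than one link is exposed to reordering, which is why spraying suits only applications that tolerate it.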

Several additional configuration options are also available:

  • Per-slot hash function configuration – This method is based on a unique load-balance hash value for each PIC slot and is only valid for M120, M320, and MX Series routers with DPCE and MS-DPC line cards.

  • Symmetrical load balancing – This method provides symmetrical load balancing on an 802.3ad LAG. The hash used for symmetrical load balancing is set at the interface level of the hierarchy. It ensures that a given flow of duplex traffic traverses the same devices in both directions, and is available on MX Series routers.

MX MPC and T Series Type 5 FPC Specifics

The hash computation algorithm on MX MPC and T Series Type 5 FPCs produces identical results for packets with swapped layer 3 addresses or layer 4 transport ports. For example, the hash computation result for a packet with source address 192.0.2.1 and destination address 203.0.113.1 is identical to the hash computation result for a packet with source address 203.0.113.1 and destination address 192.0.2.1.
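One common way to obtain this symmetry, shown here only as a sketch, is to sort the address pair and the port pair before hashing. Whether the hardware uses this exact method is not stated in this document; the function name symmetric_hash and the use of CRC32 are assumptions for illustration.

```python
import ipaddress
import zlib


def symmetric_hash(src_ip: str, dst_ip: str,
                   src_port: int = 0, dst_port: int = 0) -> int:
    """Hash that is unchanged when source and destination are swapped."""
    # Sorting each pair removes the notion of direction from the input.
    addrs = sorted(int(ipaddress.ip_address(a)) for a in (src_ip, dst_ip))
    ports = sorted((src_port, dst_port))
    data = b"".join(a.to_bytes(16, "big") for a in addrs)
    data += b"".join(p.to_bytes(2, "big") for p in ports)
    return zlib.crc32(data)  # CRC32 is a stand-in for the real hash
```

With this construction, a packet from 192.0.2.1 to 203.0.113.1 and the reply from 203.0.113.1 to 192.0.2.1 produce the same value, so both directions of a flow map to the same next-hop index.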

To avoid possible packet reordering, layer 4 transport protocol ports are never used in the hash computation for fragmented IPv4 packets. This applies to the first fragment of a flow, identified by the more-fragments (MF) bit in the IPv4 header, and to all subsequent fragments, identified by a non-zero fragment offset. The first fragment and subsequent fragments are therefore always forwarded over the same next hop.
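The fragment rule can be illustrated with a short Python sketch that inspects the flags and fragment offset field of an IPv4 header and drops the layer 4 ports from the hash input for any fragment. The helper names (build_ipv4_header, is_fragment, select_hash_fields) are hypothetical, and the returned tuple is only a model of the hash input, not the actual PFE key layout.

```python
import ipaddress
import struct

MF_FLAG = 0x2000           # "more fragments" bit in the flags/offset word
FRAG_OFFSET_MASK = 0x1FFF  # low 13 bits hold the fragment offset
TCP, UDP = 6, 17


def build_ipv4_header(src: str, dst: str, protocol: int,
                      flags_frag: int = 0) -> bytes:
    """Build a minimal 20-byte IPv4 header (checksum left as zero)."""
    return struct.pack(
        "!BBHHHBBH4s4s", 0x45, 0, 20, 0, flags_frag, 64, protocol, 0,
        ipaddress.IPv4Address(src).packed, ipaddress.IPv4Address(dst).packed)


def is_fragment(header: bytes) -> bool:
    """True for a first fragment (MF set) or a later one (offset != 0)."""
    (flags_frag,) = struct.unpack_from("!H", header, 6)
    return bool(flags_frag & MF_FLAG) or (flags_frag & FRAG_OFFSET_MASK) != 0


def select_hash_fields(header: bytes, src_port: int, dst_port: int) -> tuple:
    """Return the fields to hash; ports are used only for whole packets."""
    src, dst, protocol = header[12:16], header[16:20], header[9]
    fields = (min(src, dst), max(src, dst), protocol)
    if protocol in (TCP, UDP) and not is_fragment(header):
        fields += (min(src_port, dst_port), max(src_port, dst_port))
    return fields
```

Because later fragments carry no layer 4 header, excluding the ports for every fragment (including the first) keeps all fragments of a datagram on the same hash input.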

Hashing Algorithm Used in Junos 18.3R1 and later

In most cases, including layer 3 and layer 4 field information in the hash calculation produces results that are good enough for equitable distribution of traffic. However, in cases such as IP-in-IP or GRE tunneling, layer 3 and layer 4 field information alone may not produce a hash with sufficient entropy for load balancing. For example, in a deployment where MX Series routers transit GRE flows, each GRE tunnel typically appears as a single flow with the same source and destination addresses and the same GRE key. As traffic volume over the tunnels increases, such fat flows can markedly increase the imbalance in link utilization. Another example is when MX Series routers are used as VPLS PE devices in a subscriber edge deployment, where the routers backhaul broadband subscriber traffic from the access devices to a central broadband network gateway (BNG). In such a case, only the subscriber MAC addresses and the BNG router MAC addresses are available for hashing. With few BNG MAC addresses and relatively few subscriber MAC addresses, these fields do not provide enough entropy for optimal load balancing.

Therefore, for MX Series routers with Trio MPCs running Junos OS Release 18.3R1 or later, the default enhanced-hash-key calculation has changed. The changes are summarized here:

  • For GRE packets, if the outer IP packet is not a fragmented packet (first fragment or any subsequent fragment), and the inner packet is IPv4 or IPv6, then the source and destination addresses from the inner packet are used in the hash computation in addition to the outer source and destination addresses. Layer 4 ports of the inner packet are also included if the protocol of the inner IP packet is TCP or UDP, and the inner IP packet is not a fragment (first fragment or any subsequent fragment). Likewise, if the outer IP packet is not a fragment packet, and the inner packet is MPLS, then the top inner label is included in the hash computation.

  • For PPPoE packets, if the inner packet is IPv4 or IPv6, then the source and destination addresses from the inner packet are included. Layer 4 ports are included if the protocol of the inner IP packet is TCP or UDP, and the inner IP packet is not a fragment. Inclusion of the PPPoE inner packet fields can be disabled by configuring the no-payload option at the forwarding-options enhanced-hash-key family multiservice hierarchy level.

  • For IPv6, the IPv6 header flow label field is included in the hash computation. RFC 6437 describes the 20-bit flow label field in the IPv6 header. Configure the no-flow-label option at the forwarding-options enhanced-hash-key family inet6 hierarchy level to disable the new default.
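The GRE rules in the list above can be expressed as a small decision function. The following Python sketch models only the selection of inner fields; the data classes and the function gre_hash_fields are hypothetical names for this example, GRE key handling is deliberately left out, and addresses are sorted as strings purely to indicate symmetric treatment.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

TCP, UDP = 6, 17


@dataclass
class InnerIP:
    src: str
    dst: str
    protocol: int
    is_fragment: bool = False
    src_port: Optional[int] = None
    dst_port: Optional[int] = None


@dataclass
class InnerMPLS:
    top_label: int


@dataclass
class GrePacket:
    outer_src: str
    outer_dst: str
    outer_is_fragment: bool
    inner: Union[InnerIP, InnerMPLS, None] = None


def gre_hash_fields(pkt: GrePacket) -> List[Tuple]:
    """List the fields that would contribute to the hash for a GRE packet."""
    fields: List[Tuple] = [
        ("outer_addrs", *sorted((pkt.outer_src, pkt.outer_dst)))]
    if pkt.outer_is_fragment:
        return fields  # fragmented outer packet: inner fields are not used
    inner = pkt.inner
    if isinstance(inner, InnerIP):
        fields.append(("inner_addrs", *sorted((inner.src, inner.dst))))
        has_ports = inner.src_port is not None and inner.dst_port is not None
        if inner.protocol in (TCP, UDP) and not inner.is_fragment and has_ports:
            fields.append(
                ("inner_ports", *sorted((inner.src_port, inner.dst_port))))
    elif isinstance(inner, InnerMPLS):
        fields.append(("inner_top_label", inner.top_label))
    return fields
```

The point of the sketch is that many inner flows sharing one tunnel (same outer addresses) still produce different hash inputs, which is where the additional entropy comes from.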

Hash fields used for GRE traffic sent over IPv4

The lists show the fields used in the hash calculation for non-fragmented packets in Junos 18.3R1 and later. Each field is used in the hash calculation by default unless otherwise noted. Where noted, the IP address and port fields are hashed symmetrically; that is, swapping source and destination does not change the hash result.

  • IPv4, GRE

    • GRE Key

    • Source and destination address; symmetric

    • Protocol

    • DSCP (disabled)

    • Incoming Interface Index (disabled)

  • IPv4 in IPv4, GRE

    • Payload (inner IPv4: source and destination ports, IP addresses); symmetric

    • GRE Key

      GRE Protocol = IPv4

    • Source and destination address; symmetric

    • Protocol

    • DSCP (disabled)

    • Incoming Interface Index (disabled)

  • IPv6 in IPv4, GRE

    • Payload (inner IPv6: source and destination ports, IP addresses); symmetric

    • GRE Key

      GRE Protocol = IPv6

    • Source and destination address; symmetric

    • Protocol

    • DSCP (disabled)

    • Incoming Interface Index (disabled)

  • MPLS in IPv4, GRE

    • Payload (inner MPLS: top label)

    • GRE Key

      GRE Protocol = MPLS

    • Source and destination address; symmetric

    • Protocol

    • DSCP (disabled)

    • Incoming Interface Index (disabled)

  • IPv4, L2TPv2 used in Junos 17.2 and later

    Inclusion of the L2TPv2 tunnel ID and session ID can be enabled by configuring the forwarding-options enhanced-hash-key family inet l2tp-tunnel-session-identifier option. Note that Juniper does not recommend enabling this option by default. This is because L2TP session identification is based on the destination UDP port match (1701), and this port may not be exclusively used for L2TP transport so the extraction of the tunnel and session ID fields from the packet may not always be accurate.

    • Session ID

    • Tunnel ID

    • Source and destination port

    • Source and destination address; symmetric

    • Protocol (UDP)

    • DSCP (disabled)

    • Incoming Interface Index (disabled)

Hash fields used for GRE traffic sent over IPv6

The list shows the fields used in the hash calculation for non-fragmented packets. Each field is used in the hash calculation by default unless otherwise noted. Where noted, the IP address and port fields are hashed symmetrically; that is, swapping source and destination does not change the hash result.

  • IPv6, GRE

    • GRE Key

    • Source and destination address; symmetric

    • Next header

    • Flow label (Junos 18.3 and later)

    • Traffic class (disabled)

    • Incoming Interface Index (disabled)

  • IPv4 in IPv6, GRE (Junos 18.3 and later)

    • Payload (inner IPv4: source and destination ports, IP addresses); symmetric

    • GRE Key

      GRE Protocol = IPv4

    • Source and destination address; symmetric

    • Next header

    • Flow label (Junos 18.3 and later)

    • Traffic class (disabled)

    • Incoming Interface Index (disabled)

  • IPv6 in IPv6, GRE (Junos 18.3 and later)

    • Payload (inner IPv6: source and destination ports, IP addresses); symmetric

    • GRE Key

      GRE Protocol = IPv6

    • Source and destination address; symmetric

    • Next header

    • Flow label (Junos 18.3 and later)

    • Traffic class (disabled)

    • Incoming Interface Index (disabled)

  • MPLS in IPv6, GRE (Junos 18.3 and later)

    • Payload (inner MPLS: top labels); symmetric

    • GRE Key

      GRE Protocol = MPLS

    • Source and destination address; symmetric

    • Next header

    • Flow label

    • Traffic class (disabled)

    • Incoming Interface Index (disabled)

Hash fields used for IPv4

The list shows the fields used in the hash calculation for non-fragmented packets, except where noted. Each field is used in the hash calculation by default unless otherwise noted. Where noted, the IP address and port fields are hashed symmetrically; that is, swapping source and destination does not change the hash result.

  • IPv4, not TCP or UDP, or fragmented packets

    • Source and destination address; symmetric

    • Protocol

    • DSCP (disabled)

    • Incoming Interface Index (disabled)

  • IPv4, TCP and UDP, non fragmented packets

    • Source and destination port; symmetric

    • Source and destination address; symmetric

    • Protocol

    • DSCP (disabled)

    • Incoming Interface Index (disabled)

  • IPv4, PPTP

    • 16 least significant bits of the GRE Key

    • Source and destination address; symmetric

    • Protocol

    • DSCP (disabled)

    • Incoming Interface Index (disabled)

  • IPv4, GTP, UDP traffic to destination port 2152

    Inclusion of the GPRS tunneling protocol (GTP) tunnel endpoint identifier (TEID) can be enabled by configuring the forwarding-options enhanced-hash-key family inet gtp-tunnel-endpoint-identifier option. Note that Juniper does not recommend enabling this option by default. This is because GTP session identification is based on a match on the destination UDP port (2152), and this port may not be used exclusively for GTP transport, so the extraction of the TEID field from the packet may not always be accurate.

    • GTP TEID (disabled)

    • Source and destination port

    • Source and destination address; symmetric

    • Protocol

    • DSCP (disabled)

    • Incoming Interface Index (disabled)
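The caution about port-based GTP identification can be illustrated with a short Python sketch. It assumes the GTPv1-U header layout (one byte of flags, one byte of message type, a two-byte length, then the four-byte TEID) and uses a hypothetical function name, extract_teid. It is not the router's parser.

```python
import struct
from typing import Optional

GTP_U_PORT = 2152


def extract_teid(udp_dst_port: int, udp_payload: bytes) -> Optional[int]:
    """Return the TEID if the packet looks like GTP-U by port alone.

    The only check is the destination port, mirroring the port-based
    identification described above. Any payload sent to port 2152 is
    therefore interpreted as GTP-U, whether or not it really is.
    """
    if udp_dst_port != GTP_U_PORT or len(udp_payload) < 8:
        return None
    _flags, _msg_type, _length, teid = struct.unpack_from("!BBHI", udp_payload)
    return teid
```

For a genuine GTP-U header the function returns the TEID. For unrelated traffic that happens to use port 2152 it returns whatever four bytes sit at the same offset, which is the inaccuracy the recommendation refers to.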

Hash fields used for IPv6

The list shows the fields used in the hash calculation for non-fragmented packets, except where noted. Each field is used in the hash calculation by default unless otherwise noted. Where noted, the IP address and port fields are hashed symmetrically; that is, swapping source and destination does not change the hash result.

  • IPv6, non TCP and UDP packet, or TCP and UDP packet fragmented by the originator

    • Source and destination address; symmetric

    • Next header

    • Flow label (Junos 18.3 and later)

    • Traffic class (disabled)

    • Incoming Interface Index (disabled)

  • IPv6, non fragmented TCP and UDP packet

    • Source and destination port; symmetric

    • Source and destination address; symmetric

    • Next header

    • Flow label (Junos 18.3 and later)

    • Traffic class (disabled)

    • Incoming Interface Index (disabled)

  • IPv6, PPTP

    • 16 least significant bits of the GRE Key

    • Source and destination address; symmetric

    • Next header

    • Flow label (Junos 18.3 and later)

    • Traffic class (disabled)

    • Incoming Interface Index (disabled)

  • IPv6, GTP

    Inclusion of GPRS tunneling protocol (GTP) tunnel endpoint identifier (TEID) can be enabled at the forwarding-options enhanced-hash-key family inet gtp-tunnel-endpoint-identifier hierarchy level. Note that Juniper does not recommend enabling this option by default. This is because GTP session identification is based on the destination UDP port match (2152), and this port may not be exclusively used for GTP transport, so the extraction of TEID field from the packet may not always be accurate.

    • GTP TEID (disabled by default; enable at the forwarding-options enhanced-hash-key family inet gtp-tunnel-endpoint-identifier hierarchy level)

    • Source and destination port

    • Source and destination address; symmetric

    • Next header

    • Flow label (Junos 18.3 and later)

    • Traffic class (disabled)

    • Incoming Interface Index (disabled)
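The flow label listed above occupies the low 20 bits of the first 32-bit word of the IPv6 header, after the 4-bit version and the 8-bit traffic class. The Python sketch below extracts it and shows how a flag could model the effect of the no-flow-label option; the function names and the include_flow_label parameter are illustrative assumptions.

```python
import struct
from typing import Tuple


def parse_ipv6_first_word(header: bytes) -> Tuple[int, int, int]:
    """Return (version, traffic_class, flow_label) from an IPv6 header."""
    (word,) = struct.unpack_from("!I", header, 0)
    version = word >> 28
    traffic_class = (word >> 20) & 0xFF
    flow_label = word & 0xFFFFF  # 20-bit flow label (RFC 6437)
    return version, traffic_class, flow_label


def ipv6_extra_hash_fields(header: bytes,
                           include_flow_label: bool = True) -> tuple:
    """Model of the optional flow label contribution to the hash input.

    include_flow_label=False stands in for configuring no-flow-label.
    """
    _version, _tc, flow_label = parse_ipv6_first_word(header)
    return (flow_label,) if include_flow_label else ()
```

A sender that sets distinct flow labels per flow gives the hash additional entropy even when addresses and next header are identical, for example for tunneled traffic.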

Hash fields used for multiservice

Family multiservice hash configuration applies to packets entering the router as family ccc, vpls, or bridge. The list shows the fields used in the hash calculation for non-fragmented packets. Each field is used in the hash calculation by default unless otherwise noted. Where noted, the IP address and port fields are hashed symmetrically; that is, swapping source and destination does not change the hash result.

  • Ethernet, non-IP or non-MPLS

    If configured, payload information is extracted from untagged packets or packets with up to two VLAN tags.

    • Outer 802.1p (disabled)

    • Source and destination MAC; symmetric

    • Incoming Interface Index (disabled)

  • Ethernet, IPv4

    • Payload (inner IPv4: source and destination ports, IP addresses); symmetric

    • Outer 802.1p (disabled)

    • Source and destination MAC; symmetric

    • Incoming Interface Index (disabled)

  • Ethernet, IPv6

    • Payload (inner IPv6: source and destination ports, IP addresses); symmetric

    • Outer 802.1p (disabled)

    • Source and destination MAC; symmetric

    • Incoming Interface Index (disabled)

  • Ethernet, MPLS

    • Payload (inner MPLS: top labels plus inner IPv4 and IPv6 fields); symmetric. See Hash fields used for MPLS, Junos 18.3 and later, below, for related information.

    • Outer 802.1p (disabled)

    • Source and destination MAC; symmetric

    • Incoming Interface Index (disabled)

  • IPv4 in PPPoE (data packet)

    • Payload (inner IPv4: source and destination ports, IP addresses); symmetric

    • PPP protocol IPv4 version 0x1, type 0x1

    • Outer 802.1p (disabled)

    • Source and destination MAC; symmetric

    • Incoming Interface Index (disabled)

  • IPv6 in PPPoE (data packet)

    • Payload (inner IPv6: source and destination ports, IP addresses); symmetric

    • PPP protocol IPv6 version 0x1, type 0x1

    • Outer 802.1p (disabled)

    • Source and destination MAC; symmetric

    • Incoming Interface Index (disabled)

Hash fields used for MPLS, Junos 18.3 and later

The list shows the fields used in the hash calculation for non-fragmented packets. Each field is used in the hash calculation by default unless otherwise noted. Where noted, the IP address and port fields are hashed symmetrically; that is, swapping source and destination does not change the hash result.

  • MPLS, Encapsulated IPv4 or IPv6

    • Payload (inner IPv4: source and destination ports, IP addresses); symmetric

    • Payload (inner IPv6: source and destination ports, IP addresses, next header); symmetric

    • Label 1..16 (20 bits)

    • Outer Label EXP (disabled)

    • Incoming Interface Index (disabled)

  • MPLS, IPv4 or IPv6 in Ethernet pseudo-wire

    • Payload (IPv4/IPv6 in Ethernet pseudo-wire)

    • Label 2..16 (20 bits)

    • Outer Label EXP (disabled)

    • Label 1 (20 bits)

    • Incoming Interface Index (disabled)

  • MPLS, MPLS in Ethernet pseudo-wire

    • Payload (two top labels of MPLS label stack entry in Ethernet pseudo-wire)

    • Label 2..16 (20 bits)

    • Outer Label EXP (disabled)

    • Label 1 (20 bits)

    • Incoming Interface Index (disabled)

  • MPLS, entropy label

    When an entropy label is detected, the payload field is not processed, and the entropy label indicator is not included in the hash computation.

    • Label 1..16 (20 bits)

    • Outer Label EXP (disabled)

    • Incoming Interface Index (disabled)
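The label handling described above can be sketched as follows. The code parses 32-bit MPLS label stack entries (20-bit label, 3-bit EXP, bottom-of-stack bit, 8-bit TTL), keeps up to 16 labels, leaves out the entropy label indicator (label value 7 per RFC 6790), and skips the payload when an entropy label is present. The function names, and the choice to apply the 16-label limit before removing the indicator, are assumptions for illustration.

```python
import struct
from typing import List, Optional, Tuple

ENTROPY_LABEL_INDICATOR = 7  # reserved label value defined in RFC 6790
MAX_LABELS = 16              # label stack depth hashed in 18.3R1 and later


def encode_entry(label: int, bottom: bool, exp: int = 0, ttl: int = 64) -> bytes:
    """Encode one MPLS label stack entry (helper for examples)."""
    return struct.pack("!I", (label << 12) | (exp << 9) | (int(bottom) << 8) | ttl)


def parse_label_stack(data: bytes) -> Tuple[List[int], bytes]:
    """Return (labels, remaining payload) up to the bottom-of-stack bit."""
    labels: List[int] = []
    offset = 0
    while offset + 4 <= len(data):
        (entry,) = struct.unpack_from("!I", data, offset)
        offset += 4
        labels.append(entry >> 12)
        if entry & 0x100:  # bottom-of-stack bit set
            break
    return labels, data[offset:]


def mpls_hash_inputs(data: bytes) -> Tuple[List[int], Optional[bytes]]:
    """Return the labels to hash and the payload to inspect (or None)."""
    labels, payload = parse_label_stack(data)
    has_entropy = ENTROPY_LABEL_INDICATOR in labels
    hashed = [l for l in labels[:MAX_LABELS] if l != ENTROPY_LABEL_INDICATOR]
    # With an entropy label present, the payload is not processed.
    return hashed, (None if has_entropy else payload)
```

The entropy label itself, which follows the indicator in the stack, remains in the hashed labels, so it supplies the per-flow variation that payload inspection would otherwise provide.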

Hash fields used for MPLS from Junos 14.1 to Junos 18.3

The list shows the fields used in the hash calculation for non-fragmented packets. Each field is used in the hash calculation by default unless otherwise noted. Where noted, the IP address and port fields are hashed symmetrically; that is, swapping source and destination does not change the hash result.

  • MPLS, Encapsulated IPv4 or IPv6

    • Payload (inner IPv4: source and destination ports, IP addresses); symmetric

    • Payload (inner IPv6: source and destination ports, IP addresses, next header); symmetric

    • Label 2..8 (20 bits)

    • Outer Label EXP (disabled)

    • Label 1 (20 bits)

    • Incoming Interface Index (disabled)

  • MPLS, IPv4 or IPv6 in Ethernet pseudo-wire

    • Payload (IPv4/IPv6 in Ethernet pseudo-wire)

    • Label 2..8 (20 bits)

    • Outer Label EXP (disabled)

    • Label 1 (20 bits)

    • Incoming Interface Index (disabled)

  • MPLS, MPLS in Ethernet pseudo-wire

    • Payload (two top labels of MPLS label stack entry in Ethernet pseudo-wire)

    • Label 2..16 (20 bits)

    • Outer Label EXP (disabled)

    • Label 1 (20 bits)

    • Incoming Interface Index (disabled)

  • MPLS, entropy label

    When an entropy label is detected, the payload field is not processed, and the entropy label indicator is not included in the hash computation.

    • Label 2..8 (20 bits)

    • Outer Label EXP (disabled)

    • Label 1 (20 bits)

    • Incoming Interface Index (disabled)

List of Junos Updates for Hash Calculation and Load Balancing for MX series routers with MPCs

Table 1: List of updates for MX series routers

Junos Release   Change

18.3R1          Includes IPv6 flow label, inner GRE header, and inner PPPoE in default hash computation.
                Increases MPLS label stack depth to 16 labels.

17.2R1          Load balancing for L2TP encapsulated IPv4 and IPv6 packets.

16.1R1          Includes EoMPLS payload hash with control word.
                Introduces source-only and destination-only based hashing.

15.1R1          Provides targeted distribution of static interfaces across AE member links.
                Includes source, destination, and MAC of MPLS encapsulated PPPoE payload in the default hash computation.

14.2R3          Increases scaling of LAG and MC-LAG.

14.2R2          Provides aggregate Ethernet bundle with 10G, 40G and 100G links.

14.1R1          Decouples aeX interface creation from ch agg eth dev.
                Increases aggregate Ethernet interface name space.
                Provides adaptive load balancing for ECMP next hops.

13.3R1          Includes enhancements for adaptive, per-packet-random, and periodic-rebalance load balancing.

11.4R1          Provides load sharing across ECMP next hops.