Understanding Multichassis Link Aggregation Groups

 

Layer 2 networks are increasing in scale, driven largely by technologies such as virtualization. Protocols and control mechanisms that limit the disastrous effects of a topology loop in the network are therefore necessary. The Spanning Tree Protocol (STP) is the primary solution to this problem because it provides a loop-free Layer 2 environment. STP has gone through a number of enhancements and extensions, and although it scales to very large network environments, it still provides only one active path from one device to another, regardless of how many physical connections exist in the network. Although STP is a robust and scalable solution to redundancy in a Layer 2 network, the single logical link creates two problems: at least half of the available system bandwidth is off-limits to data traffic, and any failure of the active path triggers a network topology change and a period of reconvergence. The Rapid Spanning Tree Protocol (RSTP) reduces the overhead of the rediscovery process and allows a Layer 2 network to reconverge faster, but the delay is still significant.

Link aggregation (IEEE 802.3ad) solves some of these problems by enabling users to bundle more than one link between switches. All physical connections in the bundle are treated as one logical connection. The limitation of standard link aggregation is that the bundle is point to point: all member links must terminate on the same pair of devices, so either device remains a single point of failure.

Multichassis link aggregation groups (MC-LAGs) enable a client device to form a logical LAG interface between two MC-LAG peers. An MC-LAG provides redundancy and load balancing between the two MC-LAG peers, multihoming support, and a loop-free Layer 2 network without running STP.

On one end of an MC-LAG is a client device, such as a server, that bundles one or more physical links into a link aggregation group (LAG). The client device treats this bundle as a single, ordinary LAG. On the other side of the MC-LAG, there can be a maximum of two MC-LAG peers, each with one or more physical links connected to the single client device.

The MC-LAG peers use the Inter-Chassis Control Protocol (ICCP) to exchange control information and coordinate with each other to ensure that data traffic is forwarded properly.

The Link Aggregation Control Protocol (LACP) is a subcomponent of the IEEE 802.3ad standard. LACP is used to discover multiple links from a client device connected to an MC-LAG peer. LACP must be configured on both MC-LAG peers for an MC-LAG to work correctly.

Note

You must specify a service identifier (service-id) at the global level; otherwise, multichassis link aggregation will not work.
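
For example, the service identifier is configured at the [edit switch-options] hierarchy. The value 10 below is purely illustrative; use a value appropriate to your deployment, configured identically on both MC-LAG peers:

```
set switch-options service-id 10
```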

Figure 1: Basic MC-LAG Topology

The following sections provide information regarding the functional behavior of multichassis link aggregation, configuration guidelines, and best practices.

Benefits of MC-LAGs

Multichassis link aggregation groups (MC-LAGs) provide redundancy and load balancing between a maximum of two switches, multihoming support for client devices such as servers, and a loop-free Layer 2 network without running Spanning Tree Protocol (STP).

ICCP and ICL

The MC-LAG peers use the Inter-Chassis Control Protocol (ICCP) to exchange control information and coordinate with each other to ensure that data traffic is forwarded properly. ICCP replicates control traffic and forwarding states across the MC-LAG peers and communicates the operational state of the MC-LAG members. Because ICCP uses TCP/IP to communicate between the peers, the two peers must be connected to each other. ICCP messages exchange MC-LAG configuration parameters and ensure that both peers use the correct LACP parameters.

The interchassis link (ICL), also known as the interchassis link-protection link (ICL-PL), is used to forward data traffic across the MC-LAG peers. This link provides redundancy when a link failure (for example, an MC-LAG trunk failure) occurs on one of the active links. The ICL can be a single physical Ethernet interface or an aggregated Ethernet interface.

You can configure multiple ICLs between MC-LAG peers. Each ICL can learn up to 512K MAC addresses. You can configure additional ICLs for virtual switch instances.

Note

DHCP snooping, dynamic ARP inspection (DAI), and IP source guard are not supported on the ICL or on MC-LAG interfaces. Consequently, incoming Address Resolution Protocol (ARP) replies on the ICL are discarded. However, ARP entries can be populated on the ICL interface through ICCP exchanges from a remote MC-LAG peer.

Best Practice

We recommend that you use separate ports and choose different Flexible PIC Concentrators (FPCs) for the interchassis link (ICL) and Inter-Chassis Control Protocol (ICCP) interfaces. Although you can use a single link for the ICCP interface, an aggregated Ethernet interface is preferred.

When configuring ICCP and ICL, we recommend that you:

  • Configure an aggregated Ethernet interface to be used for the ICL interface.

  • Configure an aggregated Ethernet interface to be used for the ICCP interface.

  • Configure the IP address for the management port (fxp0).

    When you configure backup liveness detection, this out-of-band channel is established between the peers through the management network.

  • Use the peer loopback address to establish ICCP peering. Doing so avoids any direct link failure between MC-LAG peers. As long as the logical connection between the peers remains up, ICCP stays up.

  • Configure the ICCP liveness-detection interval (the Bidirectional Forwarding Detection (BFD) timer) to be at least 8 seconds if you have configured ICCP connectivity through an IRB interface. A liveness-detection interval of 8 seconds or more allows for graceful Routing Engine switchover (GRES) to work seamlessly. By default, ICCP liveness detection uses multihop BFD, which runs in centralized mode.

    This recommendation does not apply if you configured ICCP connectivity through a dedicated physical interface. In this case, you can configure single-hop BFD.

  • Configure a session establishment hold time for ICCP. This results in a faster ICCP connection establishment. The recommended value is 50 seconds.

  • Configure a hold-down timer on the ICL member links that is greater than the configured BFD timer for the ICCP interface. This prevents the ICL from being advertised as being down before the ICCP link is down. If the ICL goes down before the ICCP link, this causes a flap of the MC-LAG interface on the status-control standby node, which leads to a delay in convergence.

  • Starting with Junos OS Release 15.1 on MX Series routers, configure the backup liveness detection feature to implement faster failover of data traffic during an MC-LAG peer reboot. Configure the backup-liveness-detection statement on the management interface (fxp0) only.
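
Taken together, these recommendations translate into an ICCP configuration along the following lines. This is a sketch, not a tested configuration: the loopback addresses (192.0.2.1 and 192.0.2.2), the redundancy group ID, and the ICL member interface name (xe-0/0/1) are placeholders, and the liveness-detection minimum-interval is expressed in milliseconds:

```
set protocols iccp local-ip-addr 192.0.2.1
set protocols iccp peer 192.0.2.2 session-establishment-hold-time 50
set protocols iccp peer 192.0.2.2 redundancy-group-id-list 1
set protocols iccp peer 192.0.2.2 liveness-detection minimum-interval 8000
set protocols iccp peer 192.0.2.2 backup-liveness-detection backup-peer-ip 10.8.2.33
set interfaces xe-0/0/1 hold-time up 0 down 10000
```

Here 192.0.2.1 and 192.0.2.2 are assumed to be the lo0 addresses of the two peers, 8000 ms satisfies the 8-second BFD guideline for ICCP over an IRB interface, and the 10000 ms hold-down on the ICL member link keeps the ICL from being reported down before the ICCP BFD session expires.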

Failure Handling

Configuring ICCP adjacency over an aggregated interface with child links on multiple FPCs mitigates the possibility of a split-brain state. A split-brain occurs when ICCP adjacency is lost between the MC-LAG peers. To work around this problem, enable backup liveness detection. With backup liveness detection enabled, the MC-LAG peers establish an out-of-band channel through the management network in addition to the ICCP channel.

During a split-brain state, both active and standby peers change LACP system IDs. Because both MC-LAG peers change the LACP system ID, the customer edge (CE) device accepts the LACP system ID of the first link that comes up and brings down other links carrying different LACP system IDs. When the ICCP connection is active, both of the MC-LAG peers use the configured LACP system ID. If the LACP system ID is changed during failures, the server that is connected over the MC-LAG removes these links from the aggregated Ethernet bundle.

When the ICL is operationally down and the ICCP connection is active, the LACP state of the links with status control configured as standby is set to the standby state. When the LACP state of the links is changed to standby, the server that is connected over the MC-LAG makes these links inactive and does not use them for sending data.

Table 1 describes the different ICCP failure scenarios for EX9200 switches. The dash means that the item is not applicable.

Table 1: ICCP Failure Scenarios for EX9200 Switches

| ICCP Connection Status | ICL Status | Backup Liveness Peer Status | Action on Multichassis Aggregated Ethernet Interface with Status Set to Standby | Action on Multichassis Aggregated Ethernet Interface with Status Set to Standby and Prefer Status Control Set to Active |
|---|---|---|---|---|
| Down | Down or Up | Not configured | LACP system ID is changed to default value. | Not applicable. Liveness detection must be configured. |
| Down | Down or Up | Active | LACP system ID is changed to default value. | No change in LACP system ID. |
| Down | Down or Up | Inactive | No change in LACP system ID. | No change in LACP system ID. |
| Up | Down | — | LACP state is set to standby. MUX state moves to waiting state. | LACP state is set to standby. MUX state moves to waiting state. |

Table 2 describes the different ICCP failure scenarios for QFX Series switches. The dash means that the item is not applicable.

Table 2: ICCP Failure Scenarios for QFX Series Switches

| ICCP Connection Status | ICL Status | Backup Liveness Peer Status | Action on Multichassis Aggregated Ethernet Interface with Status Set to Standby |
|---|---|---|---|
| Down | Down or Up | Not configured | LACP system ID is changed to default value. |
| Down | Down or Up | Active | LACP system ID is changed to default value. |
| Down | Down or Up | Inactive | No change in LACP system ID. |
| Up | Down | — | LACP state is set to standby. MUX state moves to waiting state. |

Configure the master-only statement on the IP address of the fxp0 interface for backup liveness detection on both the master and backup Routing Engines. This ensures that the connection is not reset during GRES in the remote peer.

For example, on the master Routing Engine:
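
The original configuration example is not reproduced in this text, so the following is a sketch consistent with the surrounding description; the /24 prefix length and the re0 group name are assumptions. Both management addresses carry master-only so that whichever Routing Engine is master services them:

```
[edit groups re0 interfaces fxp0 unit 0 family inet]
address 10.8.2.31/24 {
    master-only;
}
address 10.8.2.33/24 {
    master-only;
}
```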

For example, on the backup Routing Engine:
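
A matching sketch for the backup Routing Engine, again assuming a /24 prefix length and the re1 group name; the master-only statement ensures the addresses follow Routing Engine mastership during GRES:

```
[edit groups re1 interfaces fxp0 unit 0 family inet]
address 10.8.2.31/24 {
    master-only;
}
address 10.8.2.33/24 {
    master-only;
}
```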

The master Routing Engine services both 10.8.2.31 and 10.8.2.33. Configure 10.8.2.33 in a backup-liveness-detection configuration on the peer node.

For example, on the peer node:
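
On the peer node, 10.8.2.33 then serves as the backup liveness target. The ICCP peer address 192.0.2.2 below is a placeholder:

```
[edit protocols iccp peer 192.0.2.2]
backup-liveness-detection {
    backup-peer-ip 10.8.2.33;
}
```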

Multichassis Link Protection

Multichassis link protection provides link protection between the two MC-LAG peers that host an MC-LAG. If the ICCP connection is up and the ICL comes up, the peer configured as standby brings up the multichassis aggregated Ethernet interfaces shared with the peer. Multichassis protection must be configured on each MC-LAG peer that is hosting an MC-LAG.
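
On MX Series and EX9200 platforms, multichassis protection is typically tied to the ICCP peer and the ICL interface with a statement along these lines; the peer address and the aggregated Ethernet interface name are placeholders:

```
set multi-chassis multi-chassis-protection 192.0.2.2 interface ae1
```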

MC-AE Statement Options

The following options are available:

  • MC-AE-ID

    Specifies which MC-LAG group the aggregated Ethernet interface belongs to.

  • Redundancy groups

    Uses ICCP to associate multiple chassis that perform similar redundancy functions and to establish a communication channel so that applications on peering chassis can send messages to each other.

    Best Practice

    We recommend that you configure only one redundancy group between MC-LAG nodes. The redundancy group represents the domain of high availability between the MC-LAG nodes. One redundancy group is sufficient between a pair of MC-LAG nodes. If you are using logical systems, then configure one redundancy group between MC-LAG nodes in each logical system.

  • Init Delay Time

    Specifies the number of seconds by which to delay bringing the MC-LAG interface back to the up state when the MC-LAG peer is rebooted. By delaying the startup of the interface until after protocol convergence, you can prevent packet loss during the recovery of failed links and devices.

  • Chassis ID

    Specifies that LACP uses the chassis ID to calculate the port number of the MC-LAG physical member links. Each MC-LAG peer should have a unique chassis ID.

  • Mode

    Indicates whether an MC-LAG is in active-standby mode or active-active mode. Chassis that are in the same group must be in the same mode.

    In active-active mode, all member links are active on the MC-LAG. In this mode, media access control (MAC) addresses learned on one MC-LAG peer are propagated to the other MC-LAG peer. Active-active mode is a simple and deterministic design and is easier to troubleshoot than active-standby mode.

    Note

    Active-active mode is not supported on Dense Port Concentrator (DPC) line cards. Instead, use active-standby mode.

    In active-active MC-LAG topologies, network interfaces are categorized into three interface types, as follows:

    • S-Link—Single-homed link (S-Link) terminating on an MC-LAG peer device

    • MC-Link—MC-LAG link

    • ICL—Inter-chassis link

    Depending on the incoming and outgoing interface types, some constraints are added to the Layer 2 forwarding rules for MC-LAG configurations. The following data traffic forwarding rules apply.

    Note

    If only one MC-LAG member link is in the up state, it is considered an S-Link.

    • When an MC-LAG network receives a packet from a local MC-Link or S-Link, the packet is forwarded to other local interfaces, including S-Links and MC-Links based on the normal Layer 2 forwarding rules and on the configuration of the mesh-group and no-local-switching statements. If MC-Links and S-Links are in the same mesh group and their no-local-switching statements are enabled, the received packets are only forwarded upstream and not sent to MC-Links and S-Links.

    • The following circumstances determine whether or not an ICL receives a packet from a local MC-Link or S-Link:

      • If the peer MC-LAG network device has S-Links or MC-LAGs that do not reside on the local MC-LAG network device

      • Whether or not interfaces on two peering MC-LAG network devices are allowed to talk to each other

    • When an MC-LAG network receives a packet from the ICL, the packet is forwarded to all local S-Links and active MC-LAGs that do not exist in the MC-LAG network from which the packet was sent.

    In active-standby mode, only one of the MC-LAG peers is active at any given time. The other MC-LAG peer is in backup (standby) mode. The active MC-LAG peer uses Link Aggregation Control Protocol (LACP) to advertise to client devices that its child link is available for forwarding data traffic. Active-standby mode should be used if you are interested in redundancy only. If you require both redundancy and load sharing across member links, use active-active mode.

    Note

    Active-standby mode is not supported on EX4300 and QFX Series switches.

  • Status Control

    Specifies whether a node becomes active or goes into standby mode when an ICL failure occurs. If one node is active, the other node must be standby.

    Best Practice

    We recommend that you configure the prefer-status-control-active statement with the mc-ae status-control active configuration. Do not configure the prefer-status-control-active statement with the mc-ae status-control standby configuration.

    Note

    On EX9200 and QFX Series switches, if you configure both nodes as prefer-status-control-active, you must also configure ICCP peering using the peer’s loopback address to make sure that the ICCP session does not go down because of physical link failures. Additionally, you must configure backup liveness detection on both of the MC-LAG nodes.

    Note

    On EX9200 switches, the prefer-status-control-active statement was added in Junos OS Release 13.2R1.

  • Events ICCP-Peer-Down Force-ICL-Down

    Forces the ICL to be down if the peer of this node goes down.

  • Events ICCP-Peer-Down Prefer-Status-Control-Active

    Allows the LACP system ID to be retained during a reboot, which provides better convergence after a failover.
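
As an illustration of how these options fit together, a multichassis aggregated Ethernet interface on one peer might be configured as follows. This is a sketch only: the interface name ae0, the IDs, and the 240-second init delay are placeholder values, and the second peer would use the same mc-ae-id with a different chassis-id:

```
set interfaces ae0 aggregated-ether-options mc-ae mc-ae-id 1
set interfaces ae0 aggregated-ether-options mc-ae redundancy-group 1
set interfaces ae0 aggregated-ether-options mc-ae chassis-id 0
set interfaces ae0 aggregated-ether-options mc-ae mode active-active
set interfaces ae0 aggregated-ether-options mc-ae status-control active
set interfaces ae0 aggregated-ether-options mc-ae init-delay-time 240
set interfaces ae0 aggregated-ether-options mc-ae events iccp-peer-down force-icl-down
set interfaces ae0 aggregated-ether-options mc-ae events iccp-peer-down prefer-status-control-active
```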

Multichassis Link Aggregation Group (MC-LAG) Configuration Synchronization

MC-LAG configuration synchronization enables you to easily propagate, synchronize, and commit configurations from one MC-LAG peer to another. You can log into any one of the MC-LAG peers to manage both MC-LAG peers, thus having a single point of management. You can also use configuration groups to simplify the configuration process. You can create one configuration group for the local MC-LAG peer, one for the remote MC-LAG peer, and one for the global configuration, which is essentially a configuration that is common to both MC-LAG peers.

In addition, you can create conditional groups to specify when a configuration is synchronized with another MC-LAG peer. You can enable the peers-synchronize statement at the [edit system commit] hierarchy to synchronize the configurations and commits across the MC-LAG peers by default. NETCONF over SSH provides a secure connection between the MC-LAG peers, and Secure Copy Protocol (SCP) copies the configurations securely between them.
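
For example, to synchronize configurations and commits between the peers by default:

```
set system commit peers-synchronize
```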

Multichassis Link Aggregation Group (MC-LAG) Configuration Consistency Check

Configuration consistency check uses the Inter-Chassis Control Protocol (ICCP) to exchange MC-LAG configuration parameters (chassis ID, service ID, and so on) and checks for any configuration inconsistencies across MC-LAG peers. An example of an inconsistency is configuring identical chassis IDs on both peers instead of configuring unique chassis IDs on both peers. When there is an inconsistency, you are notified and can take action to resolve it. Only committed MC-LAG parameters are checked for consistency.

Enhanced Convergence

Starting with Junos OS Release 14.2R3 on MX Series routers, enhanced convergence improves Layer 2 and Layer 3 convergence time when a multichassis aggregated Ethernet (MC-AE) link goes down or comes up in a bridge domain or VLAN. Starting with Junos OS Release 18.1R1, the number of vmembers has increased to 128K, and the number of ARP and ND entries has increased to 96K when the enhanced-convergence statement is enabled.

When enhanced convergence is enabled, the MAC address and ARP or ND entries learned over the MC-AE interfaces are programmed in the forwarding table with the MC-AE link as the primary next hop and the ICL as the backup next hop. With this enhancement, during an MC-AE link failure or restoration, only the next-hop information in the forwarding table is updated; there is no flushing and relearning of the MAC address or ARP or ND entries. This improves traffic convergence because only next-hop repair occurs in the forwarding plane, with traffic fast-rerouted from the MC-AE link to the ICL.

If you have configured an IRB interface over an MC-AE interface that has enhanced convergences enabled, then you must configure enhanced convergence on the IRB interface as well. Enhanced convergence must be enabled for both Layer 2 and Layer 3 interfaces.
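
Treating the exact statement placement as an assumption (it reflects the usual mc-ae and IRB hierarchies, with ae0 and unit 100 as placeholder names), enabling the feature on both layers might look like:

```
set interfaces ae0 aggregated-ether-options mc-ae enhanced-convergence
set interfaces irb unit 100 enhanced-convergence
```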

IPv6 Neighbor Discovery Protocol

Neighbor Discovery Protocol (NDP) is an IPv6 protocol that enables nodes on the same link to advertise their existence to their neighbors and to learn about the existence of their neighbors. NDP is built on top of Internet Control Message Protocol version 6 (ICMPv6). It replaces the following IPv4 protocols: Router Discovery (RDISC), Address Resolution Protocol (ARP), and ICMPv4 redirect.

You can use NDP in a multichassis link aggregation group (MC-LAG) active-active configuration on switches.

NDP on MC-LAGs uses the following message types:

  • Neighbor solicitation (NS)—Messages used for address resolution and to test reachability of neighbors.

    A host can verify that its address is unique by sending a neighbor solicitation message destined to the new address. If the host receives a neighbor advertisement in reply, the address is a duplicate.

  • Neighbor advertisement (NA)—Messages used for address resolution and to test reachability of neighbors. Neighbor advertisements are sent in response to neighbor solicitation messages.

Load Balancing

Load balancing of network traffic between MC-LAG peers uses 100 percent local bias: each peer forwards the traffic it receives out of its own local member links rather than across the ICL. Load balancing across multiple LAG members within a local MC-LAG node is achieved through the standard LAG hashing algorithm.

Layer 2 Unicast Features Supported

The following Layer 2 unicast features, learning and aging, are supported:

  • Learned MAC addresses are propagated across MC-LAG peers for all of the VLANs that are spanned across the peers.

  • Aging of MAC addresses occurs when the MAC address is not seen on both of the peers.

  • MAC addresses learned on single-homed links are propagated across all of the VLANs that have MC-LAG links as members.

Note

MAC learning is disabled on the ICL. Consequently, source MAC addresses cannot be learned locally on the ICL. However, MAC addresses from a remote MC-LAG node can be installed on the ICL interface. For example, the MAC address for a single-homed client on a remote MC-LAG node can be installed on the ICL interface of the local MC-LAG node.

VLANs

Use the following best practice for configuring VLANs:

Best Practice

We recommend that you limit the scope of VLANs and configure them only where they are necessary. Configure the MC-AE trunk interfaces with only the VLANs that are necessary for the access layer. This limits the broadcast domain and reduces the STP load on aggregation and access switches.

Layer 2 Multicast Features Supported

The following Layer 2 multicast features, unknown unicast and IGMP snooping, are supported:

  • Flooding happens on all links across peers if both peers have virtual LAN membership. Only one of the peers forwards traffic on a given MC-LAG link.

  • Known and unknown multicast packets are forwarded across the peers by adding the ICL port as a multicast router port.

  • IGMP membership learned on MC-LAG links is propagated across peers.

    You must configure the multichassis-lag-replicate-state statement for Internet Group Management Protocol (IGMP) snooping to work properly in an MC-LAG environment.

  • During an MC-LAG peer reboot, known multicast traffic is flooded until the IGMP snooping state is synchronized with the peer.
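
The replicate-state statement lives under the multicast snooping options hierarchy on the platforms this document covers; hierarchy placement may vary by release, so treat this as a sketch:

```
set multicast-snooping-options multichassis-lag-replicate-state
```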

IGMP Snooping on an Active-Active MC-LAG

Internet Group Management Protocol (IGMP) snooping controls multicast traffic in a switched network. When IGMP snooping is not enabled, the Layer 2 device broadcasts multicast traffic out of all of its ports, even if the hosts on the network do not want the multicast traffic. With IGMP snooping enabled, a Layer 2 device monitors the IGMP join and leave messages sent from each connected host to a multicast router. This enables the Layer 2 device to keep track of the multicast groups and associated member ports. The Layer 2 device uses this information to make intelligent decisions and to forward multicast traffic to only the intended destination hosts. IGMP uses Protocol Independent Multicast (PIM) to route the multicast traffic. PIM uses distribution trees to determine which traffic is forwarded.

Note

You must enable Protocol Independent Multicast (PIM) on the IRB interface to avoid multicast duplication.

In an active-active MC-LAG configuration, IGMP snooping replicates the Layer 2 multicast routes so that each MC-LAG peer has the same routes. If a device is connected to an MC-LAG peer by way of a single-homed interface, IGMP snooping replicates the join message to its IGMP snooping peer. If a multicast source is connected to an MC-LAG by way of a Layer 3 device, the Layer 3 device passes this information to the IRB or the routed VLAN interface (RVI) that is configured on the MC-LAG. The first hop designated router is responsible for sending the register and register-stop messages for the multicast group. The last hop designated router is responsible for sending PIM join and leave messages toward the rendezvous point and source for the multicast group. The routing device with the smallest preference metric forwards traffic on transit LANs.

Note

You must configure the ICL interface as a router-facing interface (by configuring the multicast-router-interface statement) for multicast forwarding to work in an MC-LAG environment. For the scenario in which traffic arrives by way of a Layer 3 interface, PIM and IGMP must be enabled on the IRB or RVI interface configured on the MC-LAG peers. You must enable PIM on the IRB or RVI interface to avoid multicast duplication.
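
For example, assuming a VLAN named v100, an ICL logical interface ae1.0, and a Layer 3 interface irb.100 (all placeholder names), the requirements in this note translate to roughly:

```
set protocols igmp-snooping vlan v100 interface ae1.0 multicast-router-interface
set protocols igmp interface irb.100
set protocols pim interface irb.100
```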

VRRP Active-Standby Support

The Juniper Networks Junos operating system (Junos OS) supports active-active MC-LAGs by using VRRP in active-standby mode. VRRP in active-standby mode enables Layer 3 routing over the multichassis aggregated Ethernet interfaces on the MC-LAG peers. In this mode, the MC-LAG peers act as virtual routers. The peers share the virtual IP address that corresponds to the default route configured on the host or server connected to the MC-LAG. This virtual IP address (of the IRB or RVI interface) maps to either of the VRRP MAC addresses or to the logical interfaces of the MC-LAG peers. The host or server uses the VRRP MAC address to send any Layer 3 upstream packets. At any time, one of the VRRP devices is the master (active), and the other is a backup (standby). Usually, a VRRP backup node does not forward incoming packets. However, when VRRP over IRB or RVI is configured in an MC-LAG active-active environment, both the VRRP master and the VRRP backup forward Layer 3 traffic arriving on the multichassis aggregated Ethernet interface. If the master fails, all the traffic shifts to the multichassis aggregated Ethernet interface on the backup.

Note

You must configure VRRP on both MC-LAG peers for both the active and standby members to accept and route packets. Additionally, you must configure the VRRP backup device to send and receive ARP requests.
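
A sketch of such a VRRP group on one peer follows; the addresses, unit number, and priority are placeholders. The accept-data statement is what allows the backup to accept and route packets addressed to the virtual IP address:

```
set interfaces irb unit 100 family inet address 10.1.1.2/24 vrrp-group 1 virtual-address 10.1.1.1
set interfaces irb unit 100 family inet address 10.1.1.2/24 vrrp-group 1 priority 200
set interfaces irb unit 100 family inet address 10.1.1.2/24 vrrp-group 1 accept-data
```

The other peer would use its own physical IRB address with the same virtual address and a different priority.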

Routing protocols run on the primary IP address of the IRB or RVI interface, and both of the MC-LAG peers run routing protocols independently. The routing protocols use the primary IP address of the IRB or RVI interface and the IRB or RVI MAC address to communicate with the MC-LAG peers. The IRB or RVI MAC address of each MC-LAG peer is replicated on the other MC-LAG peer and is installed as a MAC address that has been learned on the ICL.

Note

If you are using the VRRP over IRB or RVI method to enable Layer 3 functionality, you must configure static ARP entries for the IRB or RVI interface of the remote MC-LAG peer to allow routing protocols to run over the IRB or RVI interfaces.

MAC Address Management

If an MC-LAG is configured to be active-active, upstream and downstream traffic could go through different MC-LAG peer devices. Because the MAC address is learned only on one of the MC-LAG peers, traffic in the reverse direction could be going through the other MC-LAG peer and flooding the network unnecessarily. Also, a single-homed client's MAC address is learned only on the MC-LAG peer that it is attached to. If a client attached to the peer MC-LAG network device needs to communicate with that single-homed client, then traffic would be flooded on the peer MC-LAG network device. To avoid unnecessary flooding, whenever a MAC address is learned on one of the MC-LAG peers, the address is replicated to the other MC-LAG peer. The following conditions are applied when MAC address replication is performed:

Note

Gratuitous ARP requests are not sent when the MAC address on the IRB or RVI interface changes.

  • MAC addresses learned on an MC-LAG of one MC-LAG peer must be replicated as learned on the same MC-LAG of the other MC-LAG peer.

  • MAC addresses learned on single-homed customer edge (CE) clients of one MC-LAG peer must be replicated as learned on the ICL interface of the other MC-LAG peer.

  • MAC address learning on an ICL is disabled in the data path. Instead, the software installs the MAC addresses that are replicated through ICCP.

If you have a VLAN without an IRB or RVI configured, MAC address replication will synchronize the MAC addresses.

MAC Aging

MAC aging support in Junos OS extends aggregated Ethernet logic for a specified MC-LAG. A MAC address in software is not deleted until all Packet Forwarding Engines have deleted the MAC address.

Address Resolution Protocol Active-Active MC-LAG Support Methodology

The Address Resolution Protocol (ARP) maps IP addresses to MAC addresses. Junos OS uses ARP response packet snooping to support active-active MC-LAGs, providing easy synchronization without the need to maintain any specific state. Without synchronization, if one MC-LAG peer sends an ARP request, and the other MC-LAG peer receives the response, ARP resolution is not successful. With synchronization, the MC-LAG peers synchronize the ARP resolutions by sniffing the packet at the MC-LAG peer receiving the ARP response and replicating this to the other MC-LAG peer. This ensures that the entries in ARP tables on the MC-LAG peers are consistent.

When one of the MC-LAG peers restarts, the ARP destinations on its MC-LAG peer are synchronized. Because the ARP destinations are already resolved, its MC-LAG peer can forward Layer 3 packets out of the multichassis aggregated Ethernet interface.

Note
  • In some cases, ARP messages received by one MC-LAG peer are replicated to the other MC-LAG peer through ICCP. This optimization feature is applicable only for ARP replies, not ARP requests, received by the MC-LAG peers.

  • Dynamic ARP resolution over the ICL interface is not supported. Consequently, incoming ARP replies on the ICL are discarded. However, ARP entries can be populated on the ICL interface through ICCP exchanges from a remote MC-LAG peer.

  • During graceful Routing Engine switchover (GRES), ARP entries that were learned remotely are purged and then learned again.

DHCP Relay with Option 82

Note

DHCP relay is not supported with MAC address synchronization. If DHCP relay is required, configure VRRP over IRB or RVI for Layer 3 functionality.

Best Practice

In an MC-LAG active-active environment, we recommend that you use the BOOTP relay agent by configuring the DHCP relay agent at the forwarding-options helpers bootp hierarchy. This avoids the stale session information issues that might arise for clients when the router is using the extended DHCP relay agent (jdhcp) process.

If your environment only supports IPv6 or you must use the extended DHCP relay agent (jdhcp) process for other reasons, then as a workaround, you can configure forward-only support by using the forwarding-options dhcp-relay forward-only command for IPv4 and the forwarding-options dhcpv6 forward-only command for IPv6. You must also verify that your DHCP server in the network supports option 82.
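
For example, the forward-only workaround amounts to:

```
set forwarding-options dhcp-relay forward-only
set forwarding-options dhcpv6 forward-only
```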

DHCP relay with option 82 provides information about the network location of DHCP clients. The DHCP server uses this information to implement IP addresses or other parameters for the client. With DHCP relay enabled, DHCP request packets might take the path to the DHCP server through either of the MC-LAG peers. Because the MC-LAG peers have different hostnames, chassis MAC addresses, and interface names, you need to observe these requirements when you configure DHCP relay with option 82:

  • Use the interface description instead of the interface name.

  • Do not use the hostname as part of the circuit ID or remote ID string.

  • Do not use the chassis MAC address as part of the remote ID string.

  • Do not enable the vendor ID.

  • If the ICL interface receives DHCP request packets, the packets are dropped to avoid duplicate packets in the network.

    A counter called "Due to received on ICL interface" has been added to the show helper statistics command, which tracks the packets that the ICL interface drops.

    An example of the CLI output follows:

    user@switch> show helper statistics

    The output shows that six packets received on the ICL interface have been dropped.

MC-LAG Packet Forwarding

To prevent the server from receiving multiple copies from both of the MC-LAG peers, a block mask is used to prevent forwarding of traffic received on the ICL toward the multichassis aggregated Ethernet interface. Preventing forwarding of traffic received on the ICL interface toward the multichassis aggregated Ethernet interface ensures that traffic received on MC-LAG links is not forwarded back to the same link on the other peer. The forwarding block mask for a given MC-LAG link is cleared if all of the local members of the MC-LAG link go down on the peer. To achieve faster convergence, if all local members of the MC-LAG link are down, outbound traffic on the MC-LAG is redirected to the ICL interface on the data plane.

Layer 3 Unicast Feature Support

Layer 3 unicast feature support includes the following:

  • Address Resolution Protocol (ARP) synchronization enables ARP resolution on both of the MC-LAG peers.

  • DHCP relay with option 82 enables option 82 on the MC-LAG peers. Option 82 provides information about the network location of DHCP clients. The DHCP server uses this information to assign IP addresses or other parameters to the client.

Virtual Router Redundancy Protocol (VRRP) over IRB and MAC Address Synchronization

There are two methods for enabling Layer 3 routing functionality across a multichassis link aggregation group (MC-LAG). You can choose either to configure the Virtual Router Redundancy Protocol (VRRP) over the integrated routing and bridging (IRB) interface or to synchronize the MAC addresses for the Layer 3 interfaces of the switches participating in the MC-LAG.

Note

On EX9200 and QFX Series switches, routing protocols are not supported on the downstream clients.

Best Practice

On EX9200 and QFX Series switches, we recommend that you use MAC address synchronization for the downstream clients. For the upstream routers, we recommend that you use the VRRP over IRB or RVI method.

Note

On EX9200 and QFX Series switches, you cannot configure both VRRP over IRB and MAC address synchronization, because MAC address processing might not work correctly.

VRRP over IRB or RVI requires that you configure different IP addresses on IRB or RVI interfaces, and run VRRP over the IRB or RVI interfaces. The virtual IP address is the gateway IP address for the MC-LAG clients.
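For example, a sketch of this configuration, using illustrative addresses consistent with the static ARP example in this section, where 10.181.18.1 is the shared virtual gateway address and the VRRP group number is arbitrary:

On the first MC-LAG peer:

    user@switch# set interfaces irb unit 18 family inet address 10.181.18.2/8 vrrp-group 1 virtual-address 10.181.18.1

On the second MC-LAG peer:

    user@switch# set interfaces irb unit 18 family inet address 10.181.18.3/8 vrrp-group 1 virtual-address 10.181.18.1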

If you are using the VRRP over IRB method to enable Layer 3 functionality, you must configure static ARP entries for the IRB interface of the remote MC-LAG peer to allow routing protocols to run over the IRB interfaces. This step is required so you can issue the ping command to reach both the physical IP addresses and virtual IP addresses of the MC-LAG peers.

For example, you can issue the set interfaces irb unit 18 family inet address 10.181.18.3/8 arp 10.181.18.2 mac 00:00:5E:00:2f:f0 command.

When you issue the show interfaces irb command after you have configured VRRP over IRB, you will see that the static ARP entries are pointing to the IRB MAC addresses of the remote MC-LAG peer:

user@switch> show interfaces irb
Note

Use MAC synchronization if you require more than 1,000 VRRP instances.

MAC address synchronization enables MC-LAG peers to forward Layer 3 packets arriving on multichassis aggregated Ethernet interfaces with either their own IRB or RVI MAC address or their peer’s IRB or RVI MAC address. Each MC-LAG peer installs its own IRB or RVI MAC address as well as the peer’s IRB or RVI MAC address in the hardware. Each MC-LAG peer treats the packet as if it were its own packet. If MAC address synchronization is not enabled, the IRB or RVI MAC address is installed on the MC-LAG peer as if it were learned on the ICL.

Note

Here are some caveats with configuring MAC address synchronization:

  • Use MAC address synchronization if you are not planning to run routing protocols on the IRB interfaces.

    MAC address synchronization does not support routing protocols on IRB interfaces, and routing protocols are not supported with downstream MC-LAG clients. If you need routing capability, configure both VRRP and routing protocols on each MC-LAG peer. Routing protocols are supported on upstream routers.

  • DHCP relay is not supported with MAC address synchronization.

    If you need to configure DHCP relay, configure VRRP over IRB.

  • Gratuitous ARP requests are not sent when the MAC address on the IRB interface changes.

MAC address synchronization requires you to configure the same IP address on the IRB interface in the VLAN on both MC-LAG peers. To enable the MAC address synchronization feature using the standard CLI, issue the set vlan vlan-name mcae-mac-synchronize command on each MC-LAG peer. If you are using the Enhanced Layer 2 CLI, issue the set bridge-domains name mcae-mac-synchronize command on each MC-LAG peer. Configure the same IP address on both MC-LAG peers. This IP address is used as the default gateway for the MC-LAG servers or hosts.
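For example, assuming a VLAN named v18 with an IRB interface irb.18 (the names and address are illustrative), issue the same configuration on both MC-LAG peers using the standard CLI:

    user@switch# set vlan v18 mcae-mac-synchronize
    user@switch# set interfaces irb unit 18 family inet address 10.181.18.1/8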

Protocol Independent Multicast

Protocol Independent Multicast (PIM) and Internet Group Management Protocol (IGMP) provide support for Layer 3 multicast. In addition to the standard mode of PIM operation, there is a special mode called PIM dual designated router. PIM dual designated router minimizes multicast traffic loss in case of failures.

If you are using Layer 3 multicast, configure the active MC-LAG peer with a higher IP address or a higher designated router priority.

Note

PIM dual designated router is not supported on EX9200 and QFX10000 switches.

PIM operation is discussed in the following sections:

PIM Operation with Normal Mode Designated Router Election

In normal mode with designated router election, the IRB or RVI interfaces on both of the MC-LAG peers are configured with PIM enabled. In this mode, one of the MC-LAG peers becomes the designated router through the PIM designated router election mechanism. The elected designated router maintains the rendezvous-point tree (RPT) and shortest-path tree (SPT) so it can receive data from the source device. The elected designated router participates in periodic PIM join and prune activities toward the rendezvous point or the source.

The trigger for initiating these join and prune activities is the IGMP membership reports that are received from interested receivers. IGMP reports received over multichassis aggregated Ethernet interfaces (potentially hashing on either of the MC-LAG peers) and single-homed links are synchronized to the MC-LAG peer through ICCP.

Both MC-LAG peers receive traffic on their incoming interface (IIF). The non-designated router receives traffic by way of the ICL interface, which acts as a multicast router (mrouter) interface.

If the designated router fails, the non-designated router has to build the entire forwarding tree (RPT and SPT), which can cause multicast traffic loss.

PIM Operation with Dual Designated Router Mode

In dual designated router mode, both of the MC-LAG peers act as designated routers (active and standby) and send periodic join and prune messages upstream toward the rendezvous point, or source, and eventually join the RPT or SPT.

The primary MC-LAG peer forwards the multicast traffic to the receiver devices even if the standby MC-LAG peer has a smaller preference metric.

The standby MC-LAG peer also joins the forwarding tree and receives the multicast data. The standby MC-LAG peer drops the data because it has an empty outgoing interface list (OIL). When the standby MC-LAG peer detects the primary MC-LAG peer failure, it adds the receiver VLAN to the OIL, and starts to forward the multicast traffic.

To enable a multicast dual designated router, issue the set protocols pim interface interface-name dual-dr command on the VLAN interfaces of each MC-LAG peer.
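For example, assuming the VLAN interface is irb.18 (an illustrative name), issue the following on each MC-LAG peer:

    user@switch# set protocols pim interface irb.18 dual-dr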

Failure Handling

To ensure faster convergence during failures, configure the primary MC-LAG peer with a higher IP address or a higher designated router priority. Doing this ensures that the primary MC-LAG peer retains the designated router role if PIM peering goes down.
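For example, the designated router priority could be raised on the primary peer. The priority statement and value shown here are standard Junos PIM configuration rather than commands taken from this document, and the interface name is illustrative:

    user@switch# set protocols pim interface irb.18 priority 200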

To ensure that traffic converges if an MC-AE interface goes down, the ICL-PL interface is always added as an mrouter port. Layer 3 traffic is flooded through the default entry or the snooping entry over the ICL-PL interface, and the traffic is forwarded on the MC-AE interface on the MC-LAG peer. If the ICL-PL interface goes down, the PIM neighborship goes down; in this case, both MC-LAG peers become the designated router, the backup MC-LAG peer brings down its links, and the routing peering is lost. If the ICCP connection goes down, the backup MC-LAG peer changes the LACP system ID and brings down the MC-AE interfaces, but the state of the PIM neighbors remains operational.

Miswiring Detection Guidelines

You can use STP to detect miswiring loops within the peer or across MC-LAG peers. An example of miswiring is when a port of a network element is accidentally connected to another port of the same network element. Using STP to detect loops on MC-LAG interfaces, however, is not supported.

Note

Do not use Multiple Spanning Tree Protocol (MSTP) or VLAN Spanning Tree Protocol (VSTP). There could be a loop if MSTP or VSTP is enabled in an MC-AE topology without enabling MSTP or VSTP on the MC-AE logical interfaces. Also, there could be a loop if an alternate path exists from access nodes to MC-AE nodes.

Best Practice

To detect miswirings, we recommend that you do the following:

  • Configure STP globally so that STP can detect local miswiring within and across MC-LAG peers.

  • Disable STP on ICL links, however, because STP might block ICL interfaces and disable protection.

  • Disable STP on interfaces that are connected to aggregation switches.

  • Configure MC-LAG interfaces as edge ports.

  • Enable bridge protocol data unit (BPDU) block on edge.

  • Do not enable BPDU block on interfaces connected to aggregation switches.
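As a sketch of these recommendations, assume ae0 is a multichassis aggregated Ethernet interface, ae1 is the ICL, and ae2 connects to an aggregation switch (all interface names are illustrative):

    user@switch# set protocols rstp interface ae0 edge
    user@switch# set protocols rstp bpdu-block-on-edge
    user@switch# set protocols rstp interface ae1 disable
    user@switch# set protocols rstp interface ae2 disable

With this configuration, STP runs globally, the MC-LAG interface is an edge port protected by BPDU block on edge, and STP is disabled on the ICL and on the link toward the aggregation switch.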

For more information about BPDU block, see Understanding BPDU Protection for STP, RSTP, and MSTP.

Reverse Layer 2 Gateway Protocol (RL2GP) for Loop Prevention

With RL2GP, you can configure two edge MC-LAG nodes with the same STP virtual root ID. The virtual root ID must be superior to all bridges in the downstream network, and the downstream bridges must be capable of running STP. STP could block one of the interfaces in the downstream network and break any loop due to miswiring at the core or access layer, or due to a problem in the server software.

RL2GP must be configured on both MC-LAG nodes to prevent loops. Because both MC-LAG nodes would have the same virtual root ID, the MC-LAG interface would always be forwarding traffic. The downstream bridge would receive BPDUs from both nodes and thus receive twice the number of BPDUs on its aggregated Ethernet (AE) interface. If you do not want to receive twice the number of BPDUs, you can double the STP hello time on the virtual ID root. If both of the nodes use the same AE interface name, then the STP port number would be identical and would reduce the STP load on the downstream bridge.

MC-LAG Upgrade

Upgrade the MC-LAG peers according to the following guidelines.

Note

Upgrade both MC-LAG nodes to the same software version in order to achieve no loss during stable and failover conditions. The protocol states, data forwarding, and redundancy are guaranteed only after both nodes are upgraded to the same software version successfully.

Note

After a reboot, the multichassis aggregated Ethernet interfaces come up immediately and might start receiving packets from the server. If routing protocols are enabled, and the routing adjacencies have not been formed, packets might be dropped.

To prevent this scenario, issue the set interfaces interface-name aggregated-ether-options mc-ae init-delay-time time command to set a delay within which the routing adjacencies can form. Set init-delay-time to 240 seconds or more.
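For example, with an illustrative interface name:

    user@switch# set interfaces ae0 aggregated-ether-options mc-ae init-delay-time 240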

  1. Make sure that both of the MC-LAG peers (node1 and node2) are in the active-active state by using the following command on any one of the MC-LAG peers:
    user@switch> show interfaces mc-ae id 1
  2. Upgrade node1 of the MC-LAG.

    When node1 is upgraded, it is rebooted, and all traffic is sent across the available LAG interfaces of node2, which is still up. The amount of traffic lost depends on how quickly the neighbor devices detect the link loss and rehash the flows of the LAG.

  3. Verify that node1 is running the software you just installed by issuing the show version command.
  4. Make sure that both nodes of the MC-LAG (node1 and node2) are in the active-active state after the reboot of node1.
  5. Upgrade node2 of the MC-LAG.

    Repeat Step 1 through Step 3 to upgrade node2.

IGMP Report Synchronization

IGMP reports received over MC-AE interfaces and single-homed links are synchronized to the MC-LAG peers. The MCSNOOPD client application on the MC-LAG peer receives the synchronization packet over ICCP and then sends a copy of the packet to the kernel by using the routing socket PKT_INJECT mechanism. When the kernel receives the packet, it sends the packet to the routing protocol process (rpd). This enables Layer 3 multicast protocols, such as PIM and IGMP, on routed VLAN interfaces (RVIs) configured on MC-LAG VLANs.

Release History Table

  • 18.1R1: Starting with Junos OS Release 18.1R1, the number of vmembers has increased to 128k, and the number of ARP and ND entries has increased to 96k when the enhanced-convergence statement is enabled.

  • 15.1: Starting with Junos OS Release 15.1 on MX Series routers, you can configure the backup liveness detection feature to implement faster failover of data traffic during an MC-LAG peer reboot.

  • 14.2R3: Starting with Junos OS Release 14.2R3 on MX Series routers, enhanced convergence improves Layer 2 and Layer 3 convergence time when a multichassis aggregated Ethernet (MC-AE) link goes down or comes up in a bridge domain or VLAN.