How to Configure a Collapsed Spine with EVPN Multihoming

Requirements

This example assumes that you have two data centers (DC1 and DC2) with separate networks. This example uses the following devices and software:

  • DC1:

    • Two spine switches: QFX5120-48Y running Junos OS Release 18.4R2-S1.4

    • Two ToR switches: EX4300-48T running Junos OS Release 18.1R3-S6.1

    • Two security devices: SRX345 devices running Junos OS Release 18.2R3.4 (Optional add-on configuration)

    • Four servers

  • DC2:

    • Two spine switches: QFX5120-48Y running Junos OS Release 18.4R2-S1.4

    • Two ToR switches: EX4300-48T running Junos OS Release 18.1R3-S6.1

    • Two servers

Each pair of ToR switches should already be configured as a Virtual Chassis. See Understanding EX Series Virtual Chassis for more information about forming a Virtual Chassis with EX4300 switches. In this example, the multihomed aggregated Ethernet links between each ToR Virtual Chassis and the two spine devices terminate on only one member of the Virtual Chassis. Where possible, for better resiliency, connect these links using interfaces on different Virtual Chassis members.

Overview

Use this example to configure a collapsed spine architecture with EVPN multihoming of the ToR switches. We have two data centers with an optional Data Center Interconnect (DCI) configuration, an optional SRX cluster for added security, and an optional DHCP relay configuration. This configuration example shows you how to configure this architecture in DC1. You can use a similar configuration in DC2.

Topology

In this deployment, there are two data centers: DC1 and DC2. The data center networks are configured with a collapsed spine architecture using QFX5120 as the spine switches. In this case, we recommend that you limit the EVPN-VXLAN fabric to the local data center.

You can optionally connect the data centers using Layer 3 DCI in the underlay. This use case does not require Layer 2 stretch between the data centers. Inter-data center traffic is Layer 3 only and is routed through the SRX cluster in DC1 for advanced inspection.

Figure 1 shows the logical connectivity between the components used in this NCE.

Figure 1: Logical Topology

There are two tenants in DC1: JNPR1 and JNPR2. Any inter-tenant traffic between JNPR1 and JNPR2 in DC1 is routed through the SRX firewall cluster for security.

  • DC1:

    • VLANs 201 and 202 belong to JNPR1.

    • VLANs 211 and 212 belong to JNPR2.

    • DC1 has servers in VLANs 201, 202, 211, and 212.

  • DC2:

    • VLANs 221 and 222 belong to the default tenant, which is the same as the default routing instance.

    • DC2 has servers in VLANs 221 and 222.

Figure 2 shows the physical connectivity between the components used in this NCE.

Figure 2: Physical Topology

Before You Begin

You need to implement some basic configuration on your devices before you configure the fabric.

Procedure

Step-by-Step Procedure

  1. By default, no aggregated Ethernet interfaces are created. You must set the number of aggregated Ethernet interfaces before you can configure them. Once you set the device count, the system creates that number of empty aggregated Ethernet interfaces, each with a globally unique MAC address. You can create more aggregated Ethernet interfaces by increasing the device count to the number of ESI-LAG interfaces required on the device.

    Set the number of aggregated Ethernet interfaces on all spine switches and ToR switches.
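    On Junos OS switches, a single statement sets this count. The value 10 below is a placeholder; size it to the number of ESI-LAG interfaces your design requires:

        set chassis aggregated-devices ethernet device-count 10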

  2. Ports 0 through 47 on a QFX5120-48Y operate as 10-Gigabit Ethernet ports by default, but the SRX devices in this example support only 1-Gigabit Ethernet. Configure the ports on Spine 1 and Spine 2 that connect to the SRX devices (ge-0/0/10 and ge-0/0/11 in this case) as 1-Gigabit Ethernet ports. To enable 1-Gbps operation on these ports, configure the speed of the first port in the quad, which in this case is ge-0/0/8.

    Use the following statement on Spine 1 and Spine 2:

    Note:

    You can configure the 1-gigabit and 25-gigabit port speeds only per quad (group of four ports) and not individually. All ports operate at a single speed within the quad. For instance, if you configure ports 8 through 11 to operate as 1-gigabit Ethernet ports and you insert a 10-gigabit SFP+ transceiver in port 10, an interface is not created for this port.
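    For example, assuming the standard QFX5120 per-quad speed statement, the configuration on Spine 1 and Spine 2 would be:

        set chassis fpc 0 pic 0 port 8 speed 1g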

  3. Auto speed detection mode detects 100-gigabit Ethernet interfaces and 40-gigabit Ethernet interfaces and automatically channelizes them. Automatic channelization and speed detection are enabled by default. In this example, auto channelization would divide each 40-gigabit Ethernet interface into four 10-gigabit Ethernet interfaces.

    Disable auto channelization on ports et-0/0/2 and et-0/0/31 on Spine 3 and ports et-0/0/49 and et-0/0/50 on Spine 4 so that they remain 40-gigabit Ethernet interfaces.

    Spine 3:

    Spine 4:
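    Assuming the standard per-port channel-speed knob for disabling auto-channelization, the statements would look like the following.

    Spine 3:

        set chassis fpc 0 pic 0 port 2 channel-speed disable-auto-speed-detection
        set chassis fpc 0 pic 0 port 31 channel-speed disable-auto-speed-detection

    Spine 4:

        set chassis fpc 0 pic 0 port 49 channel-speed disable-auto-speed-detection
        set chassis fpc 0 pic 0 port 50 channel-speed disable-auto-speed-detection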

Configure the Underlay

In this topology, the IP fabric is only between the two spine switches, as shown in Figure 3. The two spine switches establish EBGP peering over the point-to-point links to exchange loopback addresses with each other.

Figure 3: IP Fabric Topology

Configure Spine 1

Step-by-Step Procedure

  1. Configure the interfaces on Spine 1.

  2. Configure the EBGP underlay.

  3. Configure the import and export policies.

  4. Enable ECMP and ECMP fast reroute protection, and enable per-flow load balancing. (Despite the name, the per-packet keyword configures per-flow load balancing.)

    If a link goes down, ECMP uses fast reroute protection to shift packet forwarding to operational links, which decreases packet loss. Fast reroute protection updates ECMP sets for the interface without having to wait for the route table to update. When the next route table update occurs, a new ECMP set can be added with fewer links, or the route can point to a single next hop.

  5. By default, the ARP aging timer is set at 20 minutes and the MAC aging timer is set at 5 minutes. To avoid synchronization issues with MAC and MAC-IP binding entries in an EVPN-VXLAN environment, configure ARP aging to be faster than MAC aging.
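For reference, a minimal Spine 1 underlay configuration along these lines might look as follows. The interface names, point-to-point addresses, and AS numbers are illustrative placeholders, and the Spine 1 loopback address 192.168.255.11 is assumed from the verification section:

    set interfaces xe-0/0/0 unit 0 family inet address 172.16.11.0/31
    set interfaces xe-0/0/1 unit 0 family inet address 172.16.12.0/31
    set interfaces lo0 unit 0 family inet address 192.168.255.11/32
    set policy-options policy-statement underlay-export term loopback from route-filter 192.168.255.0/24 orlonger
    set policy-options policy-statement underlay-export term loopback then accept
    set policy-options policy-statement ECMP then load-balance per-packet
    set protocols bgp group underlay type external
    set protocols bgp group underlay export underlay-export
    set protocols bgp group underlay local-as 65001
    set protocols bgp group underlay multipath multiple-as
    set protocols bgp group underlay neighbor 172.16.11.1 peer-as 65002
    set protocols bgp group underlay neighbor 172.16.12.1 peer-as 65002
    set routing-options forwarding-table export ECMP
    set routing-options forwarding-table ecmp-fast-reroute
    set system arp aging-timer 4

The arp aging-timer value is in minutes; 4 minutes (240 seconds) ages ARP entries faster than the default 5-minute (300-second) MAC aging timer.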

Configure Spine 2

Step-by-Step Procedure

Repeat the configuration from Spine 1 on Spine 2.

  1. Configure the interfaces on Spine 2.

  2. Configure the EBGP underlay.

  3. Configure the import and export policies.

  4. Enable ECMP and ECMP fast reroute protection.

  5. To avoid synchronization issues with MAC and MAC-IP binding entries in an EVPN-VXLAN environment, configure ARP aging to be faster than MAC aging.

Verify the Underlay

Step-by-Step Procedure

  1. Verify that both BGP neighbor sessions are established on Spine 1.

  2. Verify that the loopback address of Spine 2 (192.168.255.12) is received by Spine 1 from both BGP neighbor sessions.

  3. Ping the loopback of the other spine device from Spine 1.
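The checks above map to commands along these lines; the Spine 1 loopback address 192.168.255.11 is an assumed value:

    show bgp summary
    show route 192.168.255.12/32
    ping 192.168.255.12 source 192.168.255.11 count 5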

Configure the Overlay

This section shows how to configure the overlay. It includes IBGP peerings and the VLAN to VXLAN mappings for the virtual networks.

Configure Spine 1

Step-by-Step Procedure

  1. Configure IBGP peering between the Spine 1 and Spine 2 loopback addresses.

  2. Configure the VLANs and VLAN to VXLAN mapping.

  3. Configure the following switch options:

    • The virtual tunnel endpoint (VTEP) source interface. This is the loopback address on Spine 1.

    • The route distinguisher for routes generated by this device.

    • The route target.

    The route target configured under vrf-target is used by Type 1 EVPN routes. Type 2 and Type 3 EVPN routes use the auto-derived per-VNI route target for export and import.

  4. Configure the EVPN protocol. First, configure VXLAN as the data plane encapsulation for EVPN.

    Next, configure the VNIs that are part of this EVPN-VXLAN MP-BGP domain. Use set protocols evpn extended-vni-list all to configure all VNIs, or configure each VNI separately.

  5. If the data center has only two spine switches that have only BGP neighbor sessions with each other, you must disable core isolation on both spine switches. Otherwise, if a spine switch goes down, the other spine switch loses all BGP neighbor sessions, which places the ToR-facing ports into LACP standby mode and results in complete traffic loss. See How to Prevent a Split-Brain State and Understanding When to Disable EVPN-VXLAN Core Isolation for more information.
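For reference, a minimal Spine 1 overlay configuration corresponding to the steps above might look like this. The VNI numbers, VLAN names, route distinguisher, and route target are placeholders, and the loopback addresses are assumed from the underlay section:

    set protocols bgp group overlay type internal
    set protocols bgp group overlay local-address 192.168.255.11
    set protocols bgp group overlay family evpn signaling
    set protocols bgp group overlay neighbor 192.168.255.12
    set vlans VLAN_201 vlan-id 201 vxlan vni 10201
    set vlans VLAN_202 vlan-id 202 vxlan vni 10202
    set vlans VLAN_211 vlan-id 211 vxlan vni 10211
    set vlans VLAN_212 vlan-id 212 vxlan vni 10212
    set switch-options vtep-source-interface lo0.0
    set switch-options route-distinguisher 192.168.255.11:1
    set switch-options vrf-target target:65000:1
    set protocols evpn encapsulation vxlan
    set protocols evpn extended-vni-list all
    set protocols evpn no-core-isolation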

Configure Spine 2

Step-by-Step Procedure

  1. To avoid synchronization issues with MAC and MAC-IP binding entries in an EVPN-VXLAN environment, configure ARP aging to be faster than MAC aging.

  2. Configure IBGP peering.

  3. Configure the VLANs and VLAN to VXLAN mapping.

  4. Configure the following switch options.

  5. Configure the EVPN protocol.

    Next, configure the VNIs that are part of this EVPN-VXLAN MP-BGP domain. Use set protocols evpn extended-vni-list all to configure all VNIs, or configure each VNI separately.

  6. If the data center has only two spine switches that only have BGP neighbor sessions with each other, you must disable core isolation on both spine switches.

Verify the Overlay

Step-by-Step Procedure

  1. Verify that the IBGP peering between Spine 1 and Spine 2 is established.

  2. Verify the source VTEP for the EVPN domain.

  3. Verify all the source VTEP and remote VTEPs.
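Typical commands for these checks:

    show bgp summary
    show ethernet-switching vxlan-tunnel-end-point source
    show ethernet-switching vxlan-tunnel-end-point remote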

Configure and Segment Layer 3

Configure Spine 1

Step-by-Step Procedure

  1. Configure routing and forwarding options.

    Note:

    Changing routing and forwarding options like next-hop, overlay-ecmp, or chained-composite-next-hop causes the Packet Forwarding Engine to restart, which interrupts all forwarding operations.

    • Set the number of next hops to at least the expected number of ARP entries in the overlay. See next-hop for more information about configuring vxlan-routing next-hop.

    • Enable two-level equal-cost multipath next hops using the overlay-ecmp statement. This statement is required for a Layer 3 EVPN-VXLAN overlay network when pure Type 5 routing is also configured. We strongly recommend that you configure this statement when pure Type 5 routes are enabled.

    • The chained-composite-next-hop configuration is required for EVPN pure Type 5 routes with VXLAN encapsulation. Without it, the Packet Forwarding Engine does not install the tunnel next hop.

    • Configure the router ID to be the same as the loopback IP address used as the VTEP source and the overlay BGP local address.

  2. To enable the default gateway function, configure IRB interfaces each with a unique IP address and a virtual gateway address (VGA), which must be an anycast IP address. When you specify an IPv4 address for the VGA, the Layer 3 VXLAN gateway automatically generates 00:00:5e:00:01:01 as the MAC address. This example shows you how to manually configure the virtual gateway MAC address. Configure the same virtual gateway MAC address on both spine devices for a given IRB.

    Note:

    If the VGA IP address is lower than the IRB IP address, you must use the preferred option in the IRB configuration as shown in this example.

  3. You will configure the same anycast IRB IP and MAC addresses on the IRB interfaces of each spine device. Because the spine devices act as both the spine and leaf devices in a collapsed spine architecture, they are the only devices that need to know about the IRB interfaces. Disable the advertisement of the IRB interfaces to the other devices.

  4. Place the IRBs belonging to the different tenants into their respective routing instances. IRBs in the same routing instance share a routing table and can therefore route to each other. IRBs in different routing instances can communicate only through an external policy enforcement point, such as the SRX firewall cluster, or if you explicitly leak routes between the routing instances.

  5. Configure Type 5 VNI for the routing instances. When setting up a routing instance for EVPN-VXLAN, you must include a loopback interface and its IP address. If you omit the loopback interface and associated IP address, EVPN control packets cannot be processed.
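A hedged sketch of the Layer 3 configuration on Spine 1, showing one tenant and one IRB. The IP addresses, virtual gateway MAC address, VNI values, and routing-instance targets are placeholders; the virtual gateway address (10.1.201.1) is deliberately lower than the IRB address (10.1.201.2) to illustrate the preferred option from the note above:

    set forwarding-options vxlan-routing next-hop 32768
    set forwarding-options vxlan-routing overlay-ecmp
    set routing-options forwarding-table chained-composite-next-hop ingress evpn
    set routing-options router-id 192.168.255.11
    set interfaces irb unit 201 family inet address 10.1.201.2/24 preferred
    set interfaces irb unit 201 family inet address 10.1.201.2/24 virtual-gateway-address 10.1.201.1
    set interfaces irb unit 201 virtual-gateway-v4-mac 00:00:5e:00:00:04
    set protocols evpn default-gateway do-not-advertise
    set interfaces lo0 unit 1 family inet address 192.168.1.11/32
    set routing-instances JNPR_1 instance-type vrf
    set routing-instances JNPR_1 interface irb.201
    set routing-instances JNPR_1 interface lo0.1
    set routing-instances JNPR_1 route-distinguisher 192.168.255.11:201
    set routing-instances JNPR_1 vrf-target target:65000:201
    set routing-instances JNPR_1 protocols evpn ip-prefix-routes advertise direct-nexthop
    set routing-instances JNPR_1 protocols evpn ip-prefix-routes encapsulation vxlan
    set routing-instances JNPR_1 protocols evpn ip-prefix-routes vni 9001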

Configure Spine 2

Step-by-Step Procedure

  1. Configure routing and forwarding options.

    Note:

    Changing routing and forwarding options like next-hop, overlay-ecmp, or chained-composite-next-hop causes the Packet Forwarding Engine to restart, which interrupts all forwarding operations.

  2. Configure IRB.

  3. Since you have configured the same anycast IRB IP and MAC addresses on the IRB interfaces of both spine switches, disable the advertisement of the IRB interfaces to other devices.

  4. Place the IRBs belonging to the different tenants into their respective routing instances.

  5. Configure Type 5 VNI for the routing instances.

Configure EVPN Multihoming for the ToR Switches

EVPN multihoming uses Ethernet segment identifiers (ESIs). An ESI is a mandatory attribute that enables EVPN LAG server multihoming. ESI values are encoded as 10-byte integers and identify a multihomed segment. Configuring the same ESI value on all spine switches connected to a ToR switch forms an EVPN LAG, which supports active-active multihoming toward the ToR switch.

The ToR switches (implemented as ToR Virtual Chassis in this example) use a LAG to connect to the two spine switches. As shown in Figure 4, ToR1 is connected to the spine switches with LAG ae1. This LAG on the spine switches is enabled by the EVPN multihoming feature.

Figure 4: EVPN Multihoming Configuration for ToR 1

Configure Spine 1

Step-by-Step Procedure

  1. By default, aggregated Ethernet interfaces are not created. You must set the number of aggregated Ethernet interfaces on the switch before you can configure them.

  2. Configure an ESI. Set it to the same value on both spine switches, and enable all-active mode.

    Note:

    You can also auto-derive the ESI. In this example, you configure the ESI manually.

  3. Configure the LACP system ID. Set it to the same value on both spine switches to indicate to the ToR switches that the uplinks to the two spine switches belong to the same LAG bundle. As a result, each ToR switch places its uplinks to the two spine switches in the same LAG bundle and load-shares traffic across the member links.

  4. Configure the physical interface on Spine 1 connected to ToR 1 as a member of the ae1 LAG.
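A hedged sketch of the resulting ae1 configuration on Spine 1. The ESI value matches the ToR 1 segment shown in the verification section; the member interface xe-0/0/36, the LACP system ID, and the VLAN names are placeholders:

    set interfaces ae1 esi 00:00:00:00:00:00:00:00:01:01
    set interfaces ae1 esi all-active
    set interfaces ae1 aggregated-ether-options lacp active
    set interfaces ae1 aggregated-ether-options lacp system-id 00:00:00:00:00:01
    set interfaces ae1 unit 0 family ethernet-switching interface-mode trunk
    set interfaces ae1 unit 0 family ethernet-switching vlan members [ VLAN_201 VLAN_202 VLAN_211 VLAN_212 ]
    set interfaces xe-0/0/36 ether-options 802.3ad ae1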

Configure Spine 2

Step-by-Step Procedure

  1. Set the number of aggregated Ethernet interfaces on the switch.

  2. Configure an ESI. Set it to the same value on both spine switches, and enable all-active mode.

  3. Configure the LACP system ID. Set it the same on both spine switches.

  4. Configure the physical interface on Spine 2 connected to ToR 1 as a member of the ae1 LAG.

Configure ToR 1

Step-by-Step Procedure

  1. By default, aggregated Ethernet interfaces are not created. You must set the number of aggregated Ethernet interfaces on the switch before you can configure them.

  2. Configure the aggregated Ethernet interfaces.

  3. Configure the VLANs.
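A hedged sketch of the ToR 1 uplink configuration. The member interfaces xe-0/2/0 and xe-0/2/1 are placeholders on a single Virtual Chassis member, matching the cabling described in the Requirements section; the VLAN names are also placeholders:

    set chassis aggregated-devices ethernet device-count 10
    set interfaces ae1 aggregated-ether-options lacp active
    set interfaces ae1 unit 0 family ethernet-switching interface-mode trunk
    set interfaces ae1 unit 0 family ethernet-switching vlan members [ VLAN_201 VLAN_202 VLAN_211 VLAN_212 ]
    set interfaces xe-0/2/0 ether-options 802.3ad ae1
    set interfaces xe-0/2/1 ether-options 802.3ad ae1
    set vlans VLAN_201 vlan-id 201
    set vlans VLAN_202 vlan-id 202
    set vlans VLAN_211 vlan-id 211
    set vlans VLAN_212 vlan-id 212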

Verify EVPN Multihoming

Step-by-Step Procedure

  1. Check the status of ae1 and the ESI associated with the LAG.

  2. Verify that the members of ae1 are collecting and distributing.

  3. Verify that the status of EVPN multihoming in the EVPN instance is Resolved on Spine 1. You can also see which spine switch is the designated forwarder for BUM traffic.

  4. Verify that all member links of the ae1 interface are collecting and distributing on ToR 1.
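Typical commands for these checks:

    show interfaces ae1 extensive
    show lacp interfaces ae1
    show evpn instance extensive

The show evpn instance extensive output includes the ESI resolution status and the designated forwarder election results.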

Configure Multihoming for the Servers

Multihome the servers to the ToR Virtual Chassis for redundancy and load sharing. The servers use LAG to connect to the two ToR Virtual Chassis member switches.

As shown in Figure 5, Endpoint 1 is connected to the ToR Virtual Chassis through LAG ae5 and belongs to the JNPR_1 tenant. Endpoint 11 is connected to the ToR Virtual Chassis through LAG ae6 and belongs to the JNPR_2 tenant.

Figure 5: Multihomed Server Topology

Configure ToR 1

Step-by-Step Procedure

Since the ToR switches are configured in a Virtual Chassis, you only need to commit the configuration on the primary switch. In this example, ToR 1 is the primary switch.

  1. Configure LAG on the interfaces connected to Endpoint 1: interface xe-0/2/10 on ToR 1 and interface xe-1/2/10 on ToR 2. Endpoint 1 belongs to VLANs 201 and 202.

  2. Configure LAG on the interfaces connected to Endpoint 11. Endpoint 11 belongs to VLANs 211 and 212.
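A hedged sketch of the server-facing LAGs on the ToR Virtual Chassis. The ae5 member interfaces are the ones named in step 1; the ae6 member interfaces are placeholders:

    set interfaces ae5 aggregated-ether-options lacp active
    set interfaces ae5 unit 0 family ethernet-switching interface-mode trunk
    set interfaces ae5 unit 0 family ethernet-switching vlan members [ VLAN_201 VLAN_202 ]
    set interfaces xe-0/2/10 ether-options 802.3ad ae5
    set interfaces xe-1/2/10 ether-options 802.3ad ae5
    set interfaces ae6 aggregated-ether-options lacp active
    set interfaces ae6 unit 0 family ethernet-switching interface-mode trunk
    set interfaces ae6 unit 0 family ethernet-switching vlan members [ VLAN_211 VLAN_212 ]
    set interfaces xe-0/2/11 ether-options 802.3ad ae6
    set interfaces xe-1/2/11 ether-options 802.3ad ae6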

Verify Server Connectivity

Use this section to verify that the servers can reach each other through the ToR and spine switches. How you verify connectivity depends on whether the servers are in the same VLAN or in different VLANs.

Note:

We recommend multihoming your servers to the ToR switches for redundancy and load sharing as described in the previous section. This section shows single-homed servers for simplicity.

Verify Intra-VLAN Server Connectivity

Step-by-Step Procedure

  1. Verify that the MAC addresses of both endpoints appear in the Ethernet switching table on both ToR switches.

  2. Verify that the two MAC addresses appear in the Ethernet switching table on both spine switches. The spine switches learn the two MAC addresses from the ToR switches over the LAGs connected to each ToR switch (ae1 and ae2). The MAC flags DL, DR, and DLR indicate whether traffic for the MAC address was learned locally by the spine switch, by the remote spine switch, or by both spine switches.

  3. Verify the first MAC address is in the EVPN database on Spine 1. This output indicates that the MAC address was learned locally by this spine switch over the ESI 00:00:00:00:00:00:00:00:01:02 and LAG ae2. This MAC address is advertised in EVPN to the other spine switch.

  4. Verify the second MAC address is in the EVPN database on Spine 1. This MAC address was learned by the remote spine switch and advertised to the local spine switch over EVPN. This output also shows that this MAC address is mapped to ESI 00:00:00:00:00:00:00:00:01:01. Traffic destined for this MAC address can be switched locally to ToR 1 using the same Ethernet segment.

  5. Verify the EVPN routes on Spine 1. This output shows that these MAC addresses are advertised by the spine switches as BGP routes.

  6. Verify the EVPN routes on Spine 2. This output shows the BGP routes received over the IBGP peering with Spine 1. Let us look at these routes in detail.

    The two Type 1 routes emphasized above show that Spine 1 is connected to two Ethernet segments (ES). The ESI values end in 01:01 and 01:02.

    The two Type 2 routes shown above are advertised by Spine 1. They show that the two MAC addresses are reachable through Spine 1.

  7. Verify the control plane for the following MAC addresses on Spine 1.

  8. Verify the forwarding table entries for these MAC addresses on Spine 1. The following output shows that the local aggregated Ethernet interface is used for switching traffic destined for these MAC addresses.

  9. Test what happens when an uplink fails. If an uplink from ToR 1 fails, the output shows that the state at that interface is Detached.

    Figure 6 shows the topology when the interface connected to ToR 1 on Spine 1 is down.

    Figure 6: Topology When Uplink Fails

    Verify that Spine 1 is now learning this MAC address from Spine 2 since Spine 1 does not have a direct connection to ToR 1.

    The forwarding table details on Spine 1 show that the traffic destined for this MAC address is sent to Spine 2.
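The checks in this section map to commands along these lines; filter on the specific MAC addresses and VLANs in your environment:

    show ethernet-switching table
    show evpn database extensive
    show route table bgp.evpn.0
    show route forwarding-table family ethernet-switching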

Verify Inter-VLAN Server Connectivity

Step-by-Step Procedure

  1. On Spine 1, verify that the two MAC addresses are in different VLANs.

  2. On Spine 1, verify the ARP resolution for the two endpoints.

  3. On Spine 1, check the control plane learning for the MAC address 00:10:94:00:11:11. You can see that there is a MAC route for the MAC address and a MAC/IP route for this MAC address.

  4. Verify the forwarding table entries for these MAC addresses. Since Spine 1 is connected to both ToR switches locally, the traffic is switched locally to the corresponding ToR switch from Spine 1.

Split-Brain State

How to Prevent a Split-Brain State

Problem

If all the links between the spine switches go down, the BGP peering between them goes down, but both spine switches remain active and continue forwarding on their downstream aggregated Ethernet interfaces. This scenario is known as a split-brain state and can cause multiple problems.

Solution

To prevent this issue from occurring, choose one spine switch to be the standby switch. Use an event script on the standby switch that is triggered by a BGP peer down event. When the BGP peering goes down, the script checks if the other spine switch is still up using the out-of-band interface. If the other switch is reachable, the script disables the aggregated Ethernet interfaces on the standby switch. The downstream device then only uses the other spine switch to forward traffic.
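The event script itself is site-specific, but the trigger can be wired up with event-options. The policy name and script filename below are placeholders, and the script body (the reachability check over the out-of-band interface, followed by disabling the aggregated Ethernet interfaces) must be written separately:

    set event-options event-script file disable-esi-lags.slax
    set event-options policy SPLIT-BRAIN events rpd_bgp_neighbor_state_changed
    set event-options policy SPLIT-BRAIN then event-script disable-esi-lags.slax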

We also recommend:

  • Using at least two links between the spine switches. This makes it less likely that all the links between the spine switches will go down.

  • Multihoming all servers. If there is a single-homed server on one of the spine switches, the server could be unreachable.

What's Next

You have configured and verified a collapsed spine architecture for your first data center. If needed, repeat the configuration on the devices in the second data center.

Go to the next page to configure advanced security and connect your data centers.