Data Center Fabric Blueprint Architecture Components
This section gives an overview of the building blocks used in this blueprint architecture. The implementation of each building block technology is explored in more detail in later sections.
The building blocks include:
IP Fabric Underlay Network
The modern IP fabric underlay network building block provides IP connectivity across a Clos-based topology.
As shown in Figure 1, the leaf and spine devices are interconnected using high-speed interfaces that are either single links or aggregated Ethernet interfaces. Aggregated Ethernet interfaces are optional, and a single link between spine and leaf devices is typically used. However, aggregated Ethernet interfaces can be deployed to increase bandwidth and provide link-level redundancy. This guide covers both options.
We chose EBGP as the routing protocol in the underlay network for its dependability and scalability. Each spine and leaf device is assigned its own autonomous system with a unique autonomous system number to support EBGP. You can use other routing protocols in the underlay network; the usage of those protocols is beyond the scope of this document.
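To make the one-ASN-per-device rule concrete, here is a minimal Python sketch of an underlay ASN plan. It assumes sequential allocation from the 16-bit private-use ASN range (64512 to 65534, per RFC 6996); the device names, the starting ASN, and the helper functions are hypothetical, and this is a planning aid, not Junos configuration.

```python
from itertools import product

# 16-bit private-use ASN range (RFC 6996); sequential allocation is an assumption.
PRIVATE_ASN_START = 64512
PRIVATE_ASN_END = 65534


def assign_asns(devices: list[str], start: int = PRIVATE_ASN_START) -> dict[str, int]:
    """Give every spine and leaf device its own unique autonomous system number."""
    if start + len(devices) - 1 > PRIVATE_ASN_END:
        raise ValueError("not enough 16-bit private ASNs for this fabric")
    return {name: start + index for index, name in enumerate(devices)}


def ebgp_sessions(spines: list[str], leaves: list[str],
                  asns: dict[str, int]) -> list[tuple[str, int, str, int]]:
    """One EBGP session per spine-leaf pair: leaf devices peer only with spine devices."""
    return [(spine, asns[spine], leaf, asns[leaf])
            for spine, leaf in product(spines, leaves)]


spines = ["spine1", "spine2"]            # hypothetical device names
leaves = ["leaf1", "leaf2", "leaf3"]
asns = assign_asns(spines + leaves)
for session in ebgp_sessions(spines, leaves, asns):
    print(session)                       # first line: ('spine1', 64512, 'leaf1', 64514)
```

Because both ends of every session sit in different autonomous systems, every underlay session is external BGP.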
Micro Bidirectional Forwarding Detection (BFD), which runs BFD on each individual member link of an aggregated Ethernet interface, can also be enabled in this building block to quickly detect the failure of any member link in the aggregated Ethernet bundles that connect spine and leaf devices.
For information about implementing the IP fabric underlay network building block, see IP Fabric Underlay Network Design and Implementation.
IPv4 and IPv6 Support
Because many networks implement a dual stack environment that includes IPv4 and IPv6, this blueprint provides support for both IP protocols. IPv4 and IPv6 are interwoven throughout this guide to allow you to pick one or both of these protocols.
Network Virtualization Overlays
A network virtualization overlay is a virtual network that is transported over an IP underlay network. This building block enables multitenancy in a network, allowing you to share a single physical network across multiple tenants, while keeping each tenant’s network traffic isolated from the other tenants.
A tenant is a user community (such as a business unit, department, workgroup, or application) that contains groups of endpoints. Groups may communicate with other groups in the same tenancy, and tenants may communicate with other tenants if permitted by network policies. A group is typically expressed as a subnet (VLAN) that can communicate with other devices in the same subnet, and reach external groups and endpoints by way of a virtual routing and forwarding (VRF) instance.
As seen in the overlay example shown in Figure 2, Ethernet bridging tables (represented by triangles) handle tenant bridged frames and IP routing tables (represented by squares) process routed packets. Inter-VLAN routing happens at the integrated routing and bridging (IRB) interfaces (represented by circles). Ethernet and IP tables are directed into virtual networks (represented by colored lines). To reach end systems attached to other VXLAN Tunnel Endpoint (VTEP) devices, tenant packets are encapsulated and sent over an EVPN-signaled VXLAN tunnel (represented by green tunnel icons) to the associated remote VTEP devices. Tunneled packets are decapsulated at the remote VTEP devices and forwarded to the remote end systems by way of the respective bridging or routing tables of the egress VTEP device.
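The encapsulation step can be illustrated with the 8-byte VXLAN header defined in RFC 7348. The following Python sketch builds and strips only that header; the outer Ethernet, IP, and UDP headers that a VTEP also adds are omitted, and the VNI value and function names are hypothetical.

```python
import struct

VXLAN_UDP_PORT = 4789          # IANA outer UDP destination port for VXLAN (not built here)
VXLAN_FLAG_VALID_VNI = 0x08    # "I" flag: the VNI field is valid


def vxlan_encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Ingress VTEP: prepend the VXLAN header to a tenant Ethernet frame."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 0: 8 flag bits + 24 reserved bits. Word 1: 24-bit VNI + 8 reserved bits.
    header = struct.pack("!II", VXLAN_FLAG_VALID_VNI << 24, vni << 8)
    return header + inner_frame


def vxlan_decapsulate(payload: bytes) -> tuple[int, bytes]:
    """Egress VTEP: strip the VXLAN header and return (VNI, inner frame)."""
    if len(payload) < 8:
        raise ValueError("payload is shorter than a VXLAN header")
    word0, word1 = struct.unpack("!II", payload[:8])
    if not (word0 >> 24) & VXLAN_FLAG_VALID_VNI:
        raise ValueError("I flag not set; the VNI is not valid")
    return word1 >> 8, payload[8:]


packet = vxlan_encapsulate(10010, b"tenant-frame")
print(vxlan_decapsulate(packet))   # (10010, b'tenant-frame')
```

The egress VTEP uses the recovered VNI to select the bridging or routing table that forwards the inner frame.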
The following sections provide more details about overlay networks:
IBGP for Overlays
Internal BGP (IBGP) is a routing protocol that exchanges reachability information across an IP network. When IBGP is combined with the Multiprotocol BGP extensions (MP-IBGP), it provides the foundation for EVPN to exchange reachability information between VTEP devices. This capability is required to establish inter-VTEP VXLAN tunnels and to use them for overlay connectivity services.
Figure 3 shows that the spine and leaf devices use their loopback addresses for peering in a single autonomous system. In this design, the spine devices act as a route reflector cluster and the leaf devices are route reflector clients. Use of a route reflector satisfies the IBGP requirement for a full mesh without the need to peer all the VTEP devices directly with one another. As a result, the leaf devices peer only with the spine devices, and the spine devices peer with both the other spine devices and the leaf devices. Because the spine devices are connected to all the leaf devices, the spine devices can relay IBGP information between leaf devices that are not directly peered with each other.
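The savings from route reflection are easy to quantify. This Python sketch compares the number of IBGP overlay sessions in a full mesh of leaf VTEP devices with a spine route reflector cluster; the fabric size is a hypothetical example, and the sketch assumes the spine devices in the cluster are fully meshed with one another.

```python
def full_mesh_sessions(n_speakers: int) -> int:
    """IBGP without route reflection: every speaker peers with every other speaker."""
    return n_speakers * (n_speakers - 1) // 2


def route_reflector_sessions(n_spines: int, n_leaves: int) -> int:
    """Spine devices form the RR cluster: each leaf peers with every spine,
    and the spines peer with one another."""
    return n_spines * n_leaves + full_mesh_sessions(n_spines)


# Hypothetical fabric: 4 spine devices and 96 leaf VTEP devices.
print(full_mesh_sessions(96))            # 4560 sessions if the leaves peered directly
print(route_reflector_sessions(4, 96))   # 390 sessions with the spine RR cluster
```

Per leaf device, the neighbor count also drops from 95 IBGP peers to 4.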
You can place route reflectors almost anywhere in the network. However, you must consider the following:
Does the selected device have enough memory and processing power to handle the additional workload required by a route reflector?
Is the selected device equidistant and reachable from all EVPN speakers?
Does the selected device have the proper software capabilities?
In this design, the route reflector cluster is placed at the spine layer. The QFX10000 line of switches in the spine role have ample processing speed to handle route reflector client traffic in the network virtualization overlay.
For details about implementing IBGP in an overlay, see Configuring IBGP for the Overlay.
Bridged Overlay
The first overlay service type described in this guide is a bridged overlay, as shown in Figure 4.
In this overlay model, Ethernet VLANs are extended between leaf devices across VXLAN tunnels. These leaf-to-leaf VXLAN tunnels support data center networks that require Ethernet connectivity between leaf devices but do not need routing between the VLANs. As a result, the spine devices provide only basic underlay and overlay connectivity for the leaf devices, and do not perform routing or gateway services seen with other overlay methods.
Leaf devices act as VTEPs and originate VXLAN tunnels to the other leaf devices. The tunnels enable the leaf devices to send VLAN traffic to other leaf devices and to Ethernet-connected end systems in the data center. The simplicity of this overlay service makes it attractive for operators who need an easy way to introduce EVPN/VXLAN into their existing Ethernet-based data center.
You can add routing to a bridged overlay by implementing an MX Series router or SRX Series security device external to the EVPN/VXLAN fabric. Otherwise, you can select one of the other overlay types that incorporate routing (such as an edge-routed bridging overlay, a centrally-routed bridging overlay, or a routed overlay).
For information on implementing a bridged overlay, see Bridged Overlay Design and Implementation.
Centrally-Routed Bridging Overlay
The second overlay service type is the centrally-routed bridging overlay, as shown in Figure 5.
In a centrally-routed bridging overlay, routing occurs at a central gateway of the data center network (the spine layer in this example) rather than at the VTEP device where the end systems are connected (the leaf layer in this example).
You can use this overlay model when you need routed traffic to go through a centralized gateway or when your edge VTEP devices lack the required routing capabilities.
As shown in Figure 5, traffic that originates at the Ethernet-connected end systems is forwarded to the leaf VTEP devices over a trunk port (multiple VLANs) or an access port (single VLAN). The VTEP device forwards the traffic to local end systems or to an end system at a remote VTEP device. An IRB interface at each spine device routes traffic between the Ethernet virtual networks.
EVPN supports two VLAN-aware Ethernet service models in the data center, and Juniper Networks supports both of them. The VLAN-aware bridging overlay service model allows a collection of VLANs to be easily aggregated into the same overlay virtual network. It is available in two options:
Default Instance VLAN-Aware—In this option, you implement a single, default switching instance that supports a total of 4094 VLANs. All leaf platforms included in this design (QFX5100, QFX5110, QFX5200, and QFX10002) support the default instance style of VLAN-aware overlay.
To configure this service model, see Configuring a VLAN-Aware Centrally-Routed Bridging Overlay in the Default Instance.
Virtual Switch VLAN-Aware—In this option, multiple virtual switch instances support 4094 VLANs per instance. This Ethernet service model is ideal for overlay networks that require scalability beyond a single default instance. Support for this option is currently available on the QFX10000 line of switches.
To implement this scalable service model, see Configuring a VLAN-Aware Centrally-Routed Bridging Overlay with Virtual Switches.
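The difference between the two options is largely one of scope: each switching instance has its own 1 to 4094 VLAN ID space, and each VLAN maps to a 24-bit VXLAN network identifier (VNI). This Python sketch models that mapping; the instance names, VLAN and VNI values, and the assumption that a VNI must not be reused across instances are illustrative, not taken from this guide.

```python
from dataclasses import dataclass, field

MAX_VLAN_ID = 4094        # usable IEEE 802.1Q VLAN IDs are 1-4094
MAX_VNI = 2**24 - 1       # a VNI is a 24-bit value


@dataclass
class SwitchInstance:
    """A VLAN-aware switching instance: the default instance or one virtual switch."""
    name: str
    vlan_to_vni: dict[int, int] = field(default_factory=dict)

    def add_vlan(self, vlan_id: int, vni: int) -> None:
        if not 1 <= vlan_id <= MAX_VLAN_ID:
            raise ValueError(f"VLAN {vlan_id} is outside 1-{MAX_VLAN_ID}")
        if not 1 <= vni <= MAX_VNI:
            raise ValueError(f"VNI {vni} does not fit in 24 bits")
        self.vlan_to_vni[vlan_id] = vni


def duplicate_vnis(instances: list[SwitchInstance]) -> set[int]:
    """Assumption: a VNI identifies exactly one overlay network, so flag any reuse."""
    seen: set[int] = set()
    duplicates: set[int] = set()
    for instance in instances:
        for vni in instance.vlan_to_vni.values():
            if vni in seen:
                duplicates.add(vni)
            seen.add(vni)
    return duplicates


# Two virtual switches can reuse VLAN ID 100 as long as the VNIs differ.
vs1 = SwitchInstance("VS1")
vs1.add_vlan(100, 10100)
vs2 = SwitchInstance("VS2")
vs2.add_vlan(100, 20100)
print(duplicate_vnis([vs1, vs2]))   # set()
```

With the virtual switch option, VLAN capacity grows with the number of instances (two virtual switches give up to 2 × 4094 VLANs), which is why it suits overlays that outgrow the default instance.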
Edge-Routed Bridging Overlay
The third overlay service option is the edge-routed bridging overlay, as shown in Figure 6.
In this Ethernet service model, the IRB interfaces are moved to leaf device VTEPs at the edge of the overlay network to bring IP routing closer to the end systems. Because of the special ASIC capabilities required to support bridging, routing, and EVPN/VXLAN in one device, edge-routed bridging overlays are only possible on certain switches, such as the QFX10000 line of switches and the QFX5110 switch.
This model allows for a simpler overall network. The spine devices are configured to handle only IP traffic, which removes the need to extend the bridging overlays to the spine devices.
This option also enables faster server-to-server, intra-data center traffic (also known as east-west traffic) when the end systems are connected to the same leaf device VTEP, because routing happens at the leaf device, much closer to the end systems than with centrally-routed bridging overlays.
For information on implementing the edge-routed bridging overlay, see Edge-Routed Bridging Overlay Design and Implementation.
IRB Addressing Models in Bridging Overlays
The configuration of IRB interfaces in centrally-routed bridging and edge-routed bridging overlays requires an understanding of the models for the default gateway IP and MAC address configuration of IRB interfaces as follows:
Unique IRB IP Address—In this model, a unique IP address is configured on each IRB interface in an overlay subnet.
The benefit of having a unique IP address and MAC address on each IRB interface is the ability to monitor and reach each of the IRB interfaces from within the overlay using its unique IP address. This model also allows you to configure a routing protocol on the IRB interface.
The downside of this model is that allocating a unique IP address to each IRB interface may consume many IP addresses of a subnet.
Unique IRB IP Address with Virtual Gateway IP Address—This model adds a virtual gateway IP address to the previous model, and we recommend it for centrally-routed bridging overlays. It is similar to VRRP, but without the in-band data plane signaling between the gateway IRB interfaces. The virtual gateway IP address should be the same on all default gateway IRB interfaces in the overlay subnet, and it is active on all gateway IRB interfaces where it is configured. You should also configure a common MAC address for the IPv4 virtual gateway, which becomes the source MAC address of data packets forwarded over the IRB interface.
In addition to the benefits of the previous model, the virtual gateway simplifies default gateway configuration on end systems. The downside of this model is the same as the previous model.
IRB with Anycast IP Address and MAC Address—In this model, all default gateway IRB interfaces in an overlay subnet are configured with the same IP and MAC address. We recommend this model for edge-routed bridging overlays.
A benefit of this model is that only a single IP address is required per subnet for default gateway IRB interface addressing, which simplifies default gateway configuration on end systems.
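The address cost of each model is simple arithmetic. This Python sketch counts how many addresses in an IPv4 overlay subnet remain for end systems under each model; the model labels, the subnet, and the gateway count are hypothetical, and it ignores any other addresses you might reserve.

```python
import ipaddress


def gateway_addresses_used(model: str, gateways: int) -> int:
    """Subnet addresses consumed by the default gateway IRB interfaces."""
    if model == "unique":
        return gateways          # one unique IP address per IRB interface
    if model == "unique+virtual-gateway":
        return gateways + 1      # plus one shared virtual gateway IP address
    if model == "anycast":
        return 1                 # all IRB interfaces share one IP (and MAC) address
    raise ValueError(f"unknown addressing model: {model!r}")


def addresses_left_for_hosts(subnet: str, model: str, gateways: int) -> int:
    """Usable IPv4 host addresses that remain for end systems."""
    net = ipaddress.IPv4Network(subnet)
    usable = net.num_addresses - 2   # minus the network and broadcast addresses
    return usable - gateway_addresses_used(model, gateways)


# Hypothetical /24 overlay subnet served by 4 gateway IRB interfaces.
for model in ("unique", "unique+virtual-gateway", "anycast"):
    print(model, addresses_left_for_hosts("10.1.1.0/24", model, 4))
```

The gap widens as gateways are added: with the anycast model, adding leaf devices costs no extra subnet addresses.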
Routed Overlay using EVPN Type 5 Routes
The final overlay option is a routed overlay, as shown in Figure 7.
This option is an IP-routed virtual network service. Unlike an MPLS-based IP VPN, the virtual network in this model is based on EVPN/VXLAN.
Cloud providers prefer this virtual network option because most modern applications are optimized for IP. Because all communication between devices happens at the IP layer, there is no need to use any Ethernet bridging components, such as VLANs and ESIs, in this routed overlay model.
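To see why no bridging state is needed, consider what a routed overlay forwards on. This Python sketch keeps only a few fields of an EVPN Type 5 (IP prefix) route and performs a longest-prefix match in a tenant VRF; the field subset, route distinguishers, addresses, and VNI are hypothetical, and real route processing involves much more.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Type5Route:
    """Simplified EVPN IP Prefix (Type 5) route. With VXLAN encapsulation,
    the route's label field carries the VNI of the tenant's Layer 3 VRF."""
    route_distinguisher: str
    prefix: ipaddress.IPv4Network
    next_hop_vtep: str      # loopback address of the advertising VTEP
    vni: int


def lookup(vrf_routes: list[Type5Route], destination: str) -> Optional[Type5Route]:
    """Longest-prefix match within one tenant VRF; no MAC or VLAN state is consulted."""
    address = ipaddress.ip_address(destination)
    matches = [route for route in vrf_routes if address in route.prefix]
    return max(matches, key=lambda route: route.prefix.prefixlen, default=None)


# Hypothetical routes learned from two remote VTEP devices.
routes = [
    Type5Route("192.168.255.11:500", ipaddress.IPv4Network("10.1.0.0/16"), "192.168.255.11", 9500),
    Type5Route("192.168.255.12:500", ipaddress.IPv4Network("10.1.2.0/24"), "192.168.255.12", 9500),
]
print(lookup(routes, "10.1.2.7").next_hop_vtep)   # 192.168.255.12
```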
For information on implementing a routed overlay, see Routed Overlay Design and Implementation.
Multihoming Support for Ethernet-Connected End Systems
Ethernet-connected multihoming allows Ethernet-connected end systems to connect into the Ethernet overlay network over a single-homed link to one VTEP device or over multiple links multihomed to different VTEP devices. Ethernet traffic is load-balanced across the fabric between VTEPs on leaf devices that connect to the same end system.
We tested setups where an Ethernet-connected end system was connected to a single leaf device or multihomed to 2 or 3 leaf devices, to prove that traffic is properly handled in multihomed setups with more than two leaf VTEP devices. In practice, an Ethernet-connected end system can be multihomed to a large number of leaf VTEP devices. All links are active, and network traffic can be load balanced over all of the multihomed links.
In this architecture, EVPN is used for Ethernet-connected multihoming. EVPN multihomed LAGs are identified by an Ethernet segment identifier (ESI) in the EVPN bridging overlay, while LACP is used to improve LAG availability.
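An ESI is a 10-octet value, and every leaf device attached to the same multihomed LAG must use the same one. The following Python sketch parses and checks ESIs, applying the RFC 7432 rules for the reserved all-zeros and all-ones values; the ESI strings and leaf names are hypothetical.

```python
def parse_esi(text: str) -> bytes:
    """Parse a colon-separated ESI, for example 00:11:11:11:11:11:11:11:11:11."""
    octets = bytes(int(part, 16) for part in text.split(":"))
    if len(octets) != 10:
        raise ValueError("an ESI must be exactly 10 octets")
    if octets == bytes(10):
        raise ValueError("the all-zeros ESI denotes a single-homed segment")
    if octets == b"\xff" * 10:
        raise ValueError("the all-ones ESI (MAX-ESI) is reserved")
    return octets   # octets[0] is the ESI type; 0x00 means operator-configured


def same_ethernet_segment(esi_by_leaf: dict[str, str]) -> bool:
    """True when every leaf device serving one multihomed LAG advertises the same ESI."""
    return len({parse_esi(esi) for esi in esi_by_leaf.values()}) == 1
```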
VLAN trunking allows one interface to support multiple VLANs. VLAN trunking ensures that virtual machines (VMs) on non-overlay hypervisors can operate in any overlay networking context.
For more information about Ethernet-connected multihoming support, see Multihoming an Ethernet-Connected End System Design and Implementation.
Multihoming Support for IP-Connected End Systems
IP-connected multihoming allows endpoint systems to connect to the IP network over multiple IP-based access interfaces on different leaf devices.
We tested setups where an IP-connected end system was connected to a single leaf device or multihomed to 2 or 3 leaf devices. The setup validated that traffic is properly handled when the end system is multihomed to multiple leaf devices; in practice, an IP-connected end system can be multihomed to a large number of leaf devices.
In multihomed setups, all links are active and network traffic is forwarded and received over all multihomed links. IP traffic is load balanced across the multihomed links using a simple hashing algorithm.
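The idea behind hash-based load balancing is that all packets of one flow take the same link while different flows spread across all links. This Python sketch uses CRC-32 over the 5-tuple as a stand-in; the actual hash function and the fields it covers are platform-specific, and the addresses and link names are hypothetical.

```python
import zlib


def pick_link(src_ip: str, dst_ip: str, protocol: int,
              src_port: int, dst_port: int, links: list[str]) -> str:
    """Map a flow's 5-tuple onto one of the active multihomed links."""
    key = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}".encode()
    return links[zlib.crc32(key) % len(links)]


links = ["to-leaf1", "to-leaf2", "to-leaf3"]   # hypothetical multihomed links
first = pick_link("10.0.0.5", "10.9.0.1", 6, 40000, 443, links)
# The same flow always hashes to the same link, which keeps its packets in order.
assert first == pick_link("10.0.0.5", "10.9.0.1", 6, 40000, 443, links)
```

Hashing per flow rather than per packet avoids reordering within a TCP session while still using every active link.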
EBGP is used to exchange routing information between the IP-connected endpoint system and the connected leaf devices to ensure the route or routes to the endpoint systems are shared with all spine and leaf devices.
For more information about the IP-connected multihoming building block, see Multihoming an IP-Connected End System Design and Implementation.
Data Center Interconnect (DCI)
The data center interconnect (DCI) building block provides the technology needed to send traffic between data centers. The validated design supports DCI using EVPN Type 5 routes or IPVPN routes.
EVPN Type 5 or IPVPN routes are used in a DCI context so that traffic can be exchanged between data centers that use different IP address subnetting schemes. Routes are exchanged between spine devices in different data centers to allow traffic to pass between the data centers.
Physical connectivity between the data centers is required before EVPN Type 5 messages or IPVPN routes can be sent between data centers. The physical connectivity is provided by backbone devices in a WAN cloud. A backbone device is connected to all spine devices in a single data center, as well as to the other backbone devices that are connected to the other data centers.
For information about configuring DCI, see:
Service Chaining
In many networks, it is common for traffic to flow through separate hardware devices that each provide a service, such as firewalls, NAT, IDP, multicast, and so on. Each device requires separate operation and management. This method of linking multiple network functions can be thought of as physical service chaining.
A more efficient model for service chaining is to virtualize and consolidate network functions onto a single device. In our blueprint architecture, we use an SRX Series router as the device that consolidates network functions and processes and applies services. That device is called a physical network function (PNF).
In this solution, service chaining is supported on both centrally-routed bridging overlay and edge-routed bridging overlay. It works only for inter-tenant traffic.
Logical View of Service Chaining
Figure 11 shows a logical view of service chaining. It shows one spine with a right side configuration and a left side configuration. On each side is a VRF routing instance and an IRB interface. The SRX Series router in the center is the PNF, and it performs the service chaining.
The flow of traffic in this logical view is:
- The spine receives a packet on the VTEP that is in the left side VRF.
- The packet is decapsulated and sent to the left side IRB interface.
- The IRB interface routes the packet to the SRX Series router, which is acting as the PNF.
- The SRX Series router performs service chaining on the packet and forwards the packet back to the spine, where it is received on the IRB interface shown on the right side of the spine.
- The IRB interface routes the packet to the VTEP in the right side VRF.
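The five steps can be traced with a small Python model. It treats the PNF as a hypothetical permit/deny function and returns the hops a packet visits on the spine; the VRF names and packet labels are illustrative, and the model also reflects that only inter-tenant traffic is sent through the PNF.

```python
from typing import Callable


def trace_on_spine(src_vrf: str, dst_vrf: str, packet: str,
                   pnf: Callable[[str], bool]) -> list[str]:
    """Return the hops a packet visits on the spine in the logical view."""
    hops = [f"VTEP ({src_vrf}): receive and decapsulate",
            f"IRB ({src_vrf}): route"]
    if src_vrf == dst_vrf:
        # Intra-tenant traffic stays in its own VRF and bypasses service chaining.
        hops.append(f"VTEP ({dst_vrf}): encapsulate and forward")
        return hops
    hops.append("PNF: apply services")
    if not pnf(packet):
        hops.append("PNF: drop")
        return hops
    hops += [f"IRB ({dst_vrf}): receive from PNF and route",
             f"VTEP ({dst_vrf}): encapsulate and forward"]
    return hops


for hop in trace_on_spine("VRF-left", "VRF-right", "tcp/443", pnf=lambda pkt: True):
    print(hop)
```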
For information about configuring service chaining, see Service Chaining Design and Implementation.
Multicast Optimizations
Multicast optimizations help to preserve bandwidth and route traffic more efficiently in multicast scenarios in EVPN-VXLAN environments. Without any multicast optimizations configured, all multicast replication is done at the ingress of the leaf device connected to the multicast source, as shown in Figure 12. Multicast traffic is sent to all leaf devices that are connected to the spine, and each leaf device sends the traffic to its connected receivers.
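Three types of multicast optimizations are supported in EVPN-VXLAN environments: IGMP snooping and IGMP proxy, selective multicast Ethernet (SMET) forwarding, and assisted replication (AR) of multicast traffic.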
For information about Multicast support, see Multicast Support in EVPN-VXLAN Overlay Networks.
For information about configuring multicast, see Multicast Optimization in an Edge-Routed Bridging Overlay Design and Implementation.
IGMP Snooping and IGMP Proxy
IGMP snooping in an EVPN-VXLAN fabric is useful to optimize the distribution of multicast traffic. IGMP snooping preserves bandwidth because multicast traffic is forwarded only on interfaces where there are IGMP listeners. Not all interfaces on a leaf device need to receive multicast traffic.
Without IGMP snooping, end systems receive IP multicast traffic that they have no interest in, which needlessly floods their links with unwanted traffic. In some cases, when IP multicast flows are large, flooding unwanted traffic can cause denial-of-service issues.
Figure 13 shows how IGMP snooping works in an EVPN-VXLAN fabric. IGMP snooping is configured on all leaf devices.
- Multicast Receiver 2 sends an IGMPv2 leave request.
- Multicast Receivers 3 and 4 send an IGMPv2 join request.
- When leaf 1 receives ingress multicast traffic, it replicates it for all leaf devices, and forwards it to the spine.
- The spine forwards the traffic to all leaf devices.
- Leaf 2 receives the multicast traffic, but does not forward it to the receiver because the receiver sent an IGMP leave message.
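The per-leaf decision in this walkthrough can be modeled as a membership table keyed by VLAN and group. This Python sketch is deliberately simplified: it removes an interface as soon as an IGMPv2 leave arrives, whereas a switch typically confirms the leave with a group-specific query first. The interface names, VLAN, and group address are hypothetical.

```python
from collections import defaultdict


class IgmpSnoopingTable:
    """Membership learned by snooping IGMPv2 messages on a leaf device's access interfaces."""

    def __init__(self) -> None:
        self.members = defaultdict(set)   # (VLAN, group) -> set of interfaces

    def report(self, vlan: int, group: str, interface: str) -> None:
        """IGMPv2 membership report (join): start forwarding the group here."""
        self.members[(vlan, group)].add(interface)

    def leave(self, vlan: int, group: str, interface: str) -> None:
        """IGMPv2 leave: stop forwarding the group here (immediate-leave simplification)."""
        self.members[(vlan, group)].discard(interface)

    def egress_interfaces(self, vlan: int, group: str) -> set[str]:
        """Access interfaces that receive the group; all others are spared the traffic."""
        return set(self.members.get((vlan, group), set()))


leaf2 = IgmpSnoopingTable()
leaf2.report(100, "233.252.0.1", "xe-0/0/2")   # Receiver 2 had joined earlier
leaf2.leave(100, "233.252.0.1", "xe-0/0/2")    # then it sends a leave
print(leaf2.egress_interfaces(100, "233.252.0.1"))   # set()
```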
In EVPN-VXLAN networks, only IGMP version 2 is supported.
For more information about IGMP snooping, see Overview of Multicast Forwarding with IGMP Snooping in an EVPN-VXLAN Environment.
Selective Multicast Forwarding
Selective multicast Ethernet (SMET) forwarding provides greater end-to-end network efficiency and reduces traffic in the EVPN network. It conserves bandwidth usage in the core of the fabric and reduces the load on egress devices that do not have listeners.
Devices with IGMP snooping enabled use selective multicast forwarding to forward multicast traffic efficiently. With IGMP snooping enabled, a leaf device sends multicast traffic only to the access interfaces with interested receivers. With SMET added, the leaf device selectively sends multicast traffic only to the leaf devices in the core that have expressed an interest in that multicast group.
Figure 14 shows the SMET traffic flow along with IGMP snooping.
- Multicast Receiver 2 sends an IGMPv2 leave request.
- Multicast Receivers 3 and 4 send an IGMPv2 join request.
- When leaf 1 receives ingress multicast traffic, it replicates the traffic only to leaf devices with interested receivers (leaf devices 3 and 4), and forwards it to the spine.
- The spine forwards the traffic to leaf devices 3 and 4.
You do not need to enable SMET; it is enabled by default when IGMP snooping is configured on the device.
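With SMET, each leaf device advertises its interest in a multicast group to the rest of the fabric (in EVPN, through Type 6 Selective Multicast Ethernet Tag routes), so the ingress leaf can prune its replication list. This Python sketch compares the replication lists for the flow in Figure 14 with and without SMET; the leaf names and group address are hypothetical.

```python
def replication_targets(ingress: str, interest: dict[str, set[str]],
                        group: str, smet: bool) -> set[str]:
    """Remote leaf devices that receive a copy of traffic for `group`."""
    remote_leaves = {leaf for leaf in interest if leaf != ingress}
    if not smet:
        return remote_leaves   # plain ingress replication: every remote leaf gets a copy
    return {leaf for leaf in remote_leaves if group in interest[leaf]}


# Groups each leaf device has expressed interest in, after IGMP snooping.
interest = {
    "leaf1": set(),
    "leaf2": set(),                  # Receiver 2 sent a leave
    "leaf3": {"233.252.0.1"},
    "leaf4": {"233.252.0.1"},
}
print(sorted(replication_targets("leaf1", interest, "233.252.0.1", smet=False)))  # ['leaf2', 'leaf3', 'leaf4']
print(sorted(replication_targets("leaf1", interest, "233.252.0.1", smet=True)))   # ['leaf3', 'leaf4']
```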
For more information about SMET, see Overview of Selective Multicast Forwarding.
Assisted Replication of Multicast Traffic
The assisted replication (AR) feature offloads EVPN-VXLAN fabric leaf devices from ingress replication tasks. The ingress leaf does not replicate multicast traffic. It sends one copy of the multicast traffic to a spine that is configured as an AR replicator device. The AR replicator device distributes and controls multicast traffic. This method conserves bandwidth in the fabric between the leaf and the spine.
Figure 15 shows how AR works along with IGMP snooping and SMET.
- Leaf 1, which is set up as the AR leaf device, receives multicast traffic and sends one copy to the spine that is set up as the AR replicator device.
- The spine replicates the multicast traffic. It replicates traffic for the leaf devices that are provisioned with the VLAN VNI in which the multicast traffic from Leaf 1 originated.
Because we have IGMP snooping and SMET configured in the network, the spine sends the multicast traffic only to leaf devices with interested receivers.
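The effect on the ingress leaf device's uplink is what matters here. This Python sketch assumes a single AR replicator spine and counts the copies the ingress leaf sends with and without AR, alongside the leaf devices the replicator serves once IGMP snooping and SMET have pruned the list; the VNIs, group, and leaf names are hypothetical.

```python
def replicator_targets(ingress: str, vni: int, provisioned: dict[str, set[int]],
                       interest: dict[str, set[str]], group: str) -> set[str]:
    """Leaf devices the AR replicator sends to: provisioned with the VNI and interested in the group."""
    return {leaf for leaf, vnis in provisioned.items()
            if leaf != ingress and vni in vnis and group in interest.get(leaf, set())}


def ingress_leaf_copies(targets: set[str], assisted_replication: bool) -> int:
    """Copies the ingress leaf itself puts on the wire toward the spine."""
    return 1 if assisted_replication else len(targets)


provisioned = {"leaf1": {10100}, "leaf2": {10100}, "leaf3": {10100}, "leaf4": {10100, 10200}}
interest = {"leaf3": {"233.252.0.1"}, "leaf4": {"233.252.0.1"}}
targets = replicator_targets("leaf1", 10100, provisioned, interest, "233.252.0.1")
print(sorted(targets))                                           # ['leaf3', 'leaf4']
print(ingress_leaf_copies(targets, assisted_replication=False))  # 2
print(ingress_leaf_copies(targets, assisted_replication=True))   # 1
```

The leaf-side savings grow with the number of interested leaf devices, which is why AR pays off most in large fabrics.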
In this document, we show multicast optimizations on a small scale. In a full-scale network with many spine and leaf devices, the benefits of the optimizations are much more apparent.
Ingress Virtual Machine Traffic Optimization for EVPN
When virtual machines and hosts are moved within a data center or from one data center to another, network traffic can become inefficient if it is not routed to the optimal gateway. When a host is relocated, the ARP table does not always get flushed, so data flows to the host are still sent to the configured gateway even when a more optimal gateway exists. The traffic is “tromboned,” that is, routed unnecessarily through the configured gateway.
Ingress Virtual Machine Traffic Optimization (VMTO) provides greater network efficiency by optimizing ingress traffic, and it can eliminate the trombone effect between VLANs. When you enable ingress VMTO, routes are stored in a Layer 3 virtual routing and forwarding (VRF) table, and the device routes inbound traffic directly back to the host that was relocated.
Figure 16 shows tromboned traffic without ingress VMTO and optimized traffic with ingress VMTO enabled.
Without ingress VMTO, Spine 1 and Spine 2 in both DC1 and DC2 advertise the remote IP host route 10.0.0.1, even though the route originates in DC2. Ingress traffic can therefore be directed to either Spine 1 or Spine 2 in DC1, and it is then routed to Spine 1 or Spine 2 in DC2, where host 10.0.0.1 was moved. This causes the tromboning effect.
With ingress VMTO, you can achieve an optimal forwarding path by configuring a policy so that the IP host route (10.0.0.1) is advertised only by Spine 1 and Spine 2 in DC2, and not by the spine devices in DC1, when the IP host is moved to DC2.
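The policy outcome can be expressed as a per-spine decision. This Python sketch only models which spine devices advertise the host route with and without ingress VMTO; it is not Junos policy syntax, and the data structures are hypothetical.

```python
def advertising_spines(spines: list[tuple[str, str]], host_dc: str,
                       vmto: bool) -> list[tuple[str, str]]:
    """Return the (data center, spine) pairs that advertise a host route such as 10.0.0.1."""
    if not vmto:
        return list(spines)    # every spine in every data center advertises the route
    # With ingress VMTO, only spines in the data center where the host now lives advertise it.
    return [(dc, spine) for dc, spine in spines if dc == host_dc]


spines = [("DC1", "Spine 1"), ("DC1", "Spine 2"), ("DC2", "Spine 1"), ("DC2", "Spine 2")]
print(advertising_spines(spines, host_dc="DC2", vmto=False))
print(advertising_spines(spines, host_dc="DC2", vmto=True))   # [('DC2', 'Spine 1'), ('DC2', 'Spine 2')]
```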
For information about configuring VMTO, see Configuring VMTO.
DHCP Relay
The Dynamic Host Configuration Protocol (DHCP) relay building block allows the network to pass DHCP messages between a DHCP client and a DHCP server. The DHCP relay implementation in this building block moves DHCP packets through a centrally-routed bridging overlay where the gateway is located at the spine layer.
The DHCP server and the DHCP clients connect into the network using access interfaces on leaf devices. The DHCP server and clients can communicate with each other over the existing network without further configuration when the DHCP client and server are in the same VLAN. When a DHCP client and server are in different VLANs, DHCP traffic between the client and server is forwarded between the VLANs through the IRB interfaces on the spine devices. You must configure the IRB interfaces on the spine devices to support DHCP relay between VLANs.
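What a spine IRB interface adds as a relay can be sketched with two DHCP header fields. Per RFC 2131 and RFC 1542, a relay agent sets `giaddr` to its own interface address if the field is still zero and increments `hops`, and the server then uses `giaddr` to choose an address pool for the client's subnet. The message class, IRB address, and pool list in this Python sketch are hypothetical, and all other fields and options of a real DHCP message are omitted.

```python
import ipaddress
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class DhcpRequest:
    client_mac: str
    hops: int = 0
    giaddr: str = "0.0.0.0"   # relay agent address; zero when sent by the client


def relay(request: DhcpRequest, irb_address: str) -> DhcpRequest:
    """Relay a client request received on an IRB interface toward the DHCP server."""
    giaddr = irb_address if request.giaddr == "0.0.0.0" else request.giaddr
    return replace(request, hops=request.hops + 1, giaddr=giaddr)


def select_pool(request: DhcpRequest, pools: list[str]) -> Optional[str]:
    """Server side: choose the pool whose subnet contains giaddr."""
    relay_address = ipaddress.ip_address(request.giaddr)
    for pool in pools:
        if relay_address in ipaddress.ip_network(pool):
            return pool
    return None


relayed = relay(DhcpRequest(client_mac="00:00:5e:00:53:01"), irb_address="10.1.2.1")
print(relayed.giaddr, relayed.hops)                           # 10.1.2.1 1
print(select_pool(relayed, ["10.1.1.0/24", "10.1.2.0/24"]))   # 10.1.2.0/24
```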
For information about implementing the DHCP relay, see DHCP Relay Design and Implementation.
Reducing ARP Traffic with ARP Synchronization and Suppression (Proxy ARP)
The goal of ARP synchronization is to synchronize ARP tables across all the VRFs that serve an overlay subnet to reduce the amount of traffic and optimize processing for both network devices and end systems. When an IP gateway for a subnet learns about an ARP binding, it shares it with other gateways so they do not need to discover the same ARP binding independently.
With ARP suppression, when a leaf device receives an ARP request, it checks its own ARP table that is synchronized with the other VTEP devices and responds to the request locally rather than flooding the ARP request.
Proxy ARP and ARP suppression are enabled by default on the QFX10000 line of switches.
IRB interfaces on the leaf device deliver ARP requests and NDP requests from both local and remote leaf devices. When a leaf device receives an ARP request or NDP request from another leaf device, the receiving device searches its MAC+IP address bindings database for the requested IP address.
If the device finds the MAC+IP address binding in its database, it responds to the request.
If the device does not find the MAC+IP address binding, it floods the ARP request to all Ethernet links in the VLAN and the associated VTEPs.
Because all participating leaf devices add the ARP entries and synchronize their routing and bridging tables, local leaf devices respond directly to requests from locally connected hosts and remove the need for remote devices to respond to these ARP requests.
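The lookup-then-flood decision can be sketched as follows. In this Python model, the binding database is keyed by VNI and IP address and may be filled from local ARP learning or from bindings synchronized from other VTEP devices (in EVPN, these are carried in Type 2 MAC/IP advertisement routes); the class and method names, addresses, and VNI are hypothetical.

```python
from typing import Optional


class ArpSuppressor:
    """MAC+IP binding database that a leaf device consults before flooding ARP requests."""

    def __init__(self) -> None:
        self.bindings: dict[tuple[int, str], str] = {}   # (VNI, IP address) -> MAC address

    def learn(self, vni: int, ip: str, mac: str) -> None:
        """Record a binding learned locally or synchronized from another VTEP device."""
        self.bindings[(vni, ip)] = mac

    def handle_arp_request(self, vni: int, target_ip: str) -> tuple[str, Optional[str]]:
        """Answer locally when the binding is known; otherwise flood the request."""
        mac = self.bindings.get((vni, target_ip))
        if mac is not None:
            return ("reply", mac)
        return ("flood", None)


leaf = ArpSuppressor()
leaf.learn(10100, "10.1.1.20", "00:00:5e:00:53:20")   # synchronized from a remote leaf
print(leaf.handle_arp_request(10100, "10.1.1.20"))      # ('reply', '00:00:5e:00:53:20')
print(leaf.handle_arp_request(10100, "10.1.1.99"))      # ('flood', None)
```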
For information about implementing the ARP synchronization, Proxy ARP, and ARP suppression, see Enabling Proxy ARP and ARP Suppression for the Edge-Routed Bridging Overlay.
Layer 2 Port Security Features on Ethernet-Connected End Systems
Centrally-routed bridging overlays and edge-routed bridging overlays support the following security features on Layer 2 Ethernet-connected end systems: storm control, MAC filtering, and port mirroring.
For more information about these features, see MAC Filtering, Storm Control, and Port Mirroring Support in an EVPN-VXLAN Environment.
For information about configuring these features, see Configuring Layer 2 Port Security Features on Ethernet-Connected End Systems.
Preventing BUM Traffic Storms With Storm Control
Storm control can prevent excessive traffic from degrading the network. It lessens the impact of broadcast, unknown unicast, and multicast (BUM) traffic storms by monitoring traffic levels on EVPN-VXLAN interfaces and dropping BUM traffic when a specified traffic level is exceeded.
In an EVPN-VXLAN environment, storm control monitors:
Layer 2 BUM traffic that originates in a VXLAN and is forwarded to interfaces within the same VXLAN.
Layer 3 multicast traffic that is received by an IRB interface in a VXLAN and is forwarded to interfaces in another VXLAN.
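A simplified way to picture storm control is a per-interval budget for BUM traffic on an interface. This Python sketch models only the Layer 2 case: frames are classified as BUM by destination MAC address, known unicast is never counted, and the budget is a hypothetical byte count per measurement interval. Real platforms express the level differently (for example, as a share of interface bandwidth) and enforce it in hardware.

```python
def is_bum(dst_mac: str, known_macs: set[str]) -> bool:
    """Broadcast, unknown unicast, or multicast, judged from the destination MAC."""
    first_octet = int(dst_mac.split(":")[0], 16)
    if first_octet & 0x01:             # group bit set: broadcast (ff:ff:...) or multicast
        return True
    return dst_mac not in known_macs   # unicast to an address not in the switching table


class StormControl:
    """Hypothetical per-interface BUM budget, reset at the start of each interval."""

    def __init__(self, bum_bytes_per_interval: int) -> None:
        self.limit = bum_bytes_per_interval
        self.used = 0

    def start_interval(self) -> None:
        self.used = 0

    def admit(self, dst_mac: str, size: int, known_macs: set[str]) -> bool:
        """Return False (drop) when this BUM frame would exceed the interval's budget."""
        if not is_bum(dst_mac, known_macs):
            return True
        if self.used + size > self.limit:
            return False
        self.used += size
        return True
```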
Using MAC Filtering to Enhance Port Security
MAC filtering enhances port security by limiting the number of MAC addresses that can be learned within a VLAN, and therefore limits the traffic in a VXLAN. Limiting the number of MAC addresses protects the switch from flooding of the Ethernet switching table. Flooding of the Ethernet switching table occurs when the number of newly learned MAC addresses causes the table to overflow, and previously learned MAC addresses are flushed from the table. The switch then relearns the MAC addresses, which can impact performance and introduce security vulnerabilities.
In this blueprint, MAC filtering works by limiting the number of accepted packets that are sent to ingress-facing access interfaces based on MAC addresses.
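A per-interface MAC limit can be sketched as a bounded set of learned source addresses. This Python model drops frames from new source MAC addresses once the limit is reached, which is one possible action; the limit, the addresses, and the choice of action are hypothetical, and platforms can offer other actions, such as logging or shutting down the interface.

```python
class MacLimiter:
    """Source-MAC learning limit on one access interface."""

    def __init__(self, max_macs: int) -> None:
        self.max_macs = max_macs
        self.learned: set[str] = set()

    def accept(self, src_mac: str) -> bool:
        """Accept frames from known MACs; learn new MACs only while under the limit."""
        if src_mac in self.learned:
            return True
        if len(self.learned) >= self.max_macs:
            return False            # limit reached: drop, and do not learn the address
        self.learned.add(src_mac)
        return True
```

Because previously learned addresses are never evicted to make room, a flood of spoofed source MACs cannot push legitimate entries out of the table.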
Analyzing Traffic Using Port Mirroring
With analyzer-based port mirroring, you can analyze traffic down to the packet level in an EVPN-VXLAN environment. You can use this feature to enforce policies related to network usage and file sharing and to identify problem sources by locating abnormal or heavy bandwidth usage by particular stations or applications.
Port mirroring copies packets entering or exiting a port or entering a VLAN and sends the copies to a local interface for local monitoring or to a VLAN for remote monitoring. Use port mirroring to send traffic to applications that analyze traffic for purposes such as monitoring compliance, enforcing policies, detecting intrusions, monitoring and predicting traffic patterns, correlating events, and so on.