Type 5 EVPN-VXLAN implementation – Control Plane

A Type 5 EVPN-VXLAN fabric consists of a dual-layer BGP architecture made up of an underlay and an overlay.

The underlay serves as the IP transport between VXLAN Tunnel Endpoints (VTEPs)—located at the leaf nodes—and provides IP reachability using EBGP sessions. These sessions are established between directly connected leaf and spine nodes and exchange plain IPv4 or IPv6 unicast routes for the leaf nodes’ loopback interfaces.

The overlay provides IP reachability between GPU-facing Ethernet segments using multihop EBGP sessions. These sessions are established between the leaf and spine nodes using their loopback addresses (IPv4 or IPv6) and carry the information required to encapsulate and forward tenant traffic across the fabric while maintaining traffic separation between customers.

EBGP is preferred for overlay routing in Clos fabrics because it enforces loop-free, hop-by-hop forwarding without requiring route reflectors. By using unique ASNs per device, it aligns with Valley-Free Routing principles, ensuring traffic flows cleanly up to the spine and down to the leaf, avoiding loops and maintaining symmetry.

In a Type 5 implementation, the key routes exchanged in the overlay are EVPN Route Type 5 (IP Prefix routes), which enable the distribution of IP routing information decoupled from MAC learning. These routes carry Layer 3 prefixes along with their associated route targets, route distinguishers, and next-hop VTEP information, enabling scalable IP routing across the fabric without the need for ARP or MAC learning for every endpoint. As a result, the control plane supports efficient distribution of tenant IP routes while preserving isolation through BGP extended communities.

Fabric Underlay Control Plane Implementation Options

There are different ways to implement the underlay in an EVPN-VXLAN fabric, depending on design goals, operational preferences, and hardware capabilities:

  • Numbered interfaces with IPv4 addresses (/31 subnet masks)
  • Numbered interfaces with IPv6 addresses (/127 subnet masks)
  • IPv6 link-local addresses, with BGP operating in unnumbered mode (as defined in RFC 5549)

Table 8. Comparison of Underlay Control Plane Implementation Options in EVPN-VXLAN Fabrics

Leaf-to-Spine Interface Addressing

  • IPv4 /31: /31 IPv4 addresses (routable)
  • IPv6 /127: /127 IPv6 addresses (routable)
  • IPv6 Link-Local (RFC 5549): Link-local IPv6 only (no global addressing needed)

BGP Peer Configuration

  • IPv4 /31: Explicit neighbor configuration per interface, using the IPv4 address of the directly connected interface
  • IPv6 /127: Explicit neighbor configuration per interface, using the (routable) IPv6 address of the directly connected interface
  • IPv6 Link-Local (RFC 5549): No explicit neighbor configuration required; uses interface-scoped link-local discovery, or link-local IPv6 with interface-scoped BGP configuration (for example, fe80::1%et-0/0/0)

Benefits

  • IPv4 /31: Simple; widely supported; low configuration overhead
  • IPv6 /127: Avoids IPv4 exhaustion; IPv6-native underlay; aligns with modern fabrics
  • IPv6 Link-Local (RFC 5549): Zero IP allocation needed; ideal for massive fabrics; minimal IPAM

Drawbacks

  • IPv4 /31: Not IPv6-ready
  • IPv6 /127: Needs dual-stack address planning
  • IPv6 Link-Local (RFC 5549): Reduced traceroute visibility; requires RFC 5549 knowledge

Use Cases / Industry Trend

  • IPv4 /31: Remains the most widely used option in enterprise and service provider fabrics. For most enterprise and traditional data center fabrics, IPv4 /31 remains the recommended and most straightforward underlay option.
  • IPv6 /127: Gaining traction in dual-stack environments and in organizations preparing for IPv6 transitions. It conserves IPv4 space and offers a clean separation between infrastructure and tenant traffic.
  • IPv6 Link-Local (RFC 5549): Trending in hyperscaler, telco, and modern leaf-spine fabrics, especially where address scale or IPAM simplicity is critical. Many cloud providers (Azure, AWS internal fabrics, Meta, and others) use unnumbered underlays. This option is becoming the preferred choice for hyperscale, IP-scarce, or automation-heavy fabrics with experienced operations teams, where managing IPs is a burden.

Note:

While all three options are described below, this JVD and the associated testing focused on the IPv6 Link-Local (RFC 5549) underlay, which is the preferred design choice.

Choosing IPv6 link-local addressing with RFC 5549 (unnumbered underlay) simplifies network design and reduces operational complexity—especially in large-scale fabrics where scalability and automation are key. By removing the need to assign and manage global IP addresses on leaf-to-spine links, this approach eliminates a major source of administrative overhead and prevents IPv4 address exhaustion. Each link automatically generates unique addresses, streamlining both deployment and ongoing maintenance. This is a significant advantage in environments with thousands of interfaces, where traditional IP management becomes a scaling bottleneck.

From a design and operational perspective, unnumbered underlays align perfectly with modern data center and AI fabric principles. They reduce configuration touch points, lower the chance of human error, and support dynamic, automation-friendly environments. For customers prioritizing agility, scalability, and simplified management—such as hyperscalers, telcos, and AI-driven infrastructures—IPv6 link-local fabrics not only meet today’s technical requirements but also provide a future-proof foundation that supports growth without requiring continuous IP planning. While this model requires familiarity with RFC 5549 and may introduce minor trade-offs in traceability, the operational and business benefits far outweigh the challenges for teams that are ready to embrace modern networking best practices.
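
To make this concrete, a minimal Junos-style sketch of an unnumbered underlay BGP group is shown below, following the interface-scoped link-local peering form listed in Table 8. The group name, export policy name, interface, peer AS, and link-local address are illustrative assumptions rather than the validated configuration, and, as noted in Table 8, the neighbor can instead be discovered automatically on the interface, depending on platform support.

  # Unnumbered EBGP underlay sketch: IPv6 link-local peering, IPv4 routes with IPv6 next hops (RFC 5549)
  # EXPORT-LO0 (not shown) is assumed to advertise the node's loopback address into the underlay
  set protocols bgp group UNDERLAY type external
  set protocols bgp group UNDERLAY family inet unicast extended-nexthop
  set protocols bgp group UNDERLAY family inet6 unicast
  set protocols bgp group UNDERLAY export EXPORT-LO0
  set protocols bgp group UNDERLAY neighbor fe80::1%et-0/0/0 peer-as 101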

Fabric Overlay Control Plane Implementation

After loopback addresses have been exchanged, EBGP sessions are formed between leaf and spine nodes. In the overlay, BGP is configured with an export policy to advertise the /31 point-to-point IP addresses used between GPU servers and leaf nodes.

Because these interfaces are part of tenant-specific IP VRFs, the prefixes are advertised as EVPN Type 5 routes, using VNI 1 for Tenant A. These routes provide L3 reachability between GPUs assigned to the same tenant, even when distributed across multiple servers and racks. The routes are installed in the corresponding VRF routing table.

EVPN Type 5 routes, also known as IP Prefix routes, are used to advertise Layer 3 (routed) prefixes across an EVPN-VXLAN fabric. Unlike Type 2 routes, which carry MAC and IP bindings for bridging or IRB use cases, Type 5 routes are designed for pure L3 routing and are ideal for deployments where no MAC learning, IRBs, or anycast gateways are required.

In this design, Type 5 routes are used to advertise the /31 point-to-point IP addresses between GPU servers and leaf switches, allowing routed connectivity between GPUs assigned to the same tenant, even when they are hosted on different servers or connected to different leaf nodes. Each Type 5 route includes the information shown in Table 9.

Table 9. EVPN Type 5 Route Fields Description

  • Route Type: IP Prefix Route (Type 5)
  • Route Distinguisher (RD): RD of the advertising PE (e.g., based on its loopback IP) to make the route unique across the fabric
  • Ethernet Tag ID: 0 (not associated with a specific VLAN or MAC-VRF)
  • IP Prefix: The IPv4 or IPv6 prefix being advertised (e.g., 10.200.0.2/31)
  • Prefix Length: The length of the IP prefix (e.g., 31)
  • Label: VXLAN VNI (e.g., 1) identifying the virtual routing domain
  • Next Hop: Loopback address of the advertising leaf node
  • Extended Community: Route target identifying the associated tenant VRF (e.g., target:65000:1)
  • Other BGP Attributes: Standard BGP attributes such as origin, AS path, and local preference

Control plane implementation with IPv4 underlay and IPv4 overlay

This model provides an IPv4 transport underlay and IPv4-based EVPN-VXLAN in the overlay that can still support IPv4-only devices communicating across the fabric. This model aligns with traditional IP fabric designs, where interface addressing is fully controlled and visible, and neighbor relationships are explicitly defined.

The interfaces between leaf and spine nodes are configured with explicit /31 IPv4 addresses assigned from a pool of IPv4 addresses reserved for the underlay. Each device on the point-to-point link is configured with one of the two usable IPv4 addresses in the corresponding /31 subnet. This allows efficient address assignment for the point-to-point links between leaf and spine nodes. All leaf and spine nodes are also configured with IPv4 loopback addresses (under lo0.0).

The underlay EBGP sessions are set up between the leaf and spine nodes by explicitly configuring each neighbor using the /31 IPv4 addresses assigned to the connecting link.

The EBGP configuration for this model includes each neighbor’s IPv4 address and Autonomous System (AS) number, the local AS number, and the export policy that allows the advertisement of routes to reach all the leaf and spine nodes in the fabric. These routes are standard IPv4 unicast routes advertising the IPv4 addresses assigned to the loopback interfaces (lo0.0).

The overlay EBGP sessions are also set up by explicitly configuring each neighbor, using the IPv4 addresses of the loopback interfaces advertised by the underlay EBGP sessions. These EBGP sessions are also established between the leaf and spine nodes. The leaf nodes act as VTEPs, and advertise the IPv4 prefixes assigned to the links between the GPU servers and the leaf nodes using EVPN Type 5 routes.

EXAMPLE:

Consider the example depicted in Figure 30.

For the underlay, STRIPE1 LEAF 1 in AS 201 establishes an EBGP session with SPINE 1 in AS 101, over the directly connected IPv4 link 10.2.1.2/31 <=> 10.2.1.1/31. Similarly, STRIPE2 LEAF 1 in AS 209 establishes an EBGP session with SPINE 1 over the link 10.2.9.2/31 <=> 10.2.9.1/31.

Figure 30. IPv4 underlay and IPv4 overlay example

These sessions exchange IPv4 unicast routes advertising the addresses of the loopback interfaces (lo0.0) of STRIPE1 LEAF 1 (10.0.1.1), STRIPE2 LEAF 1 (10.0.1.9), and SPINE 1 (10.0.0.1).

Note:

Although it is not shown in the diagram, STRIPE1 LEAF 1 and STRIPE2 LEAF 1 will also establish EBGP sessions with SPINE 2, SPINE 3, and SPINE 4 to ensure multiple paths are available for traffic.
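
A minimal Junos-style sketch of the underlay BGP configuration on STRIPE1 LEAF 1 for this example might look as follows. Only the session toward SPINE 1 is shown, and the group name, policy name, and the assumption of a /32 loopback mask are illustrative rather than the validated configuration.

  # STRIPE1 LEAF 1 (AS 201): EBGP underlay session toward SPINE 1 over the 10.2.1.0/31 link
  set routing-options router-id 10.0.1.1
  set routing-options autonomous-system 201
  # Export policy advertising the leaf loopback into the underlay
  set policy-options policy-statement EXPORT-LO0 term lo0 from protocol direct
  set policy-options policy-statement EXPORT-LO0 term lo0 from route-filter 10.0.1.1/32 exact
  set policy-options policy-statement EXPORT-LO0 term lo0 then accept
  set protocols bgp group UNDERLAY type external
  set protocols bgp group UNDERLAY export EXPORT-LO0
  set protocols bgp group UNDERLAY neighbor 10.2.1.1 peer-as 101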

For the overlay, EBGP sessions are established between the leaf nodes and SPINE 1 using their loopback addresses (10.0.1.1, 10.0.1.9, and 10.0.0.1, respectively).
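
A corresponding Junos-style sketch of the overlay session from STRIPE1 LEAF 1 to SPINE 1 is shown below; only one neighbor is shown and the group name is illustrative.

  # STRIPE1 LEAF 1: multihop EBGP overlay session to SPINE 1, loopback to loopback, carrying EVPN routes
  set protocols bgp group OVERLAY type external
  set protocols bgp group OVERLAY multihop no-nexthop-change
  set protocols bgp group OVERLAY local-address 10.0.1.1
  set protocols bgp group OVERLAY family evpn signaling
  set protocols bgp group OVERLAY neighbor 10.0.0.1 peer-as 101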

The leaf nodes, acting as VTEPs, advertise the /31 prefixes of the links connecting the GPU servers to the leaf nodes as EVPN Type 5 routes.

For example, STRIPE1 LEAF 1 advertises routes to the IPv4 addresses on the links connecting SERVER 1 GPU1 and SERVER 2 GPU1 to STRIPE1 LEAF 1 (10.1.1.0/31 and 10.1.1.16/31, respectively). Similarly, STRIPE2 LEAF 1 advertises routes to the IPv4 addresses on the links connecting SERVER 3 GPU1 and SERVER 4 GPU1 (10.1.1.32/31 and 10.1.1.40/31, respectively).

Assuming all four GPUs in the example belong to the same tenant, their associated interfaces are mapped to the same VRF, RT5-IPVRF_TENANT-A.

RT5-IPVRF_TENANT-A is configured on both STRIPE1 LEAF 1 and STRIPE2 LEAF 1 with the same VXLAN Network Identifier (VNI) and route targets. STRIPE1 LEAF 1 advertises the prefixes 10.1.1.0/31 and 10.1.1.16/31 to SPINE 1 as EVPN Route Type 5, with its own loopback (10.0.1.1) as the next-hop VTEP. STRIPE2 LEAF 1 advertises 10.1.1.32/31 and 10.1.1.40/31 with 10.0.1.9 as the next-hop.
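
On STRIPE1 LEAF 1, the tenant IP VRF could be sketched as follows, reusing the RD, route-target, and VNI values given in Table 9 and in this example; the GPU-facing interface name is a hypothetical placeholder, and the statements are a sketch rather than the validated configuration.

  # Tenant A IP VRF: pure Type 5 (IP prefix) routing over VXLAN VNI 1
  set routing-instances RT5-IPVRF_TENANT-A instance-type vrf
  # GPU-facing /31 interface (hypothetical name)
  set routing-instances RT5-IPVRF_TENANT-A interface et-0/0/10.0
  set routing-instances RT5-IPVRF_TENANT-A route-distinguisher 10.0.1.1:1
  set routing-instances RT5-IPVRF_TENANT-A vrf-target target:65000:1
  set routing-instances RT5-IPVRF_TENANT-A protocols evpn ip-prefix-routes advertise direct-nexthop
  set routing-instances RT5-IPVRF_TENANT-A protocols evpn ip-prefix-routes encapsulation vxlan
  set routing-instances RT5-IPVRF_TENANT-A protocols evpn ip-prefix-routes vni 1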

When SERVER 1 GPU1 sends traffic to SERVER 3 GPU1, the destination prefix 10.1.1.32/31 is matched in the VRF routing table on STRIPE1 LEAF 1. The route points to STRIPE2 LEAF 1 (VTEP at 10.0.1.9) as the next hop and specifies VNI 1 as the VXLAN encapsulation ID. The packet is then VXLAN-encapsulated and tunneled across the fabric to its destination over the IPv4 underlay.

Control plane implementation with IPv6 underlay and IPv6 overlay

This model provides an IPv6 transport underlay and IPv6-based EVPN-VXLAN in the overlay that can still support IPv4-only devices communicating across the fabric. This aligns with traditional dual-stack IP fabric designs, where interface addressing is fully controlled and visible, neighbor relationships are explicitly defined, and both IPv4 and IPv6 are supported.

The interfaces between leaf and spine nodes are configured with explicit /127 IPv6 addresses assigned from a pool of IPv6 addresses reserved for the underlay. These addresses can be global or site-local routable IPv6 addresses. Each device on the point-to-point link is configured with one of the two usable IPv6 addresses in the corresponding /127 subnet. This allows efficient address assignment for the point-to-point links between leaf and spine nodes. All leaf and spine nodes are also configured with both IPv4 and IPv6 loopback addresses (under lo0.0).

The underlay EBGP sessions are set up between the leaf and spine nodes by explicitly configuring each neighbor using the /127 IPv6 addresses assigned to the connecting link.

The EBGP configuration for this model includes each neighbor’s IPv6 address and Autonomous System (AS) number, the local AS number, and the export policy that allows the advertisement of routes to reach all the leaf and spine nodes in the fabric. These routes are standard IPv6 unicast routes advertising the IPv6 addresses assigned to the loopback interfaces (lo0.0).

The overlay EBGP sessions are also set up by explicitly configuring each neighbor, using the IPv6 addresses of the loopback interfaces advertised by the underlay EBGP sessions.

These EBGP sessions are also established between the leaf and spine nodes. The leaf nodes act as VTEPs, and exchange EVPN Type 5 routes advertising the IPv4 prefixes assigned to the links between the GPU servers and the leaf nodes.

EXAMPLE:

Consider the example depicted in Figure 31.

For the underlay, STRIPE1 LEAF 1 in AS 201 establishes an EBGP session with SPINE 1 in AS 101 over the directly connected IPv6 point-to-point link 2001:0:2:1::2/127 <=> 2001:0:2:1::1/127. Similarly, STRIPE2 LEAF 1 in AS 209 establishes an EBGP session with SPINE 1 over the link 2001:0:2:9::2/127 <=> 2001:0:2:9::1/127.

Figure 31. IPv6 underlay and IPv6 overlay example

These sessions exchange IPv6 unicast routes advertising the addresses of the loopback interfaces (lo0.0) of STRIPE1 LEAF 1 (2001:10::1:1), STRIPE2 LEAF 1 (2001:10::1:9), and SPINE 1 (2001:10::1).

Note:

Although it is not shown in the diagram, STRIPE1 LEAF 1 and STRIPE2 LEAF 1 will also establish EBGP sessions with SPINE 2, SPINE 3, and SPINE 4 to ensure multiple paths are available for traffic.
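
A minimal Junos-style sketch of the underlay BGP configuration on STRIPE1 LEAF 1 for this IPv6 example might look as follows. Only the session toward SPINE 1 is shown, and the group name, policy name, and the assumption of a /128 loopback mask are illustrative rather than the validated configuration.

  # STRIPE1 LEAF 1 (AS 201): EBGP underlay session toward SPINE 1 over the 2001:0:2:1::/127 link
  set routing-options autonomous-system 201
  # Export policy advertising the leaf IPv6 loopback into the underlay
  set policy-options policy-statement EXPORT-LO0-V6 term lo0 from protocol direct
  set policy-options policy-statement EXPORT-LO0-V6 term lo0 from route-filter 2001:10::1:1/128 exact
  set policy-options policy-statement EXPORT-LO0-V6 term lo0 then accept
  set protocols bgp group UNDERLAY-V6 type external
  set protocols bgp group UNDERLAY-V6 family inet6 unicast
  set protocols bgp group UNDERLAY-V6 export EXPORT-LO0-V6
  set protocols bgp group UNDERLAY-V6 neighbor 2001:0:2:1::1 peer-as 101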

For the overlay, EBGP sessions are established between the leaf nodes and SPINE 1 using their loopback addresses (2001:10::1:1, 2001:10::1:9, and 2001:10::1, respectively).
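
The corresponding overlay session sketch from STRIPE1 LEAF 1 to SPINE 1, using the IPv6 loopbacks, might look like this; again only one neighbor is shown and the group name is illustrative.

  # STRIPE1 LEAF 1: multihop EBGP overlay session to SPINE 1 over IPv6 loopbacks, carrying EVPN routes
  set protocols bgp group OVERLAY-V6 type external
  set protocols bgp group OVERLAY-V6 multihop no-nexthop-change
  set protocols bgp group OVERLAY-V6 local-address 2001:10::1:1
  set protocols bgp group OVERLAY-V6 family evpn signaling
  set protocols bgp group OVERLAY-V6 neighbor 2001:10::1 peer-as 101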

The leaf nodes, acting as VTEPs, advertise the /31 prefixes of the links connecting the GPU servers to the leaf nodes as EVPN Type 5 routes.

For example, STRIPE1 LEAF 1 advertises routes to the IPv4 addresses on the links connecting SERVER 1 GPU1 and SERVER 2 GPU1 to STRIPE1 LEAF 1 (10.1.1.0/31 and 10.1.1.16/31, respectively). Similarly, STRIPE2 LEAF 1 advertises routes to the IPv4 addresses on the links connecting SERVER 3 GPU1 and SERVER 4 GPU1 (10.1.1.32/31 and 10.1.1.40/31, respectively).

Assuming all four GPUs in the example belong to the same tenant, their associated interfaces are mapped to the same VRF, RT5-IPVRF_TENANT-A.

RT5-IPVRF_TENANT-A is configured on both STRIPE1 LEAF 1 and STRIPE2 LEAF 1 with the same VXLAN Network Identifier (VNI) and route targets. STRIPE1 LEAF 1 advertises the prefixes 10.1.1.0/31 and 10.1.1.16/31 to SPINE 1 as EVPN Route Type 5, with its own loopback (2001:10::1:1) as the next-hop VTEP. STRIPE2 LEAF 1 advertises 10.1.1.32/31 and 10.1.1.40/31 with 2001:10::1:9 as the next hop.

When SERVER 1 GPU1 sends traffic to SERVER 3 GPU1, the destination prefix 10.1.1.32/31 is matched in the VRF routing table on STRIPE1 LEAF 1. The route points to STRIPE2 LEAF 1 (VTEP at 2001:10::1:9) as the next hop and specifies VNI 1 as the VXLAN encapsulation ID. The packet is then VXLAN-encapsulated and tunneled across the fabric to its destination over the IPv6 underlay.