EVPN-VXLAN GPU Backend Fabric – Implementation Options for Multitenancy
Implementing GPU multitenancy in a data center requires a network architecture that ensures strong isolation, high throughput, and low latency across the shared infrastructure. This involves architectural considerations not only for the GPU backend fabric, which provides connectivity between GPUs belonging to each tenant, but also for the frontend fabric, where user access, job submission, orchestration, and authentication are handled, and the storage backend, which is responsible for delivering datasets, model checkpoints, and results to and from the GPU infrastructure. These components each require their own design strategies to ensure end-to-end performance, security, and multitenancy across the entire AI platform stack.
This JVD focuses specifically on the GPU backend fabric, which handles east-west traffic between GPUs across servers and is subject to the strictest performance and isolation requirements. EVPN-VXLAN is commonly used in this layer as the foundation for scalable multitenant environments, supporting two main design approaches: pure Type 5 services with IP-VRFs only, and VLAN-aware services with MAC-VRFs and symmetric IRB. The pure Type 5 model relies entirely on Layer 3 routing, avoiding MAC learning and simplifying both the control plane and IP address management. In contrast, the VLAN-aware model uses Layer 2 overlays to extend bridging and VLAN segmentation across the fabric. Both approaches use routed underlay designs with VXLAN encapsulation, enabling flexible resource allocation and tenant isolation across multiple physical servers.
These two approaches are summarized in Table 7 for both GPU Isolation and Server Isolation.
Table 7. EVPN/VXLAN models comparison
| Features | Pure RT5 EVPN-VXLAN | Pure RT5 EVPN-VXLAN | VLAN-Aware EVPN/VXLAN service with MAC-VRF | VLAN-Aware EVPN/VXLAN service with MAC-VRF |
|---|---|---|---|---|
| Multi-tenancy Type | GPU Isolation (Per-GPU Multitenancy) | Server Isolation (Per-Server Multitenancy) | GPU Isolation (Per-GPU Multitenancy) | Server Isolation (Per-Server Multitenancy) |
| GPU Assignment (Tenant Resource Allocation) | One or more GPUs (but not all) per server assigned to multiple tenants | All GPUs (8) per server assigned to a single tenant | One or more GPUs (but not all) per server assigned to multiple tenants | All GPUs (8) per server assigned to a single tenant |
| Tenant GPU Distribution | A tenant can have one or more (but not all) GPUs on one or more servers. | A tenant can have all the GPUs on one or more servers. | A tenant can have one or more (but not all) GPUs on one or more servers. | A tenant can have all the GPUs on one or more servers. |
| VLANs per Server-to-Leaf Links | No VLANs | No VLANs | Each link is in a different VLAN and is assigned a different VNI. | Each link is in a different VLAN and is assigned a different VNI. |
| Interface Configuration Mode and VLAN Mapping | Access-mode interfaces, server links in different RT5_IPVRFs | Access-mode interfaces, server links in different RT5_IPVRFs | Access-mode interfaces, server links in different MAC-VRFs | Access-mode interfaces, server links in different MAC-VRFs |
| IP Addressing per Server-to-Leaf Links | Each server link is configured with /31 and /127 addresses (8 x IP routed links). | Each server link is configured with /31 and /127 addresses (8 x IP routed links). | Each server link is configured with addresses out of /24 and /64 ranges. | Each server link is configured with addresses out of /24 and /64 ranges. |
| VRF and Routing Instances per Tenant | One RT5_IPVRF only, no MAC-VRF (on each leaf node where the GPUs assigned to the tenant are connected) | One RT5_IPVRF only, no MAC-VRF (on each leaf node where the GPUs assigned to the tenant are connected) | One RT5_IPVRF and one MAC-VRF (on each leaf node where the GPUs assigned to the tenant are connected) | One RT5_IPVRF and one MAC-VRF (on each leaf node where the GPUs assigned to the tenant are connected) |
| VNI Allocation per Tenant | Single VNI per tenant | Single VNI per tenant | 8 x VNIs per tenant | 8 x VNIs per tenant |
| Anycast Gateway Configuration | No anycast gateway (no IRB interfaces) | No anycast gateway (no IRB interfaces) | 8 x anycast IP gateways (8 x IRB interfaces) | 8 x anycast IP gateways (8 x IRB interfaces) |
| EVPN Service Type | Pure RT5 EVPN-VXLAN design | Pure RT5 EVPN-VXLAN design | VLAN-Aware EVPN/VXLAN service (with MAC-VRF) | VLAN-Aware EVPN/VXLAN service (with MAC-VRF) |
| ERB Design | No ERB | No ERB | ERB design without ESI-LAG | ERB design without ESI-LAG |
| Underlay BGP Configuration | Underlay BGP unnumbered, RFC 5549 (loopback v4 or v6) | Underlay BGP unnumbered, RFC 5549 (loopback v4 or v6) | Underlay BGP unnumbered, RFC 5549 (loopback v4 or v6) | Underlay BGP unnumbered, RFC 5549 (loopback v4 or v6) |
| IRB and Routing Strategy | Pure RT5 EVPN routing, no IRB interfaces | Pure RT5 EVPN routing, no IRB interfaces | Symmetric IRB – Type 5 | Symmetric IRB – Type 5 |
| Congestion Control (DCQCN Type) | Pure Type 5 DCQCN; VXLAN DCQCN | Pure Type 5 DCQCN; VXLAN DCQCN | Type 2 & 5 DCQCN; VXLAN DCQCN | Type 2 & 5 DCQCN; VXLAN DCQCN |
Pure RT5 EVPN-VXLAN - Server-Level Isolation (Per-Server Multitenancy)
In this design model, each physical server is dedicated entirely to a single tenant, meaning all GPUs (typically 8 per server) are assigned to one tenant only. This model simplifies resource allocation and isolation since there’s no sharing of GPU resources between tenants on a single server. A tenant can span across multiple servers, each of which fully belongs to that tenant.
From a networking perspective, server-to-leaf links are configured with routed IP addresses (/31 or /127), creating eight independent IP routed links. The server interfaces are placed in access mode and mapped into separate routing instances—specifically, different RT5_IPVRFs—depending on tenant allocation. Notably, VLANs are not used on these links, and there's no requirement for MAC-VRFs or anycast gateway (IRB) interfaces, adhering to a pure EVPN-VXLAN Type 5 (RT5) routing model. This ensures clean L3 separation and optimal scalability.
A key advantage of this design is the use of /31 or /127 addressing on each server link, which simplifies IP management by allowing the same addressing to remain in place regardless of which tenant is using the GPU or server. This consistency makes it easier to dynamically reassign GPUs or servers between tenants without needing to reconfigure underlying IPs. Combined with per-tenant routing instances (RT5_IPVRFs) and congestion-aware transport, this architecture provides a scalable, operationally efficient foundation for GPU multitenancy at both the server and GPU levels.
Figure 23: Pure RT5 EVPN-VXLAN - Server-Level Isolation (Per-Server Multitenancy)
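To make the model concrete, the following Junos-style configuration sketch shows one routed server link placed into a tenant IP-VRF on a leaf node. All names and values (tenant TENANT-A, interface et-0/0/10, the /31 address, VNI, route distinguisher, and route target) are placeholder assumptions for illustration rather than values prescribed by this JVD, and an export policy for the Type 5 routes may also be needed depending on platform defaults.

```
# Placeholder example: tenant TENANT-A, server link et-0/0/10, VNI 10001
# Routed /31 on the server-facing link (no VLAN, no IRB)
set interfaces et-0/0/10 unit 0 family inet address 10.200.0.0/31

# One IP-VRF per tenant; the server link is placed directly into it
set routing-instances TENANT-A instance-type vrf
set routing-instances TENANT-A interface et-0/0/10.0
set routing-instances TENANT-A route-distinguisher 10.0.0.11:1
set routing-instances TENANT-A vrf-target target:65000:1

# Pure Type 5: tenant prefixes are advertised as EVPN IP-prefix routes over VXLAN
set routing-instances TENANT-A protocols evpn ip-prefix-routes advertise direct-nexthop
set routing-instances TENANT-A protocols evpn ip-prefix-routes encapsulation vxlan
set routing-instances TENANT-A protocols evpn ip-prefix-routes vni 10001
```

In the per-server model, the same stanza is repeated on each of the eight leaf nodes (rails) the tenant's server connects to, with only the interface name and the /31 address changing.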
Pure RT5 EVPN-VXLAN - GPU-Level Isolation (Per-GPU Multitenancy)
This model introduces finer-grained resource sharing by allowing GPUs within the same server to be allocated to different tenants. A tenant may receive one or more GPUs across one or multiple servers, but not all GPUs on any given server unless explicitly assigned. This GPU-level partitioning allows for more efficient use of server resources and is well-suited for environments with dynamic or fractional GPU demands.
Despite the increased resource-sharing granularity, the networking design remains consistent with the server-level isolation model. Interfaces are configured in access mode, mapped into tenant-specific RT5_IPVRFs without the use of VLANs. Each GPU-associated server link is still assigned a unique /31 or /127 IP address, preserving the model of eight routed links per server. The fabric continues to use pure EVPN-VXLAN Type 5 with no MAC-VRFs, IRBs, or anycast gateways involved. Underlay BGP operates in unnumbered mode (RFC 5549), and congestion control is implemented via VXLAN-aware DCQCN, ensuring fairness and traffic stability.
Like in the previous option, a key advantage of this design is the use of /31 or /127 addressing on each server link, which simplifies IP management by allowing the same addressing to remain in place regardless of which tenant is using the GPU or server. This consistency makes it easier to dynamically reassign GPUs or servers between tenants without needing to reconfigure underlying IPs. Combined with per-tenant routing instances (RT5_IPVRFs) and congestion-aware transport, this architecture provides a scalable, operationally efficient foundation for GPU multitenancy at both the server and GPU levels.
Figure 24: Pure RT5 EVPN-VXLAN – GPU-Level Isolation (Per-GPU Multitenancy)
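In configuration terms, the only difference from the server-level case is that server-facing links on the same leaf node can now belong to different tenant IP-VRFs. The sketch below is a hypothetical example (interfaces, tenants, addresses, and VNIs are placeholders) showing two links on one leaf assigned to two tenants; the /31 addressing on each link stays in place when a GPU is reassigned, and only the VRF membership changes.

```
# Placeholder example: two server links on the same leaf node, two tenants
set interfaces et-0/0/10 unit 0 family inet address 10.200.0.0/31
set interfaces et-0/0/11 unit 0 family inet address 10.200.0.2/31

# et-0/0/10 (GPU assigned to TENANT-A) and et-0/0/11 (GPU assigned to TENANT-B)
set routing-instances TENANT-A interface et-0/0/10.0
set routing-instances TENANT-B interface et-0/0/11.0

# TENANT-B uses its own pure Type 5 IP-VRF, VNI, and route target
set routing-instances TENANT-B instance-type vrf
set routing-instances TENANT-B route-distinguisher 10.0.0.11:2
set routing-instances TENANT-B vrf-target target:65000:2
set routing-instances TENANT-B protocols evpn ip-prefix-routes advertise direct-nexthop
set routing-instances TENANT-B protocols evpn ip-prefix-routes encapsulation vxlan
set routing-instances TENANT-B protocols evpn ip-prefix-routes vni 10002
```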
VLAN-Aware EVPN/VXLAN - Server-Level Isolation (Per-Server Multitenancy)
In this model, each server is fully dedicated to a single tenant, with all its GPUs exclusively assigned to that tenant. This simplifies management and maintains strict tenant isolation by mapping each server’s links to dedicated VLANs and assigning distinct VNIs. A tenant can span multiple servers, but no server is shared between tenants. Each server link operates in access mode and is associated with a unique VLAN and VNI, mapped into the tenant's MAC-VRF and IP-VRF.
From a network design standpoint, this use case relies on a VLAN-Aware EVPN/VXLAN service with both MAC-VRF and IP-VRF separation per tenant. Each leaf switch hosting the tenant's servers maintains a pair of VRFs: a MAC-VRF for bridging and an RT5_IPVRF for routing. Addressing is assigned from larger pools (e.g., /24 for IPv4 and /64 for IPv6), and each link has its own anycast gateway (IRB) interface, resulting in 8 IRB interfaces per server. The design follows a symmetric IRB model with support for both Type 2 and Type 5 routes and uses VXLAN-aware DCQCN to manage congestion.
Figure 25: VLAN-aware EVPN-VXLAN – Server-Level Isolation (Per-Server Multitenancy)
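As an illustration of this service type, the sketch below shows one server-facing access interface, its VLAN/VNI pair, and the corresponding anycast IRB gateway inside a VLAN-aware MAC-VRF on a leaf node. All names and values (MACVRF-A, VLAN 101, VNI 10101, the IRB subnet, and the virtual-gateway address) are placeholders for illustration; in the full design this pattern is repeated for each of the eight server links.

```
# Placeholder example: one of the eight VLAN/VNI pairs for a tenant's server
set interfaces et-0/0/10 unit 0 family ethernet-switching interface-mode access
set interfaces et-0/0/10 unit 0 family ethernet-switching vlan members VLAN-101

# Unique per-leaf IRB address plus a shared anycast virtual-gateway address
set interfaces irb unit 101 family inet address 10.1.101.1/24
set interfaces irb unit 101 virtual-gateway-address 10.1.101.254

# Tenant MAC-VRF, VLAN-aware service, one VNI per VLAN
set routing-instances MACVRF-A instance-type mac-vrf
set routing-instances MACVRF-A service-type vlan-aware
set routing-instances MACVRF-A vtep-source-interface lo0.0
set routing-instances MACVRF-A route-distinguisher 10.0.0.11:101
set routing-instances MACVRF-A vrf-target target:65000:101
set routing-instances MACVRF-A protocols evpn encapsulation vxlan
set routing-instances MACVRF-A protocols evpn extended-vni-list 10101
set routing-instances MACVRF-A interface et-0/0/10.0
set routing-instances MACVRF-A vlans VLAN-101 vlan-id 101
set routing-instances MACVRF-A vlans VLAN-101 vxlan vni 10101
set routing-instances MACVRF-A vlans VLAN-101 l3-interface irb.101
```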
VLAN-Aware EVPN/VXLAN - GPU-Level Isolation (Per-GPU Multitenancy)
This model enables finer-grained resource sharing by assigning individual GPUs on a server to different tenants. A single server can be shared across multiple tenants, each receiving access to specific GPUs, but not the full set. This model allows for higher utilization of compute resources while preserving isolation between tenants. Like the server-level case, each GPU-associated link is mapped to a unique VLAN and VNI, and the server interfaces operate in access mode within distinct MAC-VRFs.
From a fabric perspective, the same VLAN-Aware EVPN/VXLAN service applies, with each tenant using a dedicated MAC-VRF and IP-VRF on leaf switches where their GPUs reside. Each GPU-assigned link still receives its own address from /24 or /64 pools and its own IRB interface, maintaining a total of 8 anycast gateways per server. The design supports symmetric IRB routing and leverages both EVPN Type 2 (MAC/IP advertisements) and Type 5 (IP routes) for complete L2 and L3 connectivity. Congestion control is handled through both Type 2 and Type 5-compatible VXLAN DCQCN mechanisms, ensuring fairness across shared infrastructure.
Figure 26: VLAN-aware EVPN-VXLAN – GPU-Level Isolation (Per-GPU Multitenancy)
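For the routed side of the symmetric IRB model, each tenant's IRB interfaces are placed into a per-tenant IP-VRF that carries an L3 VNI and advertises Type 5 routes, complementing the MAC-VRF sketch above. The example below is again a hedged sketch with placeholder names and values rather than the configuration validated in this JVD.

```
# Placeholder example: tenant IP-VRF used for symmetric IRB (Type 5) routing
set routing-instances TENANT-A-L3 instance-type vrf
set routing-instances TENANT-A-L3 interface irb.101
set routing-instances TENANT-A-L3 route-distinguisher 10.0.0.11:901
set routing-instances TENANT-A-L3 vrf-target target:65000:901

# L3 VNI shared by all of the tenant's IRB subnets across the fabric
set routing-instances TENANT-A-L3 protocols evpn ip-prefix-routes advertise direct-nexthop
set routing-instances TENANT-A-L3 protocols evpn ip-prefix-routes encapsulation vxlan
set routing-instances TENANT-A-L3 protocols evpn ip-prefix-routes vni 90001
```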
Selecting the best approach
In the context of AI workloads such as training, inference, and GPU-as-a-Service (GPUaaS), the choice between a pure Type 5 and a VLAN-aware EVPN/VXLAN design can significantly impact operational efficiency. The pure Type 5 model is often better suited for large-scale AI training environments, where GPU resources are allocated in bulk—either per server or per tenant—and workloads are typically long-running and tightly coupled. Its streamlined IP-based routing, stable addressing, and minimal control-plane overhead enable predictable performance and simplified automation across hundreds or thousands of servers. In contrast, the VLAN-aware model may be more appropriate for GPUaaS platforms, inference workloads, or multi-purpose environments where tenants run shorter, independent jobs and require granular isolation, dynamic L2 connectivity, or per-interface policy enforcement. The use of MAC-VRFs and anycast gateways provides flexibility for tenant-specific services, especially in use cases involving legacy applications, bare-metal workloads, or environments that need tenant-specific IP gateways. Ultimately, both models support GPU multitenancy, but the pure Type 5 design favors scale and simplicity, while the VLAN-aware design offers flexibility and fine-grained control.
This JVD focuses on the Pure RT5 EVPN-VXLAN implementation.
Thus, the rest of the document will cover all the details for the Pure RT5 EVPN-VXLAN - Server-Level Isolation (Per-Server Multitenancy) and Pure RT5 EVPN-VXLAN - GPU-Level Isolation (Per-GPU Multitenancy) options.
EVPN-VXLAN GPU Backend Fabric for Multitenancy – Type 5 EVPN-VXLAN implementation
Tenant GPU Assignments
In the EVPN-VXLAN pure Type 5 model, the GPUs assigned to a tenant are mapped to an IPVRF routing instance on the leaf nodes. There is only one IPVRF per tenant, and it is configured on all the leaf nodes when a new tenant is created. The interfaces connecting the GPUs assigned to that tenant are then added to the VRF.
For Server Isolation, the GPUs are connected in a rail-aligned fashion. Thus, when a new tenant is created and one or more servers are assigned to it, at least one interface on each leaf node is attached to the new VRF, as shown in Figure 27.
Figure 27. Server Isolation Tenant Assignments
For GPU Isolation, there is also a single IPVRF per tenant, but in this case the VRF only needs to be configured on the leaf nodes where GPUs assigned to that tenant are connected, as shown in Figure 28.
Figure 28. GPU Isolation Tenant Assignments
However, the recommended approach is to create the new VRF on all the leaf nodes, even on those where no GPUs assigned to the new tenant are currently connected. When additional GPUs are later assigned to the tenant, the only configuration change needed is adding the new interface to the existing VRF.
Figure 29. GPU Isolation Tenant Assignments VRF creation
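A minimal sketch of this workflow, using placeholder names and values, is shown below: the tenant IPVRF is pushed to every leaf node when the tenant is created, and assigning a GPU later only adds the corresponding interface to the existing VRF.

```
# Step 1 - tenant creation (applied to every leaf node): placeholder tenant TENANT-C
set routing-instances TENANT-C instance-type vrf
set routing-instances TENANT-C route-distinguisher 10.0.0.11:3
set routing-instances TENANT-C vrf-target target:65000:3
set routing-instances TENANT-C protocols evpn ip-prefix-routes advertise direct-nexthop
set routing-instances TENANT-C protocols evpn ip-prefix-routes encapsulation vxlan
set routing-instances TENANT-C protocols evpn ip-prefix-routes vni 10003

# Step 2 - later GPU assignment on one leaf: only the interface is added
set routing-instances TENANT-C interface et-0/0/12.0
```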