EVPN/VXLAN GPU Backend Fabric for Multitenancy – Type 5 EVPN/VXLAN Implementation

Tenant Separation

Preserving tenant separation requires careful design at two levels:

  • Fabric Tenant Separation – isolation of traffic across the fabric
  • Internal Server Separation – isolation of GPU access within each server

Fabric Tenant Separation

Across the fabric, separation is achieved with a pure EVPN/VXLAN Type 5 design, in which the interfaces connecting each tenant's GPUs are mapped to distinct IP-VRF routing instances on the leaf nodes. Fabric tenant separation is implemented slightly differently for the Server Isolation and GPU Isolation models.

For Server Isolation:

When a new tenant is onboarded and assigned one or more servers, a dedicated IP-VRF routing instance is created for that tenant on each leaf node within a stripe. The interfaces of the assigned servers are then added to this VRF. Because GPU servers are connected in a rail-optimized topology, at least one interface on each leaf node is typically part of the new VRF, as illustrated in Figure 39.

In the example, Tenant A is assigned Servers 1 and 4, and a VRF is instantiated on Leaf nodes 1 through 8. All the interfaces on these two servers are associated with the VRF according to the rail-aligned connectivity model, resulting in two interfaces per leaf node. Tenants B and C are assigned Servers 2 and 3, respectively, and each receives its own VRF with one interface per leaf node.
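As an illustration, a tenant IP-VRF for a pure Type 5 design on a Junos-based leaf might look like the following sketch. The instance name, interfaces, route distinguisher, route target, and VNI are all hypothetical placeholders, not values mandated by this design.

```
routing-instances {
    /* Hypothetical IP-VRF for Tenant A */
    TENANT-A {
        instance-type vrf;
        /* Rail interfaces toward the tenant's assigned servers */
        interface et-0/0/1.0;
        interface et-0/0/4.0;
        route-distinguisher 10.0.0.1:100;
        vrf-target target:65000:100;
        protocols {
            evpn {
                ip-prefix-routes {
                    advertise direct-nexthop;
                    encapsulation vxlan;
                    vni 10100;
                }
            }
        }
    }
}
```

Onboarding a new tenant then reduces to instantiating one such VRF per leaf in the stripe and moving the tenant's server-facing interfaces into it.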

Figure 39: Server Isolation Tenant Assignments

For GPU Isolation:

When a new tenant is onboarded and assigned one or more GPUs, a dedicated IP-VRF routing instance is created for that tenant, but only on the leaf nodes with physical connections to the GPUs assigned to that tenant, as shown in Figure 40.

In the example, Tenant A is assigned GPU 0 on Servers 1 and 2, so its VRF is created only on Leaf 1; no other leaf nodes are affected. Tenant B is assigned GPU 6 on Server 1, so its VRF is created only on Leaf 6. Tenant C is assigned GPUs 7 and 8 on Servers 2 and 3, so its VRF is created on Leaf 7 and Leaf 8.

This selective placement of IP-VRFs ensures that only the required leaf nodes participate in each tenant's network, minimizing configuration overhead while maintaining strict isolation at the GPU level.

Figure 40: GPU Isolation Tenant Assignments


Internal Server Separation

Placing interfaces into different VRFs on the switch side is not sufficient for complete isolation. It is also necessary to isolate the GPUs within the servers. Although disabling local optimization or PXN may appear to prevent cross-GPU traffic, in reality it only prevents a GPU from using another GPU within the same server as a proxy to reach a GPU on a different rail in a different server, as described in the Local Optimization section. Additional mechanisms are therefore required to ensure true separation, including Kubernetes-based isolation and isolation using NCCL environment variables.

Kubernetes-Based Isolation:

Many organizations adopt Kubernetes for GPU multitenancy because of its ability to manage shared resources efficiently while isolating workloads across users or teams. Features such as namespaces, cgroups, and role-based access control (RBAC) provide secure, tenant-aware environments that keep workloads isolated within a shared infrastructure. Kubernetes also integrates with vendor-supported GPU operators from NVIDIA and AMD, streamlining the deployment of drivers, device plugins, and monitoring components. This simplifies administration and enables accurate tracking of GPU usage per tenant.
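As a minimal sketch of this approach, a tenant workload can be confined to a single GPU through the device plugin's extended resource. The namespace, pod name, and image below are hypothetical placeholders.

```
apiVersion: v1
kind: Pod
metadata:
  name: tenant-a-nccl-job        # hypothetical workload name
  namespace: tenant-a            # tenant-scoped namespace for RBAC and quotas
spec:
  restartPolicy: Never
  containers:
  - name: nccl-test
    image: example.com/nccl-tests:latest   # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: 1        # request exactly one GPU from the device plugin
```

The device plugin allocates one unassigned GPU to the pod and masks the rest, so the container never sees GPUs belonging to other tenants.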

While Kubernetes provides a robust framework for GPU multitenancy in production environments, it is not always practical or necessary for testing and validation.

Isolation with NCCL variables:

In lab setups or early development stages, multitenancy can be implemented without deploying a full Kubernetes stack by manually controlling resource visibility through environment variables. This lightweight approach allows administrators to isolate GPU and network resources per tenant using variables such as:

  • CUDA_VISIBLE_DEVICES (for NVIDIA servers),
  • ROCR_VISIBLE_DEVICES (for AMD servers),
  • UCX_NET_DEVICES, and
  • NCCL_IB_HCA.

By setting the CUDA_VISIBLE_DEVICES (on NVIDIA servers) or ROCR_VISIBLE_DEVICES (on AMD servers) environment variables, administrators can restrict each tenant's applications to seeing and accessing only their assigned GPUs.

When set, they mask all other GPUs from the application’s perspective, creating the appearance that only the assigned GPU(s) are available, preventing unwanted GPU-to-GPU communication. The exposed GPUs are then re-indexed starting from 0. Thus, for each tenant, the GPUs will be indexed starting at 0, regardless of the actual GPU number (rank).

For example, when running an NCCL test on an NVIDIA server:

  • If a tenant is assigned GPU1, setting:

    export CUDA_VISIBLE_DEVICES=1

    ensures that only GPU1 is visible to the application. Internally, this GPU will appear to the application as cuda:0.

  • Similarly, if a tenant is assigned GPU4, setting:

    export CUDA_VISIBLE_DEVICES=4

    ensures that only GPU4 is visible to the application. The GPU will also appear as cuda:0 to the application.
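The remapping described above can be sketched with a small helper script. The function name is hypothetical; it simply prints the logical-to-physical mapping that CUDA's reindexing rule implies, since the first physical GPU listed in CUDA_VISIBLE_DEVICES becomes cuda:0, the second becomes cuda:1, and so on.

```shell
# Hypothetical helper: print the logical-to-physical GPU mapping implied
# by CUDA_VISIBLE_DEVICES. Visible GPUs are renumbered from 0 in the
# order they are listed in the variable.
print_gpu_map() {
    local logical=0 physical
    IFS=',' read -ra devices <<< "${CUDA_VISIBLE_DEVICES}"
    for physical in "${devices[@]}"; do
        echo "cuda:${logical} -> physical GPU${physical}"
        logical=$((logical + 1))
    done
}

export CUDA_VISIBLE_DEVICES=4
print_gpu_map
# cuda:0 -> physical GPU4
```

Running the same helper with CUDA_VISIBLE_DEVICES=1,4 would report physical GPU1 as cuda:0 and physical GPU4 as cuda:1, which is the mapping an administrator needs when correlating application logs with per-tenant GPU assignments.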

Understanding the remapping behavior of GPU visibility is essential for administrators managing multitenant environments. Because environment variables like CUDA_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES reindex visible GPUs starting from 0, administrators must track the logical-to-physical GPU mapping to ensure accurate monitoring, troubleshooting, and tenant-level usage accounting.

While CUDA_VISIBLE_DEVICES (for NVIDIA) and ROCR_VISIBLE_DEVICES (for AMD) effectively restrict GPU access within the local server, they do not control which network interface is used for inter-node communication. To maintain strict tenant isolation and avoid traffic leakage, additional environment variables must be set to control NIC selection. These include:

  • UCX_NET_DEVICES
  • NCCL_SOCKET_IFNAME
  • NCCL_IB_HCA

These variables define the network interface(s) to be used by UCX and NCCL, ensuring that traffic remains within the tenant’s routing instance and only uses the correct NICs.
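Putting the GPU and NIC variables together, a per-tenant launch wrapper might look like the following sketch. The GPU index, NIC name, and HCA name are illustrative assumptions, not values mandated by the design.

```shell
# Hypothetical launch wrapper for a single-GPU tenant job.
TENANT_GPU="0"          # physical GPU assigned to this tenant (assumption)
TENANT_NIC="gpu0_eth"   # NIC attached to the tenant's IP-VRF on the leaf (assumption)
TENANT_HCA="mlx5_0"     # RDMA device backing that NIC (assumption)

export CUDA_VISIBLE_DEVICES="${TENANT_GPU}"   # mask all other GPUs
export UCX_NET_DEVICES="${TENANT_NIC}"        # pin UCX traffic to the tenant NIC
export NCCL_SOCKET_IFNAME="${TENANT_NIC}"     # pin NCCL bootstrap traffic
export NCCL_IB_HCA="${TENANT_HCA}"            # pin NCCL RDMA traffic

# The NCCL test is then launched in this restricted environment, e.g.:
# ./all_reduce_perf -b 8 -e 1G -f 2 -g 1
```

Because every variable points at resources inside the tenant's VRF, traffic cannot leak onto a NIC that belongs to another tenant's routing instance.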

The example shown in Figure 41 illustrates a multitenant configuration on a GPU server labeled H100-01, which contains eight GPUs (GPU0–GPU7) and eight corresponding NICs (NIC0–NIC7).

A Tenant-1 NCCL job is shown running on GPU0, isolated by setting CUDA_VISIBLE_DEVICES=0, ensuring the job sees and accesses only GPU0.

Because GPUs 0 and 1 share NUMA locality with NIC6 and NIC8, GPU0 can use either NIC6 or NIC8 to communicate with GPUs assigned to the same tenant on other servers. Without explicit control, it may select a NIC associated with a different tenant, violating traffic isolation. To prevent this, the job must also be restricted to NIC6 (gpu0_eth) by setting UCX_NET_DEVICES=gpu0_eth.

Failing to specify the correct NIC can result in communication failures or cross-tenant traffic leakage. In this example, NIC6 is connected to Tenant 1's VRF on the leaf node, while NIC8 is connected to Tenant 2's VRF.

The left side of Figure 41 shows a case where the correct NIC is selected, and therefore the traffic correctly exits on the interface connected to Tenant 1’s routing instance.

The right side shows a case where the incorrect NIC is selected, and the traffic incorrectly exits on the interface connected to Tenant 2’s routing instance.

Figure 41: GPU and NIC Isolation for Tenant-1 NCCL Job

For more details on NCCL and RCCL environment variables, refer to the latest NVIDIA and AMD documentation. The latest versions at the time of this document's publication can be found here: