Frontend Fabric

The Frontend Fabric provides the infrastructure for users to interact with the AI systems and orchestrate training and inference workflows using tools such as SLURM. These interactions neither generate heavy data flows nor have strict latency or packet-loss requirements; thus, they do not place rigorous demands on the fabric.

The Frontend Fabric design described in this JVD follows a traditional 3-stage IP fabric architecture without high availability (HA), as shown in Figure 4. This architecture provides a simple and effective solution for the connectivity required in the Frontend. However, any fabric architecture, including EVPN/VXLAN, could be used. If an HA-capable Frontend Fabric is required, we recommend following the 3-Stage with Juniper Apstra JVD.

Figure 4: Frontend Fabric Architecture

The devices included in the Frontend Fabric, and the connections between them, are summarized in the following tables:

Table 1: Frontend devices

Device role                                      Model and quantity
Nvidia DGX GPU servers                           A100 x 8, H100 x 4
Weka storage servers                             Weka Storage Server x 8
Headend servers                                  Headend-SVR x 3
Frontend leaf nodes (frontend-leaf#, #=1-2)      QFX5130-32CD x 2
Frontend spine nodes (frontend-spine#, #=1-2)    QFX5130-32CD x 2

Table 2: Connections between servers, leaf and spine nodes per cluster and stripe in the Frontend

Connection                                      Links
GPU Servers <=> Frontend Leaf Nodes             1 x 100GE link between each GPU server A100-0#, H100-01# (#=1-8 for A100 and 1-4 for H100) and frontend-leaf1
Weka Storage Servers <=> Frontend Leaf Nodes    1 x 100GE link between each storage server weka# (#=1-8) and frontend-leaf2
Headend Servers <=> Frontend Leaf Nodes         1 x 10GE link between each headend server Headend-SVR-0# (#=1-3) and frontend-leaf2
Frontend Spine Nodes <=> Frontend Leaf Nodes    2 x 400GE links between each leaf node and each spine node
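The link counts in Tables 1 and 2 can be tallied per leaf to compare server-facing capacity with spine-facing capacity. The following is a minimal Python sketch of that arithmetic. The device counts, link counts, and speeds come from the tables; the data layout and the function name leaf_summary are illustrative assumptions, not part of the JVD.

```python
# Server-facing links, taken from Tables 1 and 2:
# (endpoint group, endpoint count, attached leaf, links per endpoint, Gbps per link)
SERVER_LINKS = [
    ("A100 GPU servers", 8, "frontend-leaf1", 1, 100),
    ("H100 GPU servers", 4, "frontend-leaf1", 1, 100),
    ("Weka storage servers", 8, "frontend-leaf2", 1, 100),
    ("Headend servers", 3, "frontend-leaf2", 1, 10),
]
LEAVES = ["frontend-leaf1", "frontend-leaf2"]
SPINES = ["frontend-spine1", "frontend-spine2"]
UPLINKS_PER_SPINE = 2     # 2 x 400GE between each leaf and each spine
UPLINK_SPEED_GBPS = 400


def leaf_summary():
    """Return per-leaf link counts and aggregate capacity in Gbps."""
    summary = {
        leaf: {"server_links": 0, "server_gbps": 0, "uplinks": 0, "uplink_gbps": 0}
        for leaf in LEAVES
    }
    for _group, count, leaf, links, speed in SERVER_LINKS:
        summary[leaf]["server_links"] += count * links
        summary[leaf]["server_gbps"] += count * links * speed
    for leaf in LEAVES:
        uplinks = len(SPINES) * UPLINKS_PER_SPINE
        summary[leaf]["uplinks"] = uplinks
        summary[leaf]["uplink_gbps"] = uplinks * UPLINK_SPEED_GBPS
    return summary


if __name__ == "__main__":
    for leaf, s in leaf_summary().items():
        ratio = s["server_gbps"] / s["uplink_gbps"]
        print(f"{leaf}: {s['server_links']} server links ({s['server_gbps']} Gbps), "
              f"{s['uplinks']} uplinks ({s['uplink_gbps']} Gbps), ratio {ratio:.2f}:1")
```

With the values above, frontend-leaf1 carries 1200 Gbps of server-facing capacity against 1600 Gbps of uplink capacity, and frontend-leaf2 carries 830 Gbps against 1600 Gbps. Neither leaf is oversubscribed, which is consistent with the modest demands the Frontend places on the fabric.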

This fabric is a pure L3 IP fabric using EBGP for route advertisement. The IP addressing and EBGP configuration details are described in the networking section of this document.
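To illustrate what EBGP between leaf and spine nodes implies, the sketch below enumerates the leaf-to-spine adjacencies and checks that each pairing crosses an autonomous system boundary. It rests on two assumptions that are not taken from this JVD: the AS numbers are hypothetical values from the private 2-byte range, and one session is modeled per leaf-spine pair, although sessions could equally be configured per link. The actual values are those given in the networking section.

```python
from itertools import product

LEAVES = ["frontend-leaf1", "frontend-leaf2"]
SPINES = ["frontend-spine1", "frontend-spine2"]

# HYPOTHETICAL AS numbers, one per device, chosen only for illustration.
# They are not the values used in the JVD.
HYPOTHETICAL_ASN = {
    "frontend-spine1": 64512,
    "frontend-spine2": 64513,
    "frontend-leaf1": 64514,
    "frontend-leaf2": 64515,
}


def ebgp_sessions():
    """List (leaf, leaf AS, spine, spine AS) for every leaf-spine pair.

    Assumes one session per pair. Raises ValueError if both ends share an
    AS number, because that session would be IBGP rather than EBGP.
    """
    sessions = []
    for leaf, spine in product(LEAVES, SPINES):
        local_as = HYPOTHETICAL_ASN[leaf]
        peer_as = HYPOTHETICAL_ASN[spine]
        if local_as == peer_as:
            raise ValueError(f"{leaf} and {spine} share AS {local_as}")
        sessions.append((leaf, local_as, spine, peer_as))
    return sessions


if __name__ == "__main__":
    for leaf, local_as, spine, peer_as in ebgp_sessions():
        print(f"{leaf} (AS {local_as}) <-> {spine} (AS {peer_as})")
```

With two leaf nodes and two spine nodes, this yields four leaf-spine pairings under the one-session-per-pair assumption.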