Storage Backend Fabric
The Storage Backend fabric provides the connectivity infrastructure that makes storage devices accessible from the GPU servers.
The performance of the storage infrastructure significantly impacts the efficiency of AI workflows. A storage system that provides fast access to data can substantially reduce the time required to train AI models. Similarly, a storage system that supports efficient data querying and indexing can minimize the completion time of preprocessing and feature extraction steps in an AI workflow.
The Storage Backend fabric design in the JVD also follows a 3-stage IP Clos architecture, as shown in Figure 16. There is no concept of rail optimization in a storage cluster: each GPU server has a single connection to its leaf node, instead of the eight used in the GPU Backend fabric.
Figure 16: Storage Backend Fabric Architecture
The Storage Backend devices included in this fabric, and the connections between them, are summarized in the following table:
Table 16: Storage Backend devices
| Nvidia DGX GPU Servers | Weka Storage Servers | Storage Backend Leaf Nodes switch model (storage-backend-gpu-leaf & storage-backend-weka-leaf) | Storage Backend Spine Nodes switch model (storage-backend-spine#) |
| --- | --- | --- | --- |
| A100 x 8, H100 x 4 | Weka storage server x 8 | QFX5130-32CD x 4 (2 storage-backend-gpu-leaf nodes and 2 storage-backend-weka-leaf nodes) | QFX5130-32CD x 2 |
QFX5230 and QFX5240 were also validated for the Storage Backend Leaf and Spine roles.
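To make the wiring in Figure 16 concrete, the following minimal Python sketch enumerates the adjacencies implied by Table 16: every leaf connects to every spine, and each server is single-homed to one leaf. The specific server-to-leaf assignment shown here is an assumption for illustration only; the actual link counts and speeds are detailed in Table 17 below.

```python
from itertools import product

# Devices from Table 16 (names follow the JVD's role labels).
gpu_leaves  = ["storage-backend-gpu-leaf-1", "storage-backend-gpu-leaf-2"]
weka_leaves = ["storage-backend-weka-leaf-1", "storage-backend-weka-leaf-2"]
spines      = ["storage-backend-spine-1", "storage-backend-spine-2"]

# Servers: 8 x A100 and 4 x H100 DGX systems, 8 WEKA storage servers.
a100s = [f"a100-{i}" for i in range(1, 9)]
h100s = [f"h100-{i}" for i in range(1, 5)]
wekas = [f"weka-{i}" for i in range(1, 9)]

links = []

# 3-stage Clos: every leaf connects to every spine.
links += list(product(gpu_leaves + weka_leaves, spines))

# No rail optimization: each server is single-homed to one leaf.
# (The round-robin leaf assignment is an assumption for illustration.)
for i, srv in enumerate(a100s + h100s):
    links.append((srv, gpu_leaves[i % len(gpu_leaves)]))
for i, srv in enumerate(wekas):
    links.append((srv, weka_leaves[i % len(weka_leaves)]))

print(f"{len(links)} adjacencies in the Storage Backend fabric")
for a, b in links:
    print(f"{a} <-> {b}")
```

Running the sketch lists the eight leaf-spine adjacencies (4 leaves x 2 spines) plus one server-leaf adjacency per server, reflecting the single-homed, non-rail-optimized design.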
Table 17: Connections between servers, leaf and spine nodes in the Storage Backend
| GPU Servers <=> Storage Backend GPU Leaf Nodes | Weka Storage Servers <=> Storage Backend Weka Leaf Nodes | Storage Backend Spine Nodes <=> Storage Backend Leaf Nodes |
| --- | --- | --- |
| 1 x 100GE link between each H100 server and the storage-backend-gpu-leaf switch; 1 x 200GE link between each A100 server and the storage-backend-gpu-leaf switch | 1 x 100GE link between each storage server (weka-1 to weka-8) and the storage-backend-weka-leaf switch | 2 x 400GE links between each spine node and each storage-backend-weka-leaf switch; 3 x 400GE links between each spine node and each storage-backend-gpu-leaf switch |
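As a sanity check on the link speeds in Table 17, the short sketch below totals downlink and uplink capacity per leaf pair to estimate oversubscription. It assumes the Table 17 spine-leaf link counts apply per leaf-spine pair (each leaf connecting to both spines), which follows Figure 16 rather than an explicit statement in the table.

```python
# Link speeds from Table 17, in Gbps.
GE100, GE200, GE400 = 100, 200, 400

# Downlink capacity toward the servers, per leaf pair.
gpu_leaf_down  = 8 * GE200 + 4 * GE100   # 8 x A100 @ 200GE, 4 x H100 @ 100GE
weka_leaf_down = 8 * GE100               # 8 x WEKA servers @ 100GE

# Uplink capacity toward the spines, per leaf pair.
# Assumption: each leaf has the Table 17 link count to *each* of the 2 spines.
gpu_leaf_up  = 2 * 2 * 3 * GE400   # 2 leaves x 2 spines x 3 links x 400GE
weka_leaf_up = 2 * 2 * 2 * GE400   # 2 leaves x 2 spines x 2 links x 400GE

for role, down, up in [("gpu-leaf", gpu_leaf_down, gpu_leaf_up),
                       ("weka-leaf", weka_leaf_down, weka_leaf_up)]:
    print(f"{role}: {down}G down / {up}G up -> {down / up:.2f}:1 oversubscription")
```

Under these assumptions, both leaf roles have more uplink than downlink capacity, so the leaf-spine stage should not be the bottleneck for storage traffic.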
The NVIDIA servers hosting the GPUs have dedicated storage network adapters (NVIDIA ConnectX) that support both the Ethernet and InfiniBand protocols and provide connectivity to external storage arrays.
Communication between the GPUs and the storage devices leverages the WEKA distributed POSIX client, which enables multiple data paths for transferring stored data from the WEKA nodes to the GPU client servers. The WEKA client uses the Data Plane Development Kit (DPDK) to offload TCP packet processing from the operating system kernel and achieve higher throughput.
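The WEKA client itself is proprietary, but the effect of its multipath design can be illustrated with a minimal Python sketch: a large read is striped into chunks that are fetched concurrently from several backend nodes, so aggregate throughput scales with the number of data paths. The fetch_chunk helper, stripe size, and node addresses are hypothetical placeholders, and a real client uses DPDK kernel bypass rather than ordinary sockets.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical WEKA backend endpoints (placeholder addresses for weka-1..weka-8).
BACKENDS = [f"10.0.30.{i}" for i in range(1, 9)]
CHUNK = 4 * 1024 * 1024   # 4 MiB stripe unit (illustrative value)

def fetch_chunk(node: str, offset: int, size: int) -> bytes:
    """Placeholder for a read of `size` bytes at `offset` from one backend node.

    A real client issues this over a DPDK-accelerated transport; here we
    fabricate bytes so the sketch runs standalone."""
    return bytes(size)

def striped_read(total_size: int) -> bytes:
    """Stripe a read across all backends and fetch the chunks in parallel."""
    offsets = range(0, total_size, CHUNK)
    with ThreadPoolExecutor(max_workers=len(BACKENDS)) as pool:
        futures = [
            pool.submit(fetch_chunk, BACKENDS[i % len(BACKENDS)], off,
                        min(CHUNK, total_size - off))
            for i, off in enumerate(offsets)
        ]
        # Futures are joined in submission order, preserving byte order.
        return b"".join(f.result() for f in futures)

data = striped_read(64 * 1024 * 1024)   # read 64 MiB across 8 data paths
print(f"read {len(data)} bytes over {len(BACKENDS)} data paths")
```

In the actual deployment, striping and path selection are handled transparently by the WEKA POSIX client; the sketch only shows why multiple concurrent paths raise aggregate throughput.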
This GPU-to-storage communication is supported by the Storage Backend fabric described in the previous section and illustrated in Figure 17.
Figure 17: GPU Backend to Storage Backend Communication
WEKA Storage Solution
In small clusters, it may be sufficient to use the local storage on each GPU server, or to aggregate that storage using open-source or commercial software. In larger clusters with heavier workloads, an external dedicated storage system is required to provide dataset staging for ingest and cluster checkpointing during training. This JVD describes the infrastructure for dedicated storage using WEKA storage.
WEKA is a distributed data platform that provides high-performance, concurrent access and allows all GPU servers in the cluster to efficiently utilize a shared storage resource. With extreme I/O capabilities, the WEKA system can service the needs of all servers and scale to support hundreds or even thousands of GPUs.
Toward the end of this document, you can find more details on the WEKA storage system, including configuration settings and driver details.