JVD Validation Goals and Scope

Test Objectives

The primary objectives of the JVD testing can be summarized as:

  • Qualify the complete AI fabric design functionality, including the Frontend, GPU Backend, and Storage Backend fabrics, and connectivity between AMD GPUs and Vast Storage.
  • Ensure the design is well documented and produces a reliable, predictable deployment for the customer.

The qualification objectives included validating:

  • Blueprint deployment, device upgrade, incremental configuration pushes/provisioning, telemetry/analytics checking, failure mode analysis, congestion avoidance and mitigation, and verification of host, storage, and GPU traffic.

Test Scope

The AI JVD testing for the described network included the following:

  • BGP-DPF
  • IPv6 SLAAC
  • IPv6 BGP unnumbered with BGP auto-discovery
  • Congestion management with PFC and ECN, including failure scenarios
  • End-to-end traffic flow with pinned paths
  • Dynamic Load Balancing (DLB)
  • System health, ARP, ND, MAC, BGP (route, next hop), and interface traffic counters
  • Software operation verification (no anomalies or issues found)

Under these scenarios the following were evaluated:

  • Completion of AI job models within MLCommons Training benchmarks
  • Traffic recovery after all failure scenarios

Other Tested Features

  • Broadcom 97608 THOR2 NICs
  • Mellanox ConnectX NIC default settings
  • DSCP and CNP configuration on the NICs
  • BERT/Llama 3 test completion times
  • Llama 2 inference against existing infrastructure

Refer to the test report for more information.

Tested Optics

The following tables include optics tested in previous JVDs.

Table 1: Frontend Fabric Optics

| Part Number | Optics Name | Device Role | Device Model | Interface/NIC Type |
|---|---|---|---|---|
| 740-085351 | QSFP56-DD-400GBASE-DR4 | spine | QFX5130-32CD | QSFP-DD |
| 740-085351 | QSFP56-DD-400GBASE-DR4 | leaf | QFX5130-32CD | QSFP-DD |
| 740-061405 | QSFP-100GBASE-SR4-T2 | leaf | QFX5130-32CD | QSFP28 |
| 740-046565 | QSFP+-40G-SR4 w/ 4x10G breakout cable | leaf | QFX5130-32CD | QSFP+ |
| AFBR-709SMZ | AVAGO 10GBASE-SR SFP+ 300m | GPU Server | SuperMicro Headend Server | Intel X710 |
| AFBR-89CDDZ | AVAGO 100GbE QSFP28 300m | GPU Server | AMD MI300Xx Dell XE9680 | BCM97608 THOR2 |
| AFBR-89CDDZ | AVAGO 100GbE QSFP28 300m | GPU Server | AMD MI300Xx SuperMicro AS-8125GS-TNMR2 | ConnectX-7 |
| AFBR-89CDDZ | AVAGO 100GbE QSFP28 300m | Server | SuperMicro A100 HGX Server | ConnectX-6 Dx |
| AFBR-89CDDZ | AVAGO 100GbE QSFP28 300m | Server | NVIDIA H100 DGX Server | ConnectX-7 |
| AFBR-89CDDZ | AVAGO 100GbE QSFP28 300m | Server | Weka Storage Server | ConnectX-6 Dx |

Table 2: Backend Storage Fabric Optics

| Part Number | Optics Name | Device Role | Device Model | Interface/NIC Type |
|---|---|---|---|---|
| 740-085351 | QSFP56-DD-400GBASE-DR4 | spine | QFX5220-32CD | QSFP-DD |
| 740-085351 | QSFP56-DD-400GBASE-DR4 | leaf | QFX5220-32CD | QSFP-DD |
| 740-058734 | QSFP-100GBASE-SR4 | leaf | QFX5220-32CD | QSFP28 |
| 720-128730 | QSFP56-DD-2x200GBASE-CR4-CU-2.5M w/ 400G DAC breakout into 2x200G | leaf | QFX5220-32CD | QSFP-DD |
| 740-061405 | QSFP-100GBASE-SR4 | leaf | QFX5220-32CD | QSFP28 |
| 740-159002 | QSFP56-DD-2x200G-BOAOC-5M | GPU Server | AMD MI300Xx Dell XE9680 | BCM97608 THOR2 |
| 740-159002 | QSFP56-DD-2x200G-BOAOC-5M | GPU Server | AMD MI300Xx SuperMicro AS-8125GS-TNMR2 | ConnectX-7 |
| 720-128730 | QSFP56-DD-2x200GBASE-CR4-CU-2.5M | Server | SuperMicro A100 HGX Server | ConnectX-6 |
| 740-159003 | QSFP56-DD-2x200G-AOCBO-7M | Server | NVIDIA H100 DGX Server | ConnectX-7 |
| 740-061405 | QSFP-100GBASE-SR4 | Storage | Vast Storage CBOX | ConnectX-6 |
| 740-061405 | QSFP-100GBASE-SR4 | Storage | Vast Storage DBOX | ConnectX-6 |
| 720-128730 | QSFP56-DD-2x200GBASE-CR4-CU-2.5M | | | |

Table 3: Backend GPU Fabric Optics

| Part Number | Optics Name | Device Role | Device Model | Interface/NIC Type |
|---|---|---|---|---|
| 740-174933 | OSFP-800G-DR8 | spine | QFX5240-64OD | OSFP800 |
| 740-174933 | OSFP-800G-DR8 | leaf | QFX5240-64OD | OSFP800 |
| 740-085351 | QDD-400G-DR4 | GPU Server | AMD MI300Xx Dell XE9680 | BCM97608 THOR2 |
| 740-085351 | QDD-400G-DR4 | GPU Server | AMD MI300Xx SuperMicro AS-8125GS-TNMR2 | BCM97608 THOR2 |
| MMS4X00-NS-FLT | NVIDIA 800Gbps Twin-port OSFP 2x400Gb/s Single Mode 2xDR4 100m | Server | NVIDIA H100 DGX Server | ConnectX-7 |

Note:

For optics tested on QFX5220-64CD, QFX5230-64CD, PTX10008, WEKA storage, and NVIDIA GPU servers, see the Tested Optics section in the AI Data Center Network with Juniper Apstra, NVIDIA GPUs, and WEKA Storage—Juniper Validated Design (JVD).
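
When planning a deployment, the optics tables can be treated as a lookup set for checking a bill of materials against the tested combinations. The following is a minimal Python sketch of that idea, not part of the JVD itself. The names `OpticRow`, `BACKEND_GPU_OPTICS`, and `untested_parts` are hypothetical, and only three rows from Table 3 are copied in for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpticRow:
    """One row of a tested-optics table (hypothetical structure)."""
    part_number: str
    optic_name: str
    device_role: str
    device_model: str


# Subset of the Backend GPU Fabric rows, copied from Table 3.
BACKEND_GPU_OPTICS = [
    OpticRow("740-174933", "OSFP-800G-DR8", "spine", "QFX5240-64OD"),
    OpticRow("740-174933", "OSFP-800G-DR8", "leaf", "QFX5240-64OD"),
    OpticRow("740-085351", "QDD-400G-DR4", "GPU Server", "AMD MI300Xx Dell XE9680"),
]


def untested_parts(planned, tested):
    """Return the planned (part_number, device_role) pairs not found in the tested rows."""
    tested_pairs = {(row.part_number, row.device_role) for row in tested}
    return [pair for pair in planned if pair not in tested_pairs]


# Hypothetical bill of materials: the second entry pairs a part with a role
# that does not appear in the Table 3 subset above.
planned = [("740-174933", "spine"), ("740-085351", "leaf")]
print(untested_parts(planned, BACKEND_GPU_OPTICS))
# prints [('740-085351', 'leaf')]
```

A pair reported by this check only means the combination is absent from the rows loaded here. It may still appear in another table or in the earlier JVD referenced in the note above.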