JVD Validation Goals and Scope

Test Objectives

The primary objectives of the JVD testing can be summarized as:

  • Qualification of the complete AI fabric design functionality including the Frontend, GPU Backend, and Storage Backend fabrics, and connectivity between AMD GPUs and Vast Storage.
  • Qualification of the deployment steps based on Juniper Apstra.
  • Assurance that the design is well documented and produces a reliable, predictable deployment for the customer.

The qualification objectives included validation of:

  • Blueprint deployment
  • Device upgrade
  • Incremental configuration pushes and provisioning
  • Telemetry and analytics checks
  • Failure mode analysis
  • Congestion avoidance and mitigation
  • Host, storage, and GPU traffic

Test Scope

The AI JVD testing for the described network included the following:

  • Design and blueprint deployment through Apstra of three distinct fabrics
  • Fabric operation and monitoring through Apstra analytics and telemetry dashboard
  • Congestion management with PFC and ECN, including failure scenarios
  • End-to-end traffic flow, with Dynamic Load Balancing (DLB)
  • System health, ARP, ND, MAC, BGP (route, next hop), interface traffic counters, and so on
  • Software operation verification (no anomalies or issues found)
  • AI fabric with Juniper Apstra performing successfully under the following mandatory scenarios:
    • Node failure (reboot)
    • Interface failures (interface down/up, laser on/off)

Under these scenarios, the following were evaluated and validated:

  • Completion of AI job models within MLCommons Training benchmarks
  • Traffic recovery after each failure scenario
  • Impact on the fabric and anomaly reporting in Apstra
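
The scenarios and checks above form a small test matrix: each failure scenario is paired with each validation. The following is a minimal sketch of that pairing, not the JVD test harness; the identifiers (node_reboot, traffic_recovers, and so on) are hypothetical labels invented for illustration and are not Apstra or JVD test-case names.

```python
from itertools import product

# Failure scenarios and checks taken from the test scope above.
# The string labels are hypothetical, chosen only for this sketch.
FAILURE_SCENARIOS = [
    "node_reboot",
    "interface_down_up",
    "laser_off_on",
]
CHECKS = [
    "ai_job_completes",
    "traffic_recovers",
    "apstra_anomalies_reported",
]


def build_test_matrix(scenarios, checks):
    """Return one (scenario, check) pair per required validation."""
    return list(product(scenarios, checks))


def untested(matrix, results):
    """Return the pairs that have no recorded pass/fail result."""
    return [pair for pair in matrix if pair not in results]


matrix = build_test_matrix(FAILURE_SCENARIOS, CHECKS)
```

A results dictionary keyed by (scenario, check) can then be passed to untested() to list validations that still need to be run.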

Other features tested

  • Broadcom 97608 THOR2 NICs
  • Mellanox ConnectX NIC default settings
  • DSCP and CNP configuration on the NICs
  • Connectivity from fabric-connected hosts created by Apstra to NSX-managed hosts
  • BERT/Llama3 test completion times
  • Llama2 inference against existing infrastructure

Refer to the test report for more information.

Tested Optics

Table 37: Frontend Fabric Optics

Frontend Fabric
Part number | Optics Name | Device Role | Device Model | Interface/NIC type
740-085351 | QSFP56-DD-400GBASE-DR4 | spine | QFX5130-32CD | QSFP-DD
740-085351 | QSFP56-DD-400GBASE-DR4 | leaf | QFX5130-32CD | QSFP-DD
740-061405 | QSFP-100GBASE-SR4-T2 | leaf | QFX5130-32CD | QSFP28
740-046565 | QSFP+-40G-SR4 w/ 4x10G breakout cable | leaf | QFX5130-32CD | QSFP+
AFBR-709SMZ | AVAGO 10GBASE-SR SFP+ 300m | Server | SuperMicro Headend Server | Intel X710
AFBR-89CDDZ | AVAGO 100GbE QSFP28 300m | GPU Server | AMD MI300Xx Dell XE9680 | BCM97608 THOR2
AFBR-89CDDZ | AVAGO 100GbE QSFP28 300m | GPU Server | AMD MI300Xx SuperMicro AS-8125GS-TNMR2 | ConnectX-7

Table 38: Backend Storage Fabric Optics

Backend Storage Fabric
Part number | Optics Name | Device Role | Device Model | Interface/NIC type
740-085351 | QSFP56-DD-400GBASE-DR4 | spine | QFX5220-32CD | QSFP-DD
740-085351 | QSFP56-DD-400GBASE-DR4 | leaf | QFX5220-32CD | QSFP-DD
740-058734 | QSFP-100GBASE-SR4 | leaf | QFX5220-32CD | QSFP28
720-128730 | QSFP56-DD-2x200GBASE-CR4-CU-2.5M w/ 400G DAC Breakout into 2X200G | leaf | QFX5220-32CD | QSFP-DD
740-061405 | QSFP-100GBASE-SR4 | leaf | QFX5220-32CD | QSFP28
740-159002 | QSFP56-DD-2x200G-BOAOC-5M | GPU Server | AMD MI300Xx Dell XE9680 | BCM97608 THOR2
740-159002 | QSFP56-DD-2x200G-BOAOC-5M | GPU Server | AMD MI300Xx SuperMicro AS-8125GS-TNMR2 | ConnectX-7
740-061405 | QSFP-100GBASE-SR4 | Storage | Vast Storage CBOX | ConnectX-6
740-061405 | QSFP-100GBASE-SR4 | Storage | Vast Storage DBOX | ConnectX-6

Table 39: Backend GPU Fabric

Backend GPU Fabric
Part number | Optics Name | Device Role | Device Model | Interface/NIC type
740-174933 | OSFP-800G-DR8 | spine | QFX5240-64OD | OSFP800
740-174933 | OSFP-800G-DR8 | leaf | QFX5240-64OD | OSFP800
740-085351 | QDD-400G-DR4 | GPU Server | AMD MI300Xx Dell XE9680 | BCM97608 THOR2
740-085351 | QDD-400G-DR4 | GPU Server | AMD MI300Xx SuperMicro AS-8125GS-TNMR2 | BCM97608 THOR2
Note:

For optics tested on QFX5220-64CD, QFX5230-64CD, PTX10008, WEKA storage, and NVIDIA GPU servers, see the Tested Optics section of AI Data Center Network with Juniper Apstra, NVIDIA GPUs, and WEKA Storage—Juniper Validated Design (JVD).
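
For readers who want to check a bill of materials against the tested optics, the Backend GPU Fabric rows of Table 39 could be captured as records and queried by device role. This is only an illustrative sketch: the Optic type and the parts_for_role helper are hypothetical names, and only the part number, optics name, device role, and device model columns are included.

```python
from typing import NamedTuple


class Optic(NamedTuple):
    part_number: str
    name: str
    role: str
    device_model: str


# Rows transcribed from Table 39 (Backend GPU Fabric).
BACKEND_GPU_OPTICS = [
    Optic("740-174933", "OSFP-800G-DR8", "spine", "QFX5240-64OD"),
    Optic("740-174933", "OSFP-800G-DR8", "leaf", "QFX5240-64OD"),
    Optic("740-085351", "QDD-400G-DR4", "GPU Server",
          "AMD MI300Xx Dell XE9680"),
    Optic("740-085351", "QDD-400G-DR4", "GPU Server",
          "AMD MI300Xx SuperMicro AS-8125GS-TNMR2"),
]


def parts_for_role(optics, role):
    """Return the distinct part numbers tested for a device role (case-insensitive)."""
    wanted = role.lower()
    return sorted({o.part_number for o in optics if o.role.lower() == wanted})
```

The same record shape would extend to the frontend and storage tables by adding their rows to separate lists.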