JVD Validation Goals and Scope

Test Objectives

The primary objectives of the JVD testing can be summarized as:

  • Qualification of the complete AI fabric design functionality including the Frontend, GPU Backend, and Storage Backend fabrics, and connectivity between AMD GPUs and Vast Storage.
  • Qualification of the deployment steps based on Juniper Apstra.
  • Verification that the design is well documented and produces a reliable, predictable deployment for the customer.
  • Validation of Thor2 5m DAC operation with Juniper’s TH5 chip.
  • Validation of QFX and PTX platforms in different roles in the fabric.

The qualification objectives included validation of:

  • Blueprint deployment, device upgrade, incremental configuration pushes/provisioning, telemetry/analytics checking, failure mode analysis, congestion avoidance and mitigation, and host, storage, and GPU traffic verification.

Test Scope

The AI JVD testing for the described network included the following:

  • Design and blueprint deployment through Apstra of three distinct fabrics
  • Fabric operation and monitoring through Apstra analytics and telemetry dashboard
  • Congestion management with PFC and ECN, including failure scenarios
  • End-to-end traffic flow, with Dynamic Load Balancing (DLB)
  • System health, ARP, ND, MAC, BGP (route, next hop), interface traffic counters, and so on
  • Software operation verification (no anomalies or issues found)
  • Validation of Thor2 5m DAC operation with Juniper’s TH5 chip
  • AI fabric with Juniper Apstra performing successfully under the following mandatory scenarios:
    • Node failure (reboot)
    • Interface failures (interface down/up, laser on/off)

Under these scenarios the following were evaluated/validated:

  • Completion of AI job models within MLCommons Training benchmarks
  • Traffic recovery after all failure scenarios
  • Impact on the fabric and anomaly reporting in Apstra
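
As an illustration of the traffic recovery check listed above, the following is a minimal Python sketch of how a recovery time could be derived from throughput samples collected around a failure event. It is not the procedure used in the JVD testing. The function name `recovery_time`, the sample format, and the 5% tolerance are all assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple


def recovery_time(
    samples: Sequence[Tuple[float, float]],
    failure_at: float,
    baseline_gbps: float,
    tolerance: float = 0.05,
) -> Optional[float]:
    """Return the seconds between a failure and sustained traffic recovery.

    Hypothetical helper: `samples` is a time-ordered list of
    (timestamp_seconds, throughput_gbps) pairs. Traffic counts as
    recovered once throughput is back within `tolerance` of
    `baseline_gbps` and stays there for all remaining samples.
    Returns None if traffic never recovers within the samples.
    """
    threshold = baseline_gbps * (1.0 - tolerance)
    recovered_at: Optional[float] = None
    for timestamp, gbps in samples:
        if timestamp < failure_at:
            continue  # ignore samples taken before the failure
        if gbps >= threshold:
            if recovered_at is None:
                recovered_at = timestamp  # first sample back above threshold
        else:
            recovered_at = None  # dipped again, so restart the search
    if recovered_at is None:
        return None
    return recovered_at - failure_at


# Example with made-up numbers: a failure at t=2 s on a 400 Gbps flow.
samples = [(0.0, 400.0), (1.0, 400.0), (2.0, 100.0),
           (3.0, 250.0), (4.0, 390.0), (5.0, 400.0)]
print(recovery_time(samples, failure_at=2.0, baseline_gbps=400.0))
```

Requiring throughput to stay above the threshold for all remaining samples avoids reporting a recovery when traffic briefly returns and then drops again, for example during a reboot.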

Other features tested

  • BCM957608 THOR2 NICs
  • Pollara NICs
  • Mellanox ConnectX NIC default settings
  • DSCP and CNP configuration on the NICs
  • Connectivity between fabric-connected hosts created by Apstra and NSX-managed hosts
  • BERT/Llama3 test completion times
  • Llama2 inference against existing infrastructure

Refer to the test report for more information.