JVD Validation Goals and Scope
Test Objectives
The primary objectives of the JVD testing can be summarized as:
- Qualification of the complete AI fabric design functionality including the Frontend, GPU Backend, and Storage Backend fabrics, and connectivity between AMD GPUs and Vast Storage.
- Qualification of the deployment steps based on Juniper Apstra.
- Confirmation that the design is well documented and produces a reliable, predictable deployment for the customer.
- Validation of Thor2 5m DAC operation with Juniper’s TH5 chip.
- Validation of QFX and PTX platforms in different roles in the fabric.
The qualification objectives included validating:
- Blueprint deployment, device upgrade, incremental configuration pushes and provisioning, telemetry and analytics checks, failure mode analysis, congestion avoidance and mitigation, and host, storage, and GPU traffic.
Test Scope
The AI JVD testing for the described network included the following:
- Design and blueprint deployment through Apstra of three distinct fabrics
- Fabric operation and monitoring through Apstra analytics and telemetry dashboard
- Congestion management with PFC and ECN, including failure scenarios
- End-to-end traffic flow, with Dynamic Load Balancing (DLB)
- System health, ARP, ND, MAC, BGP (route, next hop), interface traffic counters, and so on
- Software operation verification (no anomalies or issues found)
- Validation of Thor2 5m DAC operation with Juniper’s TH5 chip.
- AI fabric with Juniper Apstra successfully performing under the following required scenarios (must):
  - Node failure (reboot)
  - Interface failures (interface down/up, laser on/off)
Under these scenarios the following were evaluated/validated:
- Completion of AI Job models within MLCommons Training benchmarks
- Traffic recovery after all failure scenarios
- Impact on the fabric and anomaly reporting in Apstra
Other features tested
- BCM957608 THOR2 NICs
- Pollara NICs
- Mellanox ConnectX NIC default settings
- DSCP and CNP configuration on the NICs
- Connectivity between fabric-connected hosts created by Apstra towards NSX-managed hosts.
- BERT/LLAMA3 test completion times
- Llama2 Inference against existing infrastructure.
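The DSCP and ECN markings referenced in the list above share one byte of the IP header (the former ToS byte): DSCP takes the upper six bits and ECN the lower two. The following is a minimal sketch of that arithmetic only. The function names are hypothetical, and the DSCP values shown (26 for RoCEv2 data, 48 for CNP) are illustrative assumptions, not values taken from this JVD; check the NIC and fabric configuration for the actual values.

```python
# ECN codepoints as defined in RFC 3168.
ECN_NOT_ECT = 0b00  # transport is not ECN-capable
ECN_ECT1 = 0b01     # ECN-capable transport, codepoint 1
ECN_ECT0 = 0b10     # ECN-capable transport, codepoint 0
ECN_CE = 0b11       # congestion experienced


def tos_byte(dscp: int, ecn: int = ECN_NOT_ECT) -> int:
    """Combine a 6-bit DSCP and a 2-bit ECN field into the 8-bit ToS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN must be in the range 0-3")
    return (dscp << 2) | ecn


def split_tos(tos: int) -> tuple[int, int]:
    """Split an 8-bit ToS byte back into (dscp, ecn)."""
    if not 0 <= tos <= 255:
        raise ValueError("ToS byte must be in the range 0-255")
    return tos >> 2, tos & 0b11


# Illustrative values only (assumptions, not from this document).
ROCE_DSCP = 26  # hypothetical DSCP for RoCEv2 data traffic
CNP_DSCP = 48   # hypothetical DSCP for congestion notification packets

print(tos_byte(ROCE_DSCP, ECN_ECT0))  # data packet marked ECN-capable
print(tos_byte(CNP_DSCP))             # CNP with no ECN marking
print(split_tos(tos_byte(ROCE_DSCP, ECN_CE)))  # after a switch marks CE
```

This kind of conversion is useful when a tool reports the raw ToS or traffic-class byte while the NIC or switch configuration is expressed in DSCP values.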
Refer to the test report for more information.