JVD Validation Goals and Scope
Test Objectives
The primary objectives of the JVD testing are to:
- Qualify the complete AI fabric design, including the Frontend, GPU Backend, and Storage Backend fabrics, and connectivity between AMD GPUs and Vast Storage.
- Ensure the design is well documented and produces a reliable, predictable deployment for the customer.
The qualification objectives included validating:
- Blueprint deployment, device upgrade, incremental configuration pushes/provisioning, telemetry/analytics checking, failure mode analysis, congestion avoidance and mitigation, and host, storage, and GPU traffic.
Test Scope
The AI JVD testing for the described network included the following:
- Congestion management with PFC and ECN, including failure scenarios
- End-to-end traffic flow, with Dynamic Load Balancing (DLB)
- System health checks: ARP, ND, MAC, BGP (route, next-hop), interface traffic counters, and so on
- Software operation verification (no anomalies or issues found)
- IPv6 Stateless Address Autoconfiguration (SLAAC)
- Advertising IPv4 Network Layer Reachability Information with an IPv6 Next Hop (RFC 5549)
Under these scenarios, the following were evaluated/validated:
- Completion of AI job models within MLCommons Training benchmarks
- Traffic recovery after all failure scenarios
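To illustrate the SLAAC item in the scope list, the following is a minimal Python sketch of how a host can derive an address from an advertised /64 prefix using a modified EUI-64 interface identifier. It is an illustration only: the function name, the documentation prefix `2001:db8::/64`, and the MAC addresses are assumptions, not values from the tested fabric, and many operating systems generate stable-privacy or temporary interface identifiers instead of EUI-64.

```python
import ipaddress


def eui64_slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Derive a modified EUI-64 address from a /64 prefix and a 48-bit MAC.

    Hypothetical helper for illustration; not part of the JVD tooling.
    """
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("EUI-64 interface identifiers expect a /64 prefix")

    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")

    # Flip the universal/local bit of the first octet and insert ff:fe
    # between the OUI and the device-specific half of the MAC.
    iid = bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:6]

    # Combine the 64-bit prefix with the 64-bit interface identifier.
    return network.network_address + int.from_bytes(iid, "big")


# Example values are assumptions (documentation prefix, made-up MAC).
print(eui64_slaac_address("2001:db8::/64", "00:11:22:33:44:55"))
# prints 2001:db8::211:22ff:fe33:4455
```

The same function applied to the link-local prefix `fe80::/64` yields the interface's EUI-64 link-local address.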
Other Features Tested
- Broadcom 97608 THOR2 NICs
- Mellanox ConnectX NIC default settings
- DSCP and CNP configuration on the NICs
- BERT/Llama3 test completion times
- Llama2 inference against existing infrastructure
Refer to the test report for more information.
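As background for the DSCP and CNP item above, the sketch below shows how a DSCP value and the two ECN bits share the IP Traffic Class (ToS) byte, and how a congested switch changes an ECN-capable packet to Congestion Experienced (CE). The DSCP values 26 and 48 are hypothetical placeholders chosen for the example; the values used in this JVD are documented in the configuration sections and the test report, not here.

```python
from typing import Tuple

# ECN codepoints as defined in RFC 3168.
ECN_NOT_ECT = 0b00  # transport is not ECN-capable
ECN_ECT1 = 0b01     # ECN-capable transport, codepoint 1
ECN_ECT0 = 0b10     # ECN-capable transport, codepoint 0
ECN_CE = 0b11       # congestion experienced


def tos_byte(dscp: int, ecn: int) -> int:
    """Pack a 6-bit DSCP and a 2-bit ECN field into one ToS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN must be in the range 0..3")
    return (dscp << 2) | ecn


def split_tos(tos: int) -> Tuple[int, int]:
    """Return (dscp, ecn) from a ToS byte."""
    return tos >> 2, tos & 0b11


def mark_congestion(tos: int) -> int:
    """Set CE on ECN-capable packets; leave all other packets unchanged."""
    dscp, ecn = split_tos(tos)
    if ecn in (ECN_ECT0, ECN_ECT1):
        return tos_byte(dscp, ECN_CE)
    return tos


# Hypothetical values: DSCP 26 for RDMA data, DSCP 48 for CNP packets.
data_tos = tos_byte(26, ECN_ECT0)
cnp_tos = tos_byte(48, ECN_NOT_ECT)
print(data_tos, cnp_tos)          # prints 106 192
print(mark_congestion(data_tos))  # prints 107
```

The DSCP bits are untouched by CE marking, so classification into the lossless queue stays the same after a switch marks the packet.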
Features Not Included
- IPv4 DHCP/DHCP relay for tenants – Will be included in a future version of this JVD
- IPv6 DHCP/DHCP relay for tenants – Will be included in a future version of this JVD
- MAC VRF, IRB, Type 2 EVPN, ERB – TBD
- Multihoming – TBD
- Global Load Balancing (GLB) – Will be included in the UEF JVD
- Storage Multitenancy – TBD
- Inference/Frontend Multitenancy – Will be included in a future JVD
- IPv6 underlay/overlay deployment using Apstra – Will be included in a future version of this JVD
Tested Optics
Table 54: Frontend Fabric Optics
Frontend Fabric | ||||
---|---|---|---|---|
Part number | Optics Name | Device Role | Device Model | Interface/NIC type |
740-085351 | QSFP56-DD-400GBASE-DR4 | spine | QFX5130-32CD | QSFP-DD |
740-085351 | QSFP56-DD-400GBASE-DR4 | leaf | QFX5130-32CD | QSFP-DD |
740-061405 | QSFP-100GBASE-SR4-T2 | leaf | QFX5130-32CD | QSFP28 |
740-046565 | QSFP+-40G-SR4 with 4x10G breakout cable | leaf | QFX5130-32CD | QSFP+ |
AFBR-709SMZ | AVAGO 10GBASE-SR SFP+ 300m | Server | SuperMicro Headend Server | Intel X710 |
AFBR-89CDDZ | AVAGO 100GbE QSFP28 300m | GPU Server | AMD MI300Xx Dell XE9680 | BCM97608 THOR2 |
AFBR-89CDDZ | AVAGO 100GbE QSFP28 300m | GPU Server | AMD MI300Xx SuperMicro AS-8125GS-TNMR2 | ConnectX-7 |
Table 55: Backend Storage Fabric Optics
Backend Storage Fabric | ||||
---|---|---|---|---|
Part number | Optics Name | Device Role | Device Model | Interface/NIC type |
740-085351 | QSFP56-DD-400GBASE-DR4 | spine | QFX5220-32CD | QSFP-DD |
740-085351 | QSFP56-DD-400GBASE-DR4 | leaf | QFX5220-32CD | QSFP-DD |
740-058734 | QSFP-100GBASE-SR4 | leaf | QFX5220-32CD | QSFP28 |
720-128730 | QSFP56-DD-2x200GBASE-CR4-CU-2.5M with 400G DAC breakout into 2x200G | leaf | QFX5220-32CD | QSFP-DD |
740-061405 | QSFP-100GBASE-SR4 | leaf | QFX5220-32CD | QSFP28 |
740-159002 | QSFP56-DD-2x200G-BOAOC-5M | GPU Server | AMD MI300Xx Dell XE9680 | BCM97608 THOR2 |
740-159002 | QSFP56-DD-2x200G-BOAOC-5M | GPU Server | AMD MI300Xx SuperMicro AS-8125GS-TNMR2 | ConnectX-7 |
740-061405 | QSFP-100GBASE-SR4 | Storage | Vast Storage CBOX | ConnectX-6 |
740-061405 | QSFP-100GBASE-SR4 | Storage | Vast Storage DBOX | ConnectX-6 |
Table 56: Backend GPU Fabric Optics
Backend GPU Fabric | ||||
---|---|---|---|---|
Part number | Optics Name | Device Role | Device Model | Interface/NIC type |
740-174933 | OSFP-800G-DR8 | spine | QFX5240-64OD | OSFP800 |
740-174933 | OSFP-800G-DR8 | leaf | QFX5240-64OD | OSFP800 |
740-085351 | QDD-400G-DR4 | GPU Server | AMD MI300Xx Dell XE9680 | BCM97608 THOR2 |
740-085351 | QDD-400G-DR4 | GPU Server | AMD MI300Xx SuperMicro AS-8125GS-TNMR2 | BCM97608 THOR2 |
For optics tested on the QFX5220-64CD, QFX5230-64CD, PTX10008, WEKA storage, and NVIDIA GPU servers, see the Tested Optics section of AI Data Center Network with Juniper Apstra, NVIDIA GPUs, and WEKA Storage—Juniper Validated Design (JVD).