About this Document

This document describes the design requirements and implementation of an AI cluster infrastructure that supports GPU multitenancy in the GPU backend fabric using EVPN/VXLAN. The fabric is built on AI-optimized Juniper Data Center QFX5240 series switches. The cluster includes Nvidia H100 DGX and AMD MI300X GPU servers, as well as Vast Storage systems.

All validation tests were conducted in Juniper’s AI Innovation Lab in Sunnyvale, CA, USA. In this open lab, Juniper collaborates closely with customers and technology partners to develop AI solutions and test deployments for a range of AI applications and models.

The AI Innovation Lab allows customers to see AI training and inference in action. Juniper runs these tests with both customer-specific models and models from MLCommons for MLPerf performance benchmarking and comparisons.