Appendix C – How to Run an NCCL Test Using an Autoconfigured IPv6 Address
To run a model or an NCCL test using a global IPv6 address assigned either statically or automatically via SLAAC, the value of the NCCL_IB_GID_INDEX variable must be adjusted.
Starting with NCCL 2.21, the GID index no longer needs to be specified manually. It is automatically handled based on the NCCL_SOCKET_FAMILY setting. If NCCL_SOCKET_FAMILY is set to AF_INET6 and IPv6 connectivity between hosts is in place, RoCEv2 traffic over IPv6 should work as expected.
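With NCCL 2.21 or later you can therefore skip the manual GID lookup entirely. The following is a minimal sketch, assuming an OpenMPI launcher, the standard nccl-tests all_reduce_perf binary, and placeholder host names (host1, host2); adjust paths and ranks to your environment:

export NCCL_SOCKET_FAMILY=AF_INET6   # prefer IPv6 for bootstrap/socket traffic
export NCCL_IB_GID_INDEX=-1          # default: let NCCL auto-select the GID index

# Two hosts with eight GPUs each; adjust -np and -H for your cluster.
mpirun -np 16 -H host1:8,host2:8 \
    -x NCCL_SOCKET_FAMILY -x NCCL_IB_GID_INDEX \
    ./build/all_reduce_perf -b 1G -e 1G -g 1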
The NCCL_IB_GID_INDEX variable defines the Global ID (GID) index used by RoCE (RDMA over Converged Ethernet) communication. The default value is -1, which means that NCCL automatically selects the correct GID index based on the active link layer of the InfiniBand device. If the link layer is Ethernet (RoCE), NCCL uses the GID index that returns a GID with RoCE v2 support (usually GID index 3, depending on driver/firmware).
For more details, see NVIDIA's Environment Variables documentation.
To find the GID for the desired address, use the following command:

ibv_devinfo -vvv -d <mellanox-interface-name> | grep GID
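If ibv_devinfo is not available on the host, the same information can be read from sysfs. The following is a minimal sketch, assuming port 1 on the device; each GID index has a matching type file that reports RoCE v1 or RoCE v2:

dev=mlx5_6   # example device; substitute your Mellanox interface name
for gid in /sys/class/infiniband/$dev/ports/1/gids/*; do
    idx=$(basename "$gid")
    # Unused GID slots return an error when their type file is read; suppress it.
    printf "GID[%2s]: %s, %s\n" "$idx" "$(cat "$gid")" \
        "$(cat /sys/class/infiniband/$dev/ports/1/gid_attrs/types/"$idx" 2>/dev/null)"
done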
To find the Mellanox interface name, you can use the following script:
jnpr@H100-01:~/scripts$ cat nvidia_map_iface_to_mlx_YL.sh
# Script to map network interfaces to Mellanox interfaces
echo "Network Interface to Mellanox Interface Mapping:"
# Loop through each network interface in /sys/class/net/
for iface in $(ls /sys/class/net/); do
    if [ -d /sys/class/net/$iface/device/infiniband_verbs ]; then
        # Find the Mellanox interface by reading the ibdev file
        mlx_iface=$(cat /sys/class/net/$iface/device/infiniband_verbs/*/ibdev)
        echo "$iface => $mlx_iface"
    fi
done
Example:
jnpr@H100-01:/etc/netplan$ ibv_devinfo -vvv -d mlx5_6 | grep GID
        GID[  0]:  fe80:0000:0000:0000:a288:c2ff:fe3b:506a, RoCE v1
        GID[  1]:  fe80::a288:c2ff:fe3b:506a, RoCE v2
        GID[  2]:  0000:0000:0000:0000:0000:ffff:0ac8:010a, RoCE v1
        GID[  3]:  ::ffff:10.200.1.10, RoCE v2
        GID[  4]:  FC00:200:0000:0002:a288:c2ff:fe3b:506a, RoCE v1
        GID[  5]:  FC00:200:0:2:a288:c2ff:fe3b:506a, RoCE v2

jnpr@H100-01:~/scripts$ ./nvidia_map_iface_to_mlx_YL.sh | egrep "gpu|Map"
Network Interface to Mellanox Interface Mapping:
gpu0_eth => mlx5_11
gpu1_eth => mlx5_6
gpu2_eth => mlx5_10
gpu3_eth => mlx5_9
gpu4_eth => mlx5_4
gpu5_eth => mlx5_3
gpu6_eth => mlx5_5
gpu7_eth => mlx5_0
stor0_eth => mlx5_1
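In this example, GID index 5 holds the global IPv6 address with RoCE v2 support. Rather than reading the list by eye, the index can also be extracted programmatically. The following is a hedged sketch that picks the first RoCE v2 GID whose address is global (filtering out the link-local fe80::/10 entry and the IPv4-mapped ::ffff: form is a heuristic, not a guarantee); the device name is an example:

dev=mlx5_6
gid_index=$(ibv_devinfo -vvv -d "$dev" | grep "RoCE v2" \
    | grep -v -e "fe80" -e "ffff" \
    | sed -n 's/.*GID\[ *\([0-9]*\)\].*/\1/p' | head -1)
echo "Using NCCL_IB_GID_INDEX=$gid_index"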
Once you have identified the GID index, you can run an NCCL test as shown in the following example:
NCCL_PXN_DISABLE=1 NCCL_IB_QPS_PER_CONNECTION=4 NCCL_IB_GID_INDEX=5 ./nccl_run_rails_all_H100.sh -b 1G -e 1G -n 200 -i 0 -m 10
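To confirm that the intended GID index is actually in use, you can raise NCCL's log verbosity before the run. NCCL_DEBUG and NCCL_DEBUG_SUBSYS are standard NCCL variables, though the exact wording of the log output varies by NCCL version:

# Print NCCL's init and network-selection messages alongside the test.
export NCCL_DEBUG=INFO
export NCCL_DEBUG_SUBSYS=INIT,NET
NCCL_PXN_DISABLE=1 NCCL_IB_QPS_PER_CONNECTION=4 NCCL_IB_GID_INDEX=5 \
    ./nccl_run_rails_all_H100.sh -b 1G -e 1G -n 200 -i 0 -m 10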
The following script provides mapping information between the Mellanox interface names, the NIC numbers, and the user-assigned interface names (e.g., gpu0_eth). It also provides mapping information between the interfaces and the GPUs.
lara@A100-01:~/SCRIPTS$ cat find_pxb_gpu_nic_pairs.py
#!/usr/bin/env python3
import subprocess
import pandas as pd
import re
from collections import defaultdict

# Step 1: Run filter_topo_dynamic.py
print("Running filter_topo_dynamic.py...")
subprocess.run(["python3", "filter_topo_dynamic.py"], check=True)

# Step 2: Read filtered_topo.csv
df = pd.read_csv("filtered_topo.csv")

# Step 3: Identify PXB entries from GPU rows
pxb_matches = defaultdict(list)
gpu_rows = df[df["Label"].str.startswith("GPU")]
for _, row in gpu_rows.iterrows():
    gpu = row["Label"]
    for nic in df.columns[1:]:
        if str(row[nic]).strip().upper() == "PXB":
            pxb_matches[gpu].append(nic)

# Step 4: Parse gpu_eth-to-nic.txt
gpu_eth_to_nic = {}
with open("gpu_eth-to-nic.txt") as f:
    for line in f:
        match = re.match(r"(gpu\d+_eth)\s+←→\s+(NIC\d+)", line)
        if match:
            gpu_eth, nic = match.groups()
            gpu_eth_to_nic[gpu_eth] = nic

# Step 5: Invert map to find which GPU eth corresponds to each NIC
nic_to_gpu_eth = {nic: gpu_eth for gpu_eth, nic in gpu_eth_to_nic.items()}

# Step 6: Output result
print("\nGPU to PXB NICs (with eth):")
for gpu in sorted(pxb_matches.keys(), key=lambda x: int(x[3:])):
    nic_list = sorted(pxb_matches[gpu], key=lambda x: int(x[3:]))
    formatted = ", ".join([f"{nic} ({nic_to_gpu_eth.get(nic, 'unknown')})" for nic in nic_list])
    print(f"{gpu} => {formatted}")
Example:
jnpr@A100-01:~/SCRIPTS$ python3 find_pxb_gpu_nic_pairs.py
Running filter_topo_dynamic.py...
[sudo] password for jnpr:

Mapping from mlx5_X to gpuX_eth:
mlx5_6 → gpu0_eth
mlx5_8 → gpu1_eth
mlx5_0 → gpu2_eth
mlx5_2 → gpu3_eth
mlx5_16 → gpu4_eth
mlx5_18 → gpu5_eth
mlx5_10 → gpu7_eth
mlx5_12 → gpu6_eth
Saved to mlx_to_gpu_eth.txt

NIC Legend:
mlx5_0 → NIC0
mlx5_1 → NIC1
mlx5_2 → NIC2
mlx5_3 → NIC3
mlx5_4 → NIC4
mlx5_5 → NIC5
mlx5_6 → NIC6
mlx5_7 → NIC7
mlx5_8 → NIC8
mlx5_9 → NIC9
mlx5_10 → NIC10
mlx5_11 → NIC11
mlx5_12 → NIC12
mlx5_13 → NIC13
mlx5_14 → NIC14
mlx5_15 → NIC15
mlx5_16 → NIC16
mlx5_17 → NIC17
mlx5_18 → NIC18
mlx5_19 → NIC19

Matched NICs and their GPUs:
gpu0_eth ←→ NIC6
gpu1_eth ←→ NIC8
gpu2_eth ←→ NIC0
gpu3_eth ←→ NIC2
gpu4_eth ←→ NIC16
gpu5_eth ←→ NIC18
gpu7_eth ←→ NIC10
gpu6_eth ←→ NIC12
Saved to gpu_eth-to-nic.txt
Done! Filtered output saved to filtered_topo.csv

GPU to PXB NICs (with eth):
GPU0 => NIC6 (gpu0_eth), NIC8 (gpu1_eth)
GPU1 => NIC6 (gpu0_eth), NIC8 (gpu1_eth)
GPU2 => NIC0 (gpu2_eth), NIC2 (gpu3_eth)
GPU3 => NIC0 (gpu2_eth), NIC2 (gpu3_eth)
GPU4 => NIC16 (gpu4_eth), NIC18 (gpu5_eth)
GPU5 => NIC16 (gpu4_eth), NIC18 (gpu5_eth)
GPU6 => NIC10 (gpu7_eth), NIC12 (gpu6_eth)
GPU7 => NIC10 (gpu7_eth), NIC12 (gpu6_eth)
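One practical use of this mapping is to pin NCCL to the GPU-facing rail NICs. The following is a sketch using the NCCL_IB_HCA variable, where the leading "=" requests exact-name matching; the device list comes from the A100 mapping above and must be adjusted per host:

# Restrict NCCL to the eight rail NICs paired with GPUs, keeping any
# storage or management NICs out of the collective path.
export NCCL_IB_HCA="=mlx5_0,mlx5_2,mlx5_6,mlx5_8,mlx5_10,mlx5_12,mlx5_16,mlx5_18"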