Telemetry and Monitoring

AI cluster networks demand lossless, high-throughput, and low-latency connectivity. A key component of maintaining performance is the collection and analysis of operational data to monitor congestion, system health, and traffic patterns. Junos OS telemetry enables detailed tracking of critical performance indicators, including thresholds, counters, and congestion metrics specific to AI workloads. Once collected, this data must be analyzed, structured, and visualized to support monitoring, decision-making, and continuous network optimization.

The following sections describe how to configure the devices to enable data collection and outline key performance metrics recommended for the AI EVPN/VXLAN fabric solution.

Configuring QFX Switches to Provide Telemetry Information

To implement telemetry collection, the switches must be configured to allow gRPC-based access, as described in the OpenConfig and gRPC for Junos Telemetry Interface section of the Junos Telemetry Interface User Guide.

The following configuration was used on all the leaf and spine node devices for this purpose:

Table 49. gRPC Configuration Commands for Junos OS

Command | Description
extension-service request-response grpc | Enables the gRPC interface under the extension service framework, used for APIs like Junos Telemetry Interface (JTI) or third-party integrations. The client issues a request and waits for a response from the Junos OS server.
ssl port 32767 | Configures TCP port 32767 for communication using SSL encryption.
local-certificate aos_grpc | Configures authentication using a certificate named aos_grpc to secure the gRPC session. Follow the steps described in Configure gRPC Services to generate and install the necessary certificates.
routing-instance mgmt_junos | Binds the gRPC server to the mgmt_junos routing instance, meaning it only listens on the out-of-band management interface.
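
Taken together, the statements in Table 49 might be entered as the set commands below. This is a sketch, not the exact configuration from the validated devices: it assumes the statements sit under the [edit system services] hierarchy, which the table does not state explicitly, and that the aos_grpc certificate has already been generated and installed.

```
set system services extension-service request-response grpc ssl port 32767
set system services extension-service request-response grpc ssl local-certificate aos_grpc
set system services extension-service request-response grpc routing-instance mgmt_junos
```

The port number, certificate name, and routing instance are the values listed in the table; adjust them to match the local deployment.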

To validate connectivity between the switch and the telemetry collectors, use the show system connections command and search for the configured SSL port number.

The sample output shows connections from two collectors (10.100.1.17 and 10.100.1.20).
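
One way to run this check, assuming the SSL port 32767 from Table 49, is to filter the connection list with the CLI pipe. The match filter is an illustrative choice here, not necessarily the exact form used to produce the sample output.

```
show system connections | match 32767
```

Established sessions to the collector addresses on this port would indicate that the gRPC channel is up.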

To confirm that the collectors are actively pulling data via gRPC/gNMI and see what sensors are in use, use:

  • show network-agent statistics
  • show network-agent statistics detail
  • show network-agent statistics subscription-paths <sensor-path>
  • show network-agent statistics juniper
  • show network-agent statistics gnmi

Example:

To confirm the status of sensors, use the show agent sensors command.