System Agents

Device system agents handle configuration management, device-to-server communication, and telemetry collection. If you're not using ZTP to bootstrap your devices (or if you have a one-off installation or are using a compute agent), you can use this device installer to automatically install and verify devices. Depending on the device NOS (or the OS, for a compute agent), you can install device agents on-box (the agent is installed on the device) or off-box (the agent is installed on the server and communicates with devices through their API). To find out which platforms support on-box and off-box agents, see the Device Management section of the 6.0.0 Feature Matrix.

For more information about managing devices, see Managed Devices.

Agents include the following parameters:

Table 1: Device System Agent Parameters
Parameter - Description
Device addresses - Management IP address(es) of the device(s).
Platform (off-box only) - Drop-down list of supported platforms.
Username / Password - If you're not using an agent profile with credentials, check these boxes and enter the credentials.
Agent Profile - To avoid entering credentials and packages manually, select agent profiles that you defined previously.
Job to run after creation - The job that runs once the agent is created:
  • Install (default) - Installs the agent on the device.
  • Check - Creates the agent but does not install it. The agent appears in the list view, where you can install it later.
Install Requirements (servers only) - If servers don't have Internet connectivity, uncheck this box.
Packages - Before creating the agent, install the required packages so they are available. Packages associated with the selected agent profiles are also listed here.
Open Options (off-box only) - Passes configured parameters to off-box agents. For example, to use HTTPS as the API connection from off-box agents to devices, use the key-value pairs proto - https and port - 443. You can override the following default values with open options:
  • commit_timeout - 60 (integer: seconds)
  • telemetry_timeout - 100 (integer: seconds)
  • probe_timeout - 5 (integer: seconds)
  • log_config_diff - True (boolean)
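Open options work as overrides: any key you set replaces the default, and any key you leave out keeps its default. The following Python sketch models only that override behavior. It is not how Apstra stores or transmits open options; the `effective_open_options` helper and the `commit_timeout` value of 120 are hypothetical, and the native Python types are an assumption for readability.

```python
# Default open-option values listed in Table 1 for off-box agents.
DEFAULT_OPEN_OPTIONS = {
    "commit_timeout": 60,      # seconds
    "telemetry_timeout": 100,  # seconds
    "probe_timeout": 5,        # seconds
    "log_config_diff": True,
}


def effective_open_options(overrides):
    """Hypothetical helper: return the defaults with user-supplied key-value pairs applied on top."""
    merged = dict(DEFAULT_OPEN_OPTIONS)  # copy so the defaults are never modified
    merged.update(overrides)             # user values win; new keys are added
    return merged


# Illustrative overrides: HTTPS on port 443, plus a longer commit timeout.
opts = effective_open_options({"proto": "https", "port": 443, "commit_timeout": 120})
print(opts)
```

In this example, `telemetry_timeout`, `probe_timeout`, and `log_config_diff` keep their defaults, while `commit_timeout` is overridden and `proto` and `port` are added.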

System Agents in the GUI

From the left navigation menu, navigate to Devices > Managed Devices to go to the managed devices table.

To perform a task related to agents on one or more devices, select their check boxes (first column of the table). The Agent menu appears above the table with the available tasks (check, install, uninstall, upgrade OS image, assign profile, delete) for the selected agents.

To perform a task related to an agent on a single device, click the Actions button for the device. The Agent menu appears vertically with the available tasks (check, install, uninstall, OS upgrade, revert, collect pristine config, show log, cancel active job, edit, delete).

See the following pages for details on performing tasks related to agents.

NVIDIA Compute Agents

CAUTION:

Compute agent installation fails unless you have sudo privileges and passwordless access to the GPU server.
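Before installing, you can check both requirements from a host that can reach the GPU server. The following Python sketch is a hypothetical helper, not part of the product. It assumes that "passwordless access" means key-based SSH, and it relies on two standard options: `ssh -o BatchMode=yes`, which makes ssh fail instead of prompting for a password, and `sudo -n`, which makes sudo fail instead of prompting for one.

```python
import subprocess


def preflight_command(host, user):
    """Build an ssh command that succeeds only if neither SSH nor sudo prompts for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",  # fail instead of prompting for an SSH password
        f"{user}@{host}",
        "sudo", "-n", "true",   # -n: fail instead of prompting for a sudo password
    ]


def passwordless_sudo_ok(host, user, timeout=20):
    """Return True if the user can log in and run sudo on the host without any password prompt."""
    try:
        result = subprocess.run(
            preflight_command(host, user), capture_output=True, timeout=timeout
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # ssh is not installed locally, or the host did not answer in time.
        return False
    return result.returncode == 0
```

For example, `passwordless_sudo_ok("192.0.2.10", "ubuntu")` uses a placeholder address and user; substitute your GPU server's management IP and login. A `False` result means at least one of the two requirements is not met.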

Note: ZTP is not supported on compute agents. Also, a pristine configuration is not applicable on compute agents.

Compute agents are on-box agents that are installed on NVIDIA DGX A100 and DGX H100 GPU servers. These GPU servers ship with Ubuntu 22.04 installed.

Each GPU server has eight GPUs, and each GPU is assigned to a ConnectX-6/7 RDMA interface. Each interface is a member of a rail and belongs to a rail group. You can assign interfaces to rail groups by interface index: for example, the interface with index 0 goes in a group called Rail 1, the interface with index 1 goes in Rail 2, and so on up to Rail 8. By default, every rail connects to a different leaf device, but you can design optimized templates that place more than one rail on each leaf.
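The index-to-rail numbering can be expressed as a small function. The following Python sketch is illustrative only: the `leaf1`, `leaf2`, … names and the `rails_per_leaf` parameter are hypothetical, and the leaf calculation simply models the default one-rail-per-leaf design alongside an optimized design that packs consecutive rails onto the same leaf.

```python
GPUS_PER_SERVER = 8  # one RDMA interface per GPU


def rail_group(interface_index):
    """Map a 0-based RDMA interface index to its rail group name (index 0 -> Rail 1)."""
    if not 0 <= interface_index < GPUS_PER_SERVER:
        raise ValueError(
            f"interface index must be 0..{GPUS_PER_SERVER - 1}, got {interface_index}"
        )
    return f"Rail {interface_index + 1}"


def leaf_for(interface_index, rails_per_leaf=1):
    """Hypothetical leaf name serving this interface's rail.

    rails_per_leaf=1 models the default design (every rail on a different leaf);
    larger values model templates that place several consecutive rails on one leaf.
    """
    rail_group(interface_index)  # reuse the range check
    return f"leaf{interface_index // rails_per_leaf + 1}"


for idx in range(GPUS_PER_SERVER):
    print(idx, rail_group(idx), leaf_for(idx), leaf_for(idx, rails_per_leaf=2))
```

With the default, interface 3 is in Rail 4 on `leaf4`; with two rails per leaf, the same interface lands on `leaf2`, so eight rails need only four leaf devices.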

Installing the compute agent is a manual process; ZTP and pristine configuration are not supported. Even so, creating a compute agent is relatively simple, and the process is very similar to creating a system agent on a switch. However, a compute agent is for telemetry only: no configuration is pushed to the GPU server. Logical devices, device profiles, and interface maps for GPU servers are included with the product, so you don’t need to create them, although you can modify them as needed.

You can use the compute agent to monitor how many Congestion Notification Packets (CNPs) and GPU Out of Sequence (OOS) packets are received. The Analytics Engine consumes this data and turns it into real-time insights into network performance, traffic patterns, potential congestion points, and impacted endpoints, helping you identify performance bottlenecks and anomalies.
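Raw packet counts are most useful as rates over time. The following Python sketch shows one common way to turn two counter samples into per-second CNP and OOS rates. It assumes the counters are cumulative and can reset to zero; the `CounterSample` fields are hypothetical and do not reflect the actual schema of the GPU telemetry services or how the Analytics Engine computes its values.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    timestamp: float   # seconds
    cnp_received: int  # cumulative CNPs received (hypothetical field)
    oos_received: int  # cumulative out-of-sequence packets received (hypothetical field)


def _delta(prev, curr):
    # A smaller current value is treated as a counter reset, so count from zero.
    return curr - prev if curr >= prev else curr


def per_second_rates(prev, curr):
    """Convert two cumulative samples into per-second CNP and OOS rates."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in strictly increasing time order")
    return {
        "cnp_per_s": _delta(prev.cnp_received, curr.cnp_received) / elapsed,
        "oos_per_s": _delta(prev.oos_received, curr.oos_received) / elapsed,
    }
```

For example, samples of 100 CNPs at 0 s and 600 CNPs at 10 s give 50.0 CNPs per second. A sustained rise in this rate is the kind of signal that points to a congestion point.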

You can also create customizable dashboards to provide real-time and historical insights into the data you’re collecting. This functionality helps you make informed, data-driven decisions.

As part of the compute agent, two new telemetry services are available: GPU_Hardware_Counters and Gpu_Infiniband_Dev_To_Interface. For more information, see GPU Hardware Counters.

Also, the following services have been expanded to run on the compute agents:

  • LLDP
  • Hostname
  • Interface
  • Interface_Counters
  • Resource_Util
  • Disk_Util

See Create Onbox Agent to start onboarding the NVIDIA compute agent.

Verification