System Agents
Device system agents handle configuration management, device-to-server communication, and telemetry collection. If you're not using ZTP to bootstrap your devices (or if you have a one-off installation or are using a compute agent), you can use this device installer to automatically install and verify devices. Depending on the device NOS (or the OS, for a compute agent), you can install device agents on-box (the agent is installed on the device) or off-box (the agent is installed on the server and communicates with devices via API). To find out which platforms support on-box and off-box agents, see the Device Management section of the 6.0.0 Feature Matrix.
For more information about managing devices, see Managed Devices.
Agents include the following parameters:
Parameter | Description |
---|---|
Device addresses | Management IP(s) of the device(s) |
Platform (off-box only) | For off-box agents only: drop-down list includes supported platforms. |
Username / Password | If you're not using an agent profile with credentials, check these boxes and add credentials. |
Agent Profile | If you don't want to manually enter credentials and packages, use agent profiles that you previously defined. |
Job to run after creation | |
Install Requirements (servers only) | For servers only: If servers don't have Internet connectivity, uncheck the box. |
Packages | Before creating the agent, install required packages so they are available. Packages associated with selected agent profiles are listed here as well. |
Open Options (off-box only) | Passes configured parameters to off-box agents. For example, to use HTTPS as the API connection from off-box agents to devices, use the key-value pair: proto-https - port-443. The following default values can be overridden with open options: |
System Agents in the GUI
From the left navigation menu, navigate to Devices > Managed Devices to open the managed devices table.
To perform a task related to agents on one or more devices, select their check boxes (first column in the table). The Agent menu appears above the table with the available tasks (check, install, uninstall, upgrade OS image, assign profile, delete) for the selected agents.
To perform a task related to an agent on a single device, click the Actions button for the device. The Agent menu appears vertically with the available tasks (check, install, uninstall, OS upgrade, revert, collect pristine config, show log, cancel active job, edit, delete).
See the following pages for details on performing tasks related to agents.
NVIDIA Compute Agents
Before you install a compute agent, make sure that you have sudo privileges and passwordless access to the GPU server; otherwise, the installation fails.
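To catch this early, you can test both prerequisites from a host that can reach the GPU server before you start the installation. The following is a minimal sketch, not an Apstra tool: it assumes key-based SSH is how you reach the GPU server, and the function names (`build_precheck_command`, `gpu_server_ready`) are hypothetical. It runs `sudo -n true` over `ssh -o BatchMode=yes`; `BatchMode=yes` makes SSH fail instead of prompting for a password, and `sudo -n` does the same for sudo, so the command succeeds only when both kinds of access are passwordless.

```python
import subprocess
from typing import Callable, List


def build_precheck_command(user: str, host: str) -> List[str]:
    """Build an SSH command that succeeds only if SSH login and sudo both
    work without a password prompt."""
    return [
        "ssh",
        "-o", "BatchMode=yes",  # fail instead of prompting for a password
        f"{user}@{host}",
        "sudo", "-n", "true",   # -n: fail instead of prompting for a sudo password
    ]


def gpu_server_ready(
    user: str,
    host: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    timeout: float = 30.0,
) -> bool:
    """Return True if passwordless SSH and passwordless sudo both appear to work."""
    try:
        result = run(
            build_precheck_command(user, host),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # Server did not answer in time, or no ssh client is installed locally.
        return False
    return result.returncode == 0
```

For example, `gpu_server_ready("admin", "192.0.2.10")` returns `False` if either SSH or sudo would prompt for a password (the user name and address are placeholders). The `run` parameter exists only so the check can be exercised without a real server.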
Compute agents are on-box agents that are installed on NVIDIA DGX A100 and DGX H100 GPU servers. These GPU servers come with Ubuntu 22.04 preinstalled.
Each GPU server has eight GPUs, and each GPU is assigned to a ConnectX-6/7 RDMA interface. Each interface belongs to a rail and is part of a rail group. You can assign each interface to a rail group according to its index: for example, an interface with index 0 can go in a group called Rail 1, an interface with index 1 in a group called Rail 2, and so on, up to Rail 8. By default, every rail is connected to a different leaf device, but you can design optimized templates with more than one rail per leaf.
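The numbering scheme described above is a simple offset: rail number = interface index + 1. The sketch below assumes zero-based interface indexes 0 through 7, as in the example. `leaf_for_rail`, the `leafN` names, and grouping consecutive rails onto the same leaf are hypothetical illustrations of an optimized template, not the behavior of a bundled Apstra template.

```python
NUM_RAILS = 8  # eight GPUs per server, one RDMA interface (and rail) per GPU


def rail_group_for_index(interface_index: int) -> str:
    """Map a zero-based interface index to its rail group name:
    0 -> "Rail 1", 1 -> "Rail 2", ..., 7 -> "Rail 8"."""
    if not 0 <= interface_index < NUM_RAILS:
        raise ValueError(
            f"interface index must be 0-{NUM_RAILS - 1}, got {interface_index}"
        )
    return f"Rail {interface_index + 1}"


def leaf_for_rail(rail_number: int, rails_per_leaf: int = 1) -> str:
    """Hypothetical leaf naming. rails_per_leaf=1 models the default design
    (one leaf per rail); larger values put consecutive rails on the same leaf."""
    if not 1 <= rail_number <= NUM_RAILS:
        raise ValueError(f"rail number must be 1-{NUM_RAILS}, got {rail_number}")
    if rails_per_leaf < 1:
        raise ValueError("rails_per_leaf must be at least 1")
    return f"leaf{(rail_number - 1) // rails_per_leaf + 1}"


if __name__ == "__main__":
    for index in range(NUM_RAILS):
        rail_number = index + 1
        print(
            f"index {index}: {rail_group_for_index(index)}, "
            f"default {leaf_for_rail(rail_number)}, "
            f"2 rails/leaf {leaf_for_rail(rail_number, rails_per_leaf=2)}"
        )
```

With the default `rails_per_leaf=1`, each rail maps to its own leaf; with `rails_per_leaf=2`, Rail 1 and Rail 2 share `leaf1`, Rail 3 and Rail 4 share `leaf2`, and so on.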
Installing the compute agent is a manual process, so Zero Touch Provisioning (ZTP) isn't supported. Pristine configuration is also not supported. Creating a compute agent is still relatively simple, and the process is very similar to creating a system agent on a switch. However, a compute agent is for telemetry only; no configuration is pushed onto the GPU server. GPU server logical devices, device profiles, and interface maps are included as part of the product. You don’t need to create them, although you can modify them as needed.

You can use the compute agent to monitor how many Congestion Notification Packets (CNPs) and how many GPU out-of-sequence (OOS) packets are received. The Analytics Engine consumes this data, which you can use in real time to gain insight into network performance, traffic patterns, potential congestion points, and impacted endpoints. This information helps you identify performance bottlenecks and anomalies.
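Raw CNP and OOS counts are easiest to interpret as rates: CNPs are generated in response to congestion, so a rising CNP rate toward a GPU can point to a congested path. The sketch below shows the usual way to turn two samples of a cumulative counter into a per-second rate. It is not how Apstra computes its telemetry; it assumes the counters only increase until they are reset (for example, by a reboot), and `CounterSample`, its fields, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

COUNTERS = ("cnp_received", "oos_received")


@dataclass
class CounterSample:
    timestamp: float   # seconds since some fixed reference
    cnp_received: int  # cumulative CNPs received
    oos_received: int  # cumulative out-of-sequence packets received


def per_second_rates(prev: CounterSample, curr: CounterSample) -> Dict[str, Optional[float]]:
    """Convert two cumulative samples into per-second rates.
    A rate is None when the counter went backwards (it was reset)."""
    interval = curr.timestamp - prev.timestamp
    if interval <= 0:
        raise ValueError("samples must be taken in increasing time order")
    rates: Dict[str, Optional[float]] = {}
    for name in COUNTERS:
        delta = getattr(curr, name) - getattr(prev, name)
        # A negative delta means the counter restarted from zero, so this
        # interval cannot be measured reliably.
        rates[name] = None if delta < 0 else delta / interval
    return rates


if __name__ == "__main__":
    earlier = CounterSample(timestamp=0.0, cnp_received=1_000, oos_received=20)
    later = CounterSample(timestamp=10.0, cnp_received=1_500, oos_received=20)
    print(per_second_rates(earlier, later))
    # prints {'cnp_received': 50.0, 'oos_received': 0.0}
```

Reporting `None` for a reset interval, rather than a negative or zero rate, keeps a reboot from looking like a sudden drop in congestion.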
You can also create customizable dashboards to provide real-time and historical insights into the data you’re collecting. This functionality helps you make informed, data-driven decisions.
As part of the compute agent, two new telemetry services are available: GPU_Hardware_Counters and Gpu_Infiniband_Dev_To_Interface. For more information, see GPU Hardware Counters.
Also, the following services have been expanded to run on the compute agents:
- LLDP
- Hostname
- Interface
- Interface_Counters
- Resource_Util
- Disk_Util
Verification
-
How do I verify that the compute agent was installed properly from the server and Apstra side?
You can check the job status in the GUI, or run the sudo service aos status command on the server.
-
How do I verify that the agent is running correctly and is connected to the Apstra Controller?
If you've added the agent to a blueprint, Apstra monitors it and raises a liveness anomaly if there is a problem.
-
How do I verify that things are running correctly after I reboot the Apstra server?
Run the service aos status or systemctl status aos command.
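If you script these checks, you can rely on the exit status instead of parsing the output. The sketch below assumes the common convention that `service ... status` and `systemctl status` exit with 0 when the service is running and with a non-zero code (typically 3) when it is stopped. `aos_agent_running` and `STATUS_COMMANDS` are hypothetical names; the commands themselves are the ones given in this section.

```python
import subprocess
from typing import Callable, Dict, List

# The two status commands given in this section.
STATUS_COMMANDS: Dict[str, List[str]] = {
    "service": ["sudo", "service", "aos", "status"],
    "systemctl": ["systemctl", "status", "aos"],
}


def aos_agent_running(
    method: str = "service",
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True if the aos service reports that it is running.
    An unknown method raises KeyError."""
    try:
        result = run(STATUS_COMMANDS[method], capture_output=True, text=True)
    except FileNotFoundError:
        # The service manager command is not available on this host.
        return False
    # Exit status 0 means running; non-zero (typically 3) means stopped.
    return result.returncode == 0
```

On the server, `aos_agent_running()` uses the `sudo service aos status` form and `aos_agent_running("systemctl")` uses `systemctl status aos`.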
-
What do I do if the GPU server has been redeployed to another environment and the agent wasn't removed before the server was unplugged?
Run the dpkg --purge --force-all aos-device-agent command, followed by the rm -fr /etc/aos /var/log/aos command.
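Because `rm -fr` deletes without prompting, it can help to review these commands before running them. This sketch wraps the two cleanup steps with a dry-run default. `cleanup_stale_agent` is a hypothetical helper, and prefixing each command with `sudo` is an assumption (both commands need root privileges).

```python
import subprocess
from typing import Callable, List

# The two cleanup steps from this section, in the order given.
CLEANUP_STEPS: List[List[str]] = [
    ["dpkg", "--purge", "--force-all", "aos-device-agent"],
    ["rm", "-fr", "/etc/aos", "/var/log/aos"],
]


def cleanup_stale_agent(
    dry_run: bool = True,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> List[List[str]]:
    """Return the cleanup commands; execute them only when dry_run is False."""
    commands = [["sudo", *step] for step in CLEANUP_STEPS]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError on failure, so the rm step
            # never runs if the package purge fails.
            run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in cleanup_stale_agent():  # dry run: print only
        print(" ".join(cmd))
```

Call `cleanup_stale_agent(dry_run=False)` only after confirming that the printed commands are what you intend to run on that server.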
-
How is the agent upgraded if a) Apstra is upgraded, b) Apstra is upgraded out of sequence, or c) there's an issue with the agent?
For instructions on upgrading on-box agents, see https://www.juniper.net/documentation/us/en/software/apstra5.1/apstra-install-upgrade/topics/topic-map/apstra-server-upgrade-diff.html#installation-guide_upgrading_on-box_agents_vm-vm-
-
How do you verify that the agent is connected to the Apstra Controller? What ports does the agent use to communicate with the Apstra Controller, and what protocols do the ports use?
-
How do I restart the agent on the GPU server if there was an issue with agent?
Run the sudo service aos restart command.