Cisco NX-OS Device Agent

This chapter describes how to install the AOS device agent manually on Cisco NX-OS devices. For the recommended method using the UI, refer to Device Agents.

Important

Manual agent installation is needed only in rare cases. In almost all cases, agents should be installed by creating System Agents in the UI. Installing agents manually is more bespoke, takes more effort, and is prone to user error.

An in-depth understanding of the various device states, configuration stages, and agent operation is required before attempting manual agent installation.

When in doubt, contact Apstra Global Support.

Quick start

Manual installation of the Agent involves the following steps:

  • Resizing the Guest Shell disk, memory, and CPU, then restarting the Guest Shell so the changes take effect.
  • Copying the device agent installer from the AOS server and installing it.
  • Editing the AOS configuration file (/etc/aos/aos.conf) and starting the agent.

Warning

The Cisco Guest Shell is not dedicated to AOS. If other applications are hosted in the Guest Shell, any change to it could impact them.

Warning

Commands in the “Bootstrap” or “Pristine” configuration may interfere with configuration added by AOS during fabric deployment.

Adding the NX-OS command “system jumbomtu” with a value lower than the MTUs used by AOS causes AOS MTU commands to fail.

Device configuration requirements

Configuration steps must be performed on NX-OS in this order: VRF, NXAPI, Guest Shell.

Create Management VRF

The AOS device agent requires a VRF named management for agent-to-server communication. Ensure these lines appear in the running configuration:

!
no password strength-check
username admin password admin-password role network-admin
copp profile strict
!
vrf context management
  ip route 0.0.0.0/0 <Management Default Gateway>
!
interface mgmt0
  vrf member management
  ip address <Management CIDR Address>
!
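Before going further, the management VRF lines can be checked mechanically against a saved copy of the running configuration. This is a minimal POSIX shell sketch, not an Apstra tool: the helper name check_mgmt_vrf_config is hypothetical, and it assumes NX-OS-style indentation (sub-commands indented with spaces under their parent) in configuration text read from standard input.

```shell
#!/bin/sh
# check_mgmt_vrf_config (hypothetical helper): read NX-OS running-config text
# on stdin and report which of the management VRF lines shown above are missing.
# Assumes sub-commands are indented with spaces under their parent command.
check_mgmt_vrf_config() {
  awk '
    # A non-indented line that is not "!" starts a new top-level section.
    /^[^ !]/ { section = $0 }
    section == "vrf context management" { vrf = 1 }
    section == "vrf context management" && $1 == "ip" && $2 == "route" && $3 == "0.0.0.0/0" { route = 1 }
    section == "interface mgmt0" && $1 == "vrf" && $2 == "member" && $3 == "management" { member = 1 }
    section == "interface mgmt0" && $1 == "ip" && $2 == "address" { addr = 1 }
    END {
      if (!vrf)    print "missing: vrf context management"
      if (!route)  print "missing: default route in vrf context management"
      if (!member) print "missing: vrf member management under interface mgmt0"
      if (!addr)   print "missing: ip address under interface mgmt0"
      exit ((vrf && route && member && addr) ? 0 : 1)
    }'
}
```

For example, save the output of ‘show running-config’ to a file and run ‘check_mgmt_vrf_config < running-config.txt’; it prints nothing and exits 0 when all four items are present.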

Resize and Enable the Guestshell

Whether or not the Guest Shell is already running, it must be enabled or restarted after the following step.

Resize the Guest Shell disk space, memory, and CPU with the following commands:

guestshell resize rootfs 1024
guestshell resize memory 2048
guestshell resize cpu 6

If the Guest Shell is not enabled, activate it with “guestshell enable”. If it was already running, restart it with “guestshell reboot”.

Verify that the guestshell is activated again:

switch# show guestshell detail
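The resource reservation in that output can also be checked by a script. A minimal sketch, assuming the text of ‘show guestshell detail’ is fed on standard input; check_guestshell_resources is a hypothetical name, and the thresholds are the 1024 MB disk, 2048 MB memory and 6% CPU values from the resize commands above.

```shell
#!/bin/sh
# check_guestshell_resources (hypothetical helper): parse "show guestshell detail"
# output on stdin, print a one-line summary, and exit non-zero unless the
# Guest Shell is Activated with at least 1024 MB disk, 2048 MB memory, 6% CPU.
check_guestshell_resources() {
  awk -F':' '
    {
      # Trim both halves of each "Key   : Value" line.
      key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key)
      val = $2; gsub(/^[ \t]+|[ \t]+$/, "", val)
    }
    key == "State"  { state = val }
    key == "Disk"   { disk = val + 0 }   # "1024 MB"       -> 1024
    key == "Memory" { mem = val + 0 }    # "3072 MB"       -> 3072
    key == "CPU"    { cpu = val + 0 }    # "6% system CPU" -> 6
    END {
      printf "state=%s disk=%dMB memory=%dMB cpu=%d%%\n", state, disk, mem, cpu
      exit ((state == "Activated" && disk >= 1024 && mem >= 2048 && cpu >= 6) ? 0 : 1)
    }'
}
```

Inside the Guest Shell, the ‘dohost’ command can run NX-OS CLI commands, so piping ‘dohost "show guestshell detail"’ into the function may work on your platform; otherwise save the output to a file first.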

Download Agent Installer

Copy the agent installer over HTTPS from the AOS server. After downloading, confirm that the MD5 checksum of the downloaded copy matches the checksum published by AOS.

Note

The Cisco device must be able to reach the AOS server over HTTPS to retrieve the agent file. Confirm this connectivity before proceeding.

The AOS server hosts the AOS agent installer. Copy it to /volatile in the Guest Shell, which is the volatile: filesystem in the NX-OS CLI. AOS also ships with an md5sum file in the /home/admin folder on the AOS server.

In the commands below, replace <aos_server_ip> and <aos_version>. The exact version is shown on the AOS server under Platform –> About (for example, ‘3.2.2-12’).

switch# guestshell run sudo chvrf management wget --no-check-certificate -o /volatile/aos_download.log -O /volatile/aos.run https://<aos_server_ip>/device_agent_images/aos_device_agent_<aos_version>.run

switch# guestshell run sudo chvrf management wget --no-check-certificate -o /volatile/aos_download.log -O /volatile/aos.run.md5 https://<aos_server_ip>/device_agent_images/aos_device_agent_<aos_version>.run.md5
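Because these commands are long, a typo in the URL is easy to make. As a hedged convenience, the hypothetical helper below prints both commands for a given server address and version, using exactly the paths shown above, so the output can be pasted at the switch prompt.

```shell
#!/bin/sh
# aos_download_cmds (hypothetical helper): print the two NX-OS commands that
# download the agent installer and its .md5 file into /volatile.
#   $1 = AOS server IP or hostname, $2 = AOS version from Platform -> About
aos_download_cmds() {
  server=$1
  version=$2
  if [ -z "$server" ] || [ -z "$version" ]; then
    echo "usage: aos_download_cmds <aos_server_ip> <aos_version>" >&2
    return 1
  fi
  url="https://$server/device_agent_images/aos_device_agent_$version.run"
  # The installer is saved as aos.run and its checksum file as aos.run.md5.
  for suffix in "" ".md5"; do
    printf 'guestshell run sudo chvrf management wget --no-check-certificate -o /volatile/aos_download.log -O /volatile/aos.run%s %s%s\n' \
      "$suffix" "$url" "$suffix"
  done
}
```

For example, ‘aos_download_cmds 172.20.65.3 3.2.2-12’ prints the two commands for that server and version.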

Validate that the file was downloaded correctly.

switch# show file volatile:aos.run md5
a28780880a8d674f6eb6a397509db101

switch# show file volatile:aos.run.md5
a28780880a8d674f6eb6a397509db101  aos_device_agent_<aos_version>.run
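The same comparison can be scripted inside the Guest Shell. ‘md5sum -c’ cannot be used directly here, because the .md5 file refers to aos_device_agent_<aos_version>.run while the download was saved as aos.run, so this sketch compares only the hash field. verify_aos_md5 is a hypothetical helper name.

```shell
#!/bin/sh
# verify_aos_md5 (hypothetical helper): compare the MD5 of the downloaded
# installer ($1) with the first field of the published .md5 file ($2).
verify_aos_md5() {
  run_file=$1
  md5_file=$2
  # The .md5 file has the form "<hash>  <original file name>".
  expected=$(awk 'NR == 1 { print $1 }' "$md5_file")
  actual=$(md5sum "$run_file" | awk '{ print $1 }')
  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "OK $actual"
  else
    echo "MISMATCH expected=$expected actual=$actual" >&2
    return 1
  fi
}
```

For example, run ‘verify_aos_md5 /volatile/aos.run /volatile/aos.run.md5’ before installing; a mismatch means the download should be repeated.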

Install Cisco Device Agent

On Cisco NX-OS, the AOS agent is installed by running the installer as a shell script as root. This must be done within the Guest Shell. After installing the agent and before starting the service, modify the aos.conf file so the agent can connect to the server.

Note

It is recommended to save the current running-config to the startup-config with ‘copy running-config startup-config’ so your latest changes are preserved in case of any issue.

switch# guestshell run sudo chmod +x /volatile/aos.run
switch# guestshell run sudo /volatile/aos.run -- --no-start
<omitted output>
created 7855 files
created 1386 directories
created 602 symlinks
created 0 devices
created 0 fifos
+ [[ True == \T\r\u\e ]]
+ true
+ systemctl enable aos

Change the required parameters in the AOS configuration file before starting the AOS service, as described in the next section.

AOS Device Agent Configuration File

Device agent configuration is managed by editing the device agent configuration file directly. On Cisco NX-OS, the device agent configuration file is located at /etc/aos/aos.conf within the Guest Shell. See AOS Device Agent Configuration File for the available parameters. After updating the file, start the AOS device agent:

service aos start
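This chapter does not list the aos.conf parameters, so the following is an illustration only. It assumes the file contains a ‘metadb = tbt://aos-server:29731’ line (the same tbt:// address that appears in the DNS troubleshooting log later in this chapter) and a ‘web = aos-server’ line; check the real key names in AOS Device Agent Configuration File before using anything like it. set_aos_server is a hypothetical helper that points both lines at a given server and writes through a temporary file rather than relying on ‘sed -i’.

```shell
#!/bin/sh
# set_aos_server (hypothetical helper): point the AOS agent at a server.
# ASSUMPTION: aos.conf contains lines of the form
#   metadb = tbt://<host>:29731
#   web = <host>
# Verify these key names against the AOS documentation first.
set_aos_server() {
  conf=$1
  server=$2
  tmp="$conf.tmp.$$"
  # Replace only the host part of the metadb URL (the port is kept) and the
  # whole value of the web line; every other line passes through unchanged.
  if sed -e "s|^\(metadb[ ]*=[ ]*tbt://\)[^:]*|\1$server|" \
         -e "s|^\(web[ ]*=[ ]*\).*|\1$server|" "$conf" > "$tmp"; then
    mv "$tmp" "$conf"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

For example, as root in the Guest Shell: ‘set_aos_server /etc/aos/aos.conf 172.20.65.3’, then ‘service aos start’.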

Activating AOS Devices on the AOS Server

When the AOS device agent communicates with AOS, it uses a ‘device key’ to identify itself. On a Cisco NX-OS switch, the device key is the MAC address of the management interface ‘eth0’:

root@Cisco:/etc/aos# ip link show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 08:00:27:8a:39:05 brd ff:ff:ff:ff:ff:ff
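To match a switch with its entry in Managed Devices, the device key can be pulled out of that output. A minimal sketch; device_key_from_ip_link is a hypothetical helper that reads ‘ip link show dev eth0’ output on standard input.

```shell
#!/bin/sh
# device_key_from_ip_link (hypothetical helper): print the MAC address
# (the AOS device key on NX-OS) from "ip link show dev eth0" output on stdin.
# Exits non-zero if no link/ether line is found.
device_key_from_ip_link() {
  awk '$1 == "link/ether" { print $2; found = 1; exit }
       END { exit (found ? 0 : 1) }'
}
```

For the output above, ‘ip link show dev eth0 | device_key_from_ip_link’ prints 08:00:27:8a:39:05.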

Deploy the device

Once the Agent is up and running it will appear under Managed Devices, and can be Acknowledged and assigned to a Blueprint using the UI as normal.

Resetting the AOS agent

If you need to reset the AOS agent (for example, when changing blueprints, redeploying, or restoring a device from backup), it is best to clear the AOS agent metadata, re-register the device, and redeploy it to the blueprint.

C9K-172-20-65-5# guestshell
[guestshell@guestshell ~]$ sudo su -
[root@guestshell ~]# systemctl stop aos
[root@guestshell ~]# rm -rf /var/log/aos/*
[root@guestshell ~]# systemctl start aos
Starting AOS Agents...
[root@guestshell ~]#
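The stop, clear, and start sequence above can be wrapped so that ‘rm -rf’ is never run against the wrong path. A minimal sketch; reset_aos_agent is a hypothetical helper, and the AOS_LOG_DIR and SYSTEMCTL variables are hypothetical overrides (defaulting to /var/log/aos and systemctl) that allow a dry run.

```shell
#!/bin/sh
# reset_aos_agent (hypothetical helper): stop the agent, clear its metadata
# directory, and start it again. Run as root inside the Guest Shell.
# AOS_LOG_DIR and SYSTEMCTL are hypothetical overrides for dry runs.
reset_aos_agent() {
  log_dir=${AOS_LOG_DIR:-/var/log/aos}
  svc=${SYSTEMCTL:-systemctl}
  # Never run rm -rf against an empty or root path.
  case $log_dir in
    ""|/) echo "refusing to clear '$log_dir'" >&2; return 1 ;;
  esac
  "$svc" stop aos || return 1
  rm -rf "$log_dir"/*
  "$svc" start aos
}
```

After the reset, re-register the device and redeploy it to the blueprint as described above.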

Uninstalling the AOS device agent

To uninstall the agent, first undeploy and unassign the device from the blueprint using the standard UI procedures. The device can also be deleted entirely from the Managed Devices page.

To remove the AOS package from NX-OS, destroy the Guest Shell. Do this only if no other applications use the Guest Shell:

C9K-172-20-65-5# guestshell destroy

Remove remaining AOS data from system

Destroying the Guest Shell deletes most of the data left by AOS, but some files remain in the bootflash:.aos folder.

C9K-172-20-65-5# delete bootflash:.aos no-prompt

Remove AOS EEM Scripts

The AOS device agent installs some event manager applets to assist with telemetry. These can be safely removed:

C9K-172-20-65-5(config)# no event manager applet AOS_PROTO_VSH_LAUNCH
C9K-172-20-65-5(config)# no event manager applet AOS_STATS_VSH_LAUNCH
C9K-172-20-65-5(config)# no event manager applet aos_bgp_applet
C9K-172-20-65-5(config)# no event manager applet aos_ifdown_applet
C9K-172-20-65-5(config)# no event manager applet aos_ifup_applet
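Rather than typing the applet names, the removal commands can be generated from the switch configuration. This sketch assumes every AOS-installed applet name starts with aos_ or AOS_, which holds for the five applets listed above but should be confirmed on your device; aos_eem_removal_cmds is a hypothetical helper that reads configuration text such as the output of ‘show running-config | include applet’.

```shell
#!/bin/sh
# aos_eem_removal_cmds (hypothetical helper): read running-config text on stdin
# and print "no event manager applet <name>" for every applet whose name
# starts with aos_ or AOS_ (ASSUMED to be the prefix of all AOS applets).
aos_eem_removal_cmds() {
  awk '$1 == "event" && $2 == "manager" && $3 == "applet" && tolower($4) ~ /^aos_/ {
         print "no event manager applet " $4
       }'
}
```

Review the generated lines before pasting them in configuration mode.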

Cisco Agent Troubleshooting

The AOS agent runs in the NX-OS Guest Shell, which gives it access to an underlying bash and Linux environment. The Guest Shell is an internal Linux Container (LXC) in which AOS operates. From within the LXC, AOS uses NX-API and other methods to communicate directly with NX-OS. For security reasons, Cisco isolates much of the LXC from the rest of the NX-OS device, so most troubleshooting commands must be run from the Guest Shell bash prompt.

Confirm the Guest Shell is running on NX-OS

Check that the Guest Shell is activated and running:

C9K-172-20-65-5# show guestshell detail
Virtual service guestshell+ detail
  State                 : Activated
  Package information
    Name                : guestshell.ova
    Path                : /isanboot/bin/guestshell.ova
    Application
      Name              : GuestShell
      Installed version : 2.1(0.0)
      Description       : Cisco Systems Guest Shell
    Signing
      Key type          : Cisco release key
      Method            : SHA-1
    Licensing
      Name              : None
      Version           : None
  Resource reservation
    Disk                : 1024 MB
    Memory              : 3072 MB
    CPU                 : 6% system CPU

  Attached devices
    Type              Name        Alias
    ---------------------------------------------
    Disk              _rootfs
    Disk              /cisco/core
    Serial/shell
    Serial/aux
    Serial/Syslog                 serial2
    Serial/Trace                  serial3

Showing registered services

C9K-172-20-65-5# show virtual-service list

Virtual Service List:

Name                Status         Package Name
-----------------------------------------------------------------------
guestshell+         Activated      guestshell.ova

Confirm network reachability to AOS

Check ICMP reachability to the AOS server by pinging it from within the Guest Shell. On NX-OS, use the ‘chvrf <vrf>’ command to run a command in the context of a VRF, in this case the ‘management’ VRF:

[guestshell@guestshell ~]$ chvrf management ping 172.20.65.3
PING 172.20.65.3 (172.20.65.3) 56(84) bytes of data.
64 bytes from 172.20.65.3: icmp_seq=1 ttl=64 time=0.239 ms
64 bytes from 172.20.65.3: icmp_seq=2 ttl=64 time=0.215 ms
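For scripted checks, the pass/fail decision can be taken from the ping summary line instead of reading the replies. A minimal sketch; packet_loss and aos_reachable are hypothetical helpers that parse both the Linux summary format and the NX-OS format shown later in this chapter.

```shell
#!/bin/sh
# packet_loss (hypothetical helper): print the packet-loss percentage from the
# summary line of ping output on stdin. Handles the Linux format
# ("3 received, 0% packet loss") and the NX-OS format ("100.00% packet loss").
packet_loss() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# aos_reachable (hypothetical helper): succeed if the reported loss is below 100%.
aos_reachable() {
  loss=$(packet_loss)
  [ -n "$loss" ] && awk -v l="$loss" 'BEGIN { exit !(l + 0 < 100) }'
}
```

For example: ‘chvrf management ping -c 3 172.20.65.3 | aos_reachable && echo reachable’.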

Confirm agent installation

Check whether the AOS device agent package is installed. On NX-OS, the AOS agent installs an init script at /etc/rc.d/init.d/aos so that the agent starts when the Guest Shell instance starts:

[guestshell@guestshell ~]$ systemctl status aos
aos.service - LSB: Start AOS device agents
   Loaded: loaded (/etc/rc.d/init.d/aos)
   Active: active (running) since Tue 2016-11-15 00:10:49 UTC; 3h 54min ago
  Process: 30 ExecStart=/etc/rc.d/init.d/aos start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/aos.service
       ├─113 tacspawner --daemonize=/var/log/aos/aos.log --pidfile=/var/run/aos.pid --name=SAL2028T5NE --hostname=localhost --domainSocket=aos_spawner_sock --hostSysdbAddress=tb...
       ├─115 tacleafsysdb --agentName=SAL2028T5NE-LocalTasks-SAL2028T5NE-0 --partition= --storage-mode=persistent --eventLogDir=. --eventLogSev=TaccSpawner/error,Mounter/error,M...
       ├─116 /usr/bin/python /bin/aos_agent --class=aos.device.common.ProxyDeploymentAgent.ProxyDeploymentAgent --name=DeploymentProxyAgent device_type=Cisco serial_number=@(SWI...
       ├─117 /usr/bin/python /bin/aos_agent --class=aos.device.common.ProxyCountersAgent.ProxyCountersAgent --name=CounterProxyAgent device_type=Cisco serial_number=@(SWITCH_UNI...
       └─118 /usr/bin/python /bin/aos_agent --class=aos.device.cisco.CiscoTelemetryAgent.CiscoTelemetryAgent --name=DeviceTelemetryAgent serial_number=@(SWITCH_UNIQUE_ID)

Check if the AOS Agent is running

Check the service state with the ‘service’ command and the running processes with the ‘ps’ command to confirm that aos_agent is running properly:

[root@guestshell ~]# service aos status
aos is running

[root@guestshell ~]# ps wax
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:00 /sbin/init
    9 ?        Ss     0:00 /usr/lib/systemd/systemd-journald
   19 ?        Ss     0:00 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
   22 ?        Ss     0:00 /usr/lib/systemd/systemd-logind
   29 ?        Ss     0:00 /usr/sbin/sshd -D -f /etc/ssh/sshd_config-cisco -p 17682 -o ListenAddress=localhost
   38 ?        Ss     0:00 /usr/sbin/crond -n
   55 pts/1    Ss+    0:00 /sbin/agetty --noclear ttyS1
   56 pts/0    Ss+    0:00 /sbin/agetty --noclear ttyS0
  113 ?        Sl     0:01 tacspawner --daemonize=/var/log/aos/aos.log --pidfile=/var/run/aos.pid --name=C9K --hostname=localhost --domainSocket=aos_spawner_sock --hostSysdbAdd
  115 ?        S      0:03 tacleafsysdb --agentName=C9K-LocalTasks-C9K-0 --partition= --storage-mode=persistent --eventLogDir=. --eventLogSev=TaccSpawner/error,Mounter/
  116 ?        Sl     0:01 /usr/bin/python /bin/aos_agent --class=aos.device.common.ProxyDeploymentAgent.ProxyDeploymentAgent --name=DeploymentProxyAgent device_type=Cisco serial_numbe
  117 ?        Sl     0:19 /usr/bin/python /bin/aos_agent --class=aos.device.common.ProxyCountersAgent.ProxyCountersAgent --name=CounterProxyAgent device_type=Cisco serial_number=@(SWI
  118 ?        Sl     0:02 /usr/bin/python /bin/aos_agent --class=aos.device.cisco.CiscoTelemetryAgent.CiscoTelemetryAgent --name=DeviceTelemetryAgent serial_number=@(SWITCH_UNIQUE_ID)
  700 ?        Ss     0:00 sshd: guestshell [priv]
  702 ?        S      0:00 sshd: guestshell@pts/4
  703 pts/4    Ss     0:00 bash -li
  732 pts/4    S      0:00 sudo su -
  733 pts/4    S      0:00 su -
  734 pts/4    S      0:00 -bash
  823 pts/4    R+     0:00 ps wax
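In the output above, three aos_agent processes (DeploymentProxyAgent, CounterProxyAgent and DeviceTelemetryAgent) run alongside tacspawner and tacleafsysdb. A minimal sketch for counting them, assuming the ‘ps wax’ column layout shown (PID, TTY, STAT, TIME, then the python interpreter followed by /bin/aos_agent); count_aos_agents is a hypothetical helper. Matching on a field rather than grepping the whole line avoids counting the checking command itself.

```shell
#!/bin/sh
# count_aos_agents (hypothetical helper): read "ps wax" output on stdin and
# print how many /bin/aos_agent processes are running (field 6 is the script
# path after the python interpreter). Exits non-zero when none are found.
count_aos_agents() {
  awk '$6 == "/bin/aos_agent" { n++ }
       END { print n + 0; exit (n > 0 ? 0 : 1) }'
}
```

For example: ‘ps wax | count_aos_agents’.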

Check for presence of files in /etc/aos

Within the Guest Shell, AOS stores a number of configuration files in /etc/aos:

[root@guestshell aos]# ls -lah /etc/aos
total 44K
drwxr-xr-x  2 root root 4.0K Nov 15 00:05 .
drwxr-xr-x 63 root root 4.0K Nov 15 00:09 ..
-rwxr-xr-x  1 root root 1.1K Nov 14 22:26 agent.json
-rw-r--r--  1 root root 1.1K Nov 15 00:05 aos.conf
-rwxr-xr-x  1 root root  992 Nov 14 22:26 common_functions
-rwxr-xr-x  1 root root 1.4K Nov 14 22:26 health_check_functions
-rwxr-xr-x  1 root root  450 Nov 14 22:26 iproute2_functions
-rwxr-xr-x  1 root root  916 Nov 14 22:26 lsb_functions
-rwxr-xr-x  1 root root 4.5K Nov 14 22:26 platform_functions
-rwxr-xr-x  1 root root  156 Nov 14 22:26 version

Check for AOS data in /var/log/aos

AOS writes its internal database to /var/log/aos:

[root@guestshell aos]# ls -lah /var/log/aos
total 500K
drwxr-xr-x 2 root root  480 Nov 15 00:10 .
drwxr-xr-x 3 root root  120 Nov 15 00:10 ..
-rw-r--r-- 1 root root 3.2K Nov 15 00:11 CounterProxyAgent.117.1479168658.log
-rw-r--r-- 1 root root 289K Nov 15 02:27 CounterProxyAgent.err
-rw-r--r-- 1 root root    0 Nov 15 00:10 CounterProxyAgent.out
-rw------- 1 root root  31K Nov 15 00:11 CounterProxyAgentC9K_2016-11-15--00-10-59_117-2016-11-15--00-10-59.tel
-rw-r--r-- 1 root root  104 Nov 15 00:45 DeploymentProxyAgent.116.1479168650.log
-rw-r--r-- 1 root root  12K Nov 15 00:45 DeploymentProxyAgent.err
-rw-r--r-- 1 root root    0 Nov 15 00:10 DeploymentProxyAgent.out
-rw------- 1 root root  31K Nov 15 00:10 DeploymentProxyAgentC9K_2016-11-15--00-10-51_116-2016-11-15--00-10-51.tel
-rw-r--r-- 1 root root 4.1K Nov 15 00:11 DeviceTelemetryAgent.118.1479168657.log
-rw-r--r-- 1 root root 1.4K Nov 15 00:11 DeviceTelemetryAgent.err
-rw-r--r-- 1 root root    0 Nov 15 00:10 DeviceTelemetryAgent.out
-rw------- 1 root root  31K Nov 15 00:11 DeviceTelemetryAgentC9K_2016-11-15--00-10-58_118-2016-11-15--00-10-58.tel
-rw-r--r-- 1 root root    0 Nov 15 00:10 C9K-0.115.1479168649.log
-rw-r--r-- 1 root root    0 Nov 15 00:10 C9K-0.err
-rw-r--r-- 1 root root    0 Nov 15 00:10 C9K-0.out
-rw------- 1 root root  39K Nov 15 00:10 C9K-LocalTasks-C9K-0_2016-11-15--00-10-50_115-2016-11-15--00-10-50.tel
-rw------- 1 root root  36K Nov 15 00:10 Spawner-C9K_2016-11-15--00-10-49_111-2016-11-15--00-10-49.tel
-rw------- 1 root root  634 Nov 15 00:10 _C9K-00000000582a528a-0001744b-checkpoint
-rw-r--r-- 1 root root    0 Nov 15 00:10 _C9K-00000000582a528a-0001744b-checkpoint-valid
-rw------- 1 root root    0 Nov 15 00:10 _C9K-00000000582a528a-0001744b-log
-rw-r--r-- 1 root root    0 Nov 15 00:10 _C9K-00000000582a528a-0001744b-log-valid
-rw-r--r-- 1 root root    0 Nov 15 00:10 aos.log
[root@guestshell aos]#

Determining AOS Agent version

The AOS agent version is recorded in /etc/aos/version. Before reading the file, attach to the aos service:

[root@guestshell admin]# service aos attach
aos@guestshell:/# cat /etc/aos/version
VERSION=99.0.0-3874
BUILD_ID=AOS_latest_OB.3874
BRANCH_NAME=master
COMMIT_ID=d3eb2585608f0509a11b95fb9d07aed6e26d6c32
BUILD_DATETIME=2018-05-20_10:22:32_PDT
AOS_DI_RELEASE=2.2.0-169
aos@guestshell:/#
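To compare the agent with the version shown on the AOS server under Platform –> About, a single field can be read from the file. A minimal sketch; aos_version_field is a hypothetical helper that parses the KEY=VALUE lines with awk rather than sourcing the file, so nothing in it is executed.

```shell
#!/bin/sh
# aos_version_field (hypothetical helper): print the value of KEY ($2) from a
# KEY=VALUE file such as /etc/aos/version ($1). Exits non-zero if KEY is absent.
aos_version_field() {
  file=$1
  key=$2
  awk -v k="$key" 'index($0, k "=") == 1 { print substr($0, length(k) + 2); found = 1; exit }
                   END { exit (found ? 0 : 1) }' "$file"
}
```

For the file above, ‘aos_version_field /etc/aos/version VERSION’ prints 99.0.0-3874.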

DNS resolution failure

The AOS agent depends on DNS resolution of the metadb connection. Ensure that the IP address or hostname configured in /etc/aos/aos.conf resolves and is reachable from the device's eth0 management port:

[root@guestshell ~]#  aos_show_tech | grep -i dns
[2016/10/20 23:04:20.534538UTC@event-'warning']:(textMsg=Failing outgoing mount to <'tbt://aos-server:29731/Data/ReplicaStatus?flags=i','/Metadb/ReplicaStatus'>' due to code 'resynchronizing' and reason 'Dns lookup issue "Temporary failure in name resolution" Unknown error 18446744073709551613)
[2016/10/20 23:04:21.540444UTC@OutgoingMountConnectionError-'warning']:(connectionName=--NONE--,localPath=/Metadb/ReplicaStatus,remotePath=tbt://aos-server:29731/Data/ReplicaStatus?flags=i,msg=Tac::ErrnoException: Dns lookup issue "Temporary failure in name resolution" Unknown error 18446744073709551613)
[2016/10/20 23:04:21.541174UTC@event-'warning']:(textMsg=Failing outgoing mount to <'tbt://aos-server:29731/Data/ReplicaStatus?flags=i','/Metadb/ReplicaStatus'>' due to code 'resynchronizing' and reason 'Dns lookup issue "Temporary failure in name resolution" Unknown error 18446744073709551613)
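To test the name the agent is trying to resolve, first extract it from the configuration. This sketch assumes aos.conf has a ‘metadb = tbt://<host>:<port>’ line, matching the tbt://aos-server:29731 address in the log above; confirm the key name against AOS Device Agent Configuration File. metadb_host is a hypothetical helper.

```shell
#!/bin/sh
# metadb_host (hypothetical helper): print the host part of the tbt:// metadb
# URL in an aos.conf-style file ($1).
# ASSUMPTION: the file has a line "metadb = tbt://<host>:<port>".
metadb_host() {
  sed -n 's|^metadb[ ]*=[ ]*tbt://\([^:/]*\).*|\1|p' "$1"
}
```

The result can then be looked up with ‘getent hosts "$(metadb_host /etc/aos/aos.conf)"’ in the Guest Shell; no output means the name does not resolve there.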

Insufficient Guestshell filesystem size

An error message of the form ‘needs XXMB on the / filesystem’ occurs if the rootfs partition is smaller than 1 GB. Make sure the Guest Shell is resized to 2 GB of memory, 1 GB of disk, and 6% CPU.

<snip>
+ popd
/tmp/selfgz18527139
+ rpm -Uvh --nodeps --force /tmp/selfgz18527139/aos-device-agent-1.1.0-0.1.1108.x86_64.rpm
Preparing...                          ################################# [100%]
installing package aos-device-agent-1.1.0-0.1.1108.x86_64 needs 55MB on the / filesystem
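Free space on the Guest Shell root filesystem can be checked before running the installer. A minimal sketch using the portable ‘df -Pk’ output format; rootfs_space_mb and rootfs_has_free_mb are hypothetical helpers.

```shell
#!/bin/sh
# rootfs_space_mb (hypothetical helper): print total and free space of / in MB,
# parsed from POSIX "df -Pk" output (1024-byte blocks, one line per file system).
rootfs_space_mb() {
  df -Pk / | awk 'NR == 2 { printf "total=%d free=%d\n", $2 / 1024, $4 / 1024 }'
}

# rootfs_has_free_mb (hypothetical helper): succeed if / has at least $1 MB free,
# e.g. the size reported in a "needs XXMB on the / filesystem" message.
rootfs_has_free_mb() {
  need=$1
  free=$(df -Pk / | awk 'NR == 2 { print int($4 / 1024) }')
  [ -n "$free" ] && [ "$free" -ge "$need" ]
}
```

For the error above, ‘rootfs_has_free_mb 55 || echo "resize the Guest Shell rootfs"’ flags the problem before installation.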

AOS Service takes long time to start on Cisco NXOS

The Guest Shell feature on Cisco NX-OS takes a few minutes to initialize NX-API within the LXC container, and AOS has no control over this delay. Apstra Engineering has added a wait delay to the initialization of the AOS scripts to account for it, so this wait is normal.
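Because this delay is expected, a script that starts the agent should poll rather than fail at once. A minimal sketch: wait_until is a hypothetical helper that retries a command until it succeeds or a timeout expires. Using ‘service aos status’ as the probe assumes it exits 0 only while the agent is running, as LSB init scripts are expected to; the 600-second limit in the example is an arbitrary choice, not an Apstra recommendation.

```shell
#!/bin/sh
# wait_until (hypothetical helper): run a command every INTERVAL seconds until
# it succeeds, giving up after roughly LIMIT seconds.
#   usage: wait_until LIMIT INTERVAL command [args...]
wait_until() {
  limit=$1
  interval=$2
  shift 2
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$limit" ]; then
      echo "timed out after ${elapsed}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}
```

For example: ‘wait_until 600 10 service aos status’.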

AOS stops and fails without any errors (MGMT VRF)

Ensure that the Guest Shell reaches the AOS server through the management VRF.

By default, the ‘ping’ command uses the default routing table and should not be able to reach the AOS server.

In the example below, a ping to the AOS server at 172.20.156.3 from the default routing table is expected to fail, while the same ping in the management VRF succeeds:

SAL2028T5PP-172-20-156-5# ping 172.20.156.3
PING 172.20.156.3 (172.20.156.3): 56 data bytes
ping: sendto 172.20.156.3 64 chars, No route to host
^C
--- 172.20.156.3 ping statistics ---
1 packets transmitted, 0 packets received, 100.00% packet loss
SAL2028T5PP-172-20-156-5# ping 172.20.156.3 vrf management
PING 172.20.156.3 (172.20.156.3): 56 data bytes
64 bytes from 172.20.156.3: icmp_seq=0 ttl=63 time=0.649 ms
64 bytes from 172.20.156.3: icmp_seq=1 ttl=63 time=0.449 ms
64 bytes from 172.20.156.3: icmp_seq=2 ttl=63 time=0.428 ms
64 bytes from 172.20.156.3: icmp_seq=3 ttl=63 time=0.423 ms
64 bytes from 172.20.156.3: icmp_seq=4 ttl=63 time=0.404 ms
^C

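Verify MGMT VRF in NXOS Guest Shell

Within the Guest Shell, the same check applies: a ping from the default network namespace should fail, while a ping run in the management network namespace should succeed.

[root@guestshell ~]# ping 172.20.156.3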
connect: Network is unreachable

[root@guestshell ~]# sudo ip netns exec management ping 172.20.156.3
PING 172.20.156.3 (172.20.156.3) 56(84) bytes of data.
64 bytes from 172.20.156.3: icmp_seq=1 ttl=64 time=0.226 ms
64 bytes from 172.20.156.3: icmp_seq=2 ttl=64 time=0.232 ms
^C

Contact Apstra Global Support

Apstra Global Support is available to assist with troubleshooting. Diagnostic information will most likely be needed to help resolve issues. Please see the Apstra Global Support page for details.