Cumulus Device Agent
Although the preferred method of installing device system agents is to create them in the Apstra GUI, you can manually install Apstra agents from the CLI. Manual installation is needed only in rare cases; it requires more effort and is error-prone. You need an in-depth understanding of the various device states, configuration stages, and agent operations before manually installing agents. For assistance, contact Juniper Support.
Quick start
This section provides quick-start steps for installing the Cumulus device agent. The remaining sections describe the steps in detail.
Cumulus Initial Configuration
Before you use them with the Apstra software, Cumulus device agents require some initial configuration. We've added a local username admin with password admin as part of our provisioning process. By default, the Cumulus credentials are username cumulus and password CumulusLinux! (Apstra does not depend on the username being changed).
CumulusVX
If you're deploying Cumulus VX (the virtual appliance), ensure the virtual switch has at least 1 vCPU and 2 GB of RAM. The Cumulus VX 3.1.1 OVA template provides only 512 MB of RAM, so you need to increase it.
Management Interface
Cumulus device agents require a management VRF to be set up before you run the Apstra device installer. The bash shell must also be running in the context of the management VRF before installation so that the Apstra agent can communicate with the Apstra server.
By default, the Cumulus management VRF is not activated.
root@cumulus:~# cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
source /etc/network/interfaces.d/*.intf
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
auto eth0
iface eth0 inet dhcp
Add a new auto mgmt interface as a VRF and activate the eth0 interface for vrf mgmt.
root@cumulus:~# vi /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
source /etc/network/interfaces.d/*.intf
# The loopback network interface
auto lo
iface lo inet loopback
auto mgmt
iface mgmt
    address 127.0.0.1/8
    vrf-table auto
    post-up sudo service aos start
auto eth0
iface eth0 inet dhcp
    vrf mgmt
After the VRF is activated, reload the network configuration with ifreload -a, then log back in to the switch. This ensures the bash prompt is under the management VRF routing context. Note the new bash prompt, admin@cumulus:mgmt-vrf:~$, which indicates that bash is running under the mgmt-vrf context.
root@cumulus:/etc/network# ifreload -a
<reconnect with SSH>
Welcome to Cumulus VX (TM)
Cumulus VX (TM) is a community supported virtual appliance designed for
experiencing, testing and prototyping Cumulus Networks' latest technology.
For any questions or technical support, visit our community site at:
http://community.cumulusnetworks.com
The registered trademark Linux (R) is used pursuant to a sublicense from LMI,
the exclusive licensee of Linus Torvalds, owner of the mark on a world-wide
basis.
Last login: Fri May 26 12:41:13 2017 from 192.168.25.1
admin@cumulus:mgmt-vrf:~$
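Before reloading, it can help to confirm that the interfaces file really contains both parts of the change: the mgmt VRF stanza and the eth0 binding. The following is a minimal sketch, assuming the stanza layout shown above; has_mgmt_vrf is a hypothetical helper name, not an Apstra or Cumulus command.

```shell
# Sketch only: has_mgmt_vrf is a hypothetical helper, not part of Cumulus or Apstra.
# Returns 0 when the file defines "iface mgmt" with "vrf-table auto"
# and "iface eth0" carries "vrf mgmt"; returns non-zero otherwise.
has_mgmt_vrf() {
    file=${1:-/etc/network/interfaces}
    awk '
        $1 == "iface" { current = $2 }                      # remember the stanza we are in
        current == "mgmt" && $1 == "vrf-table" && $2 == "auto" { vrf = 1 }
        current == "eth0" && $1 == "vrf" && $2 == "mgmt" { bound = 1 }
        END { exit !(vrf && bound) }
    ' "$file"
}
```

For example, has_mgmt_vrf && sudo ifreload -a reloads the network configuration only when both stanzas are present.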
Install Cumulus License
To run properly on most platforms, the switch must have a license installed. If a license is not installed, deployments may fail because the ifreload -a command fails when the switchd service isn't running.
You can pre-provision a Cumulus license for the Apstra ZTP server, or you can install the license manually.
If a license isn't installed and this is not a Cumulus VX platform, the cl-license command fails.
admin@cumulus:mgmt-vrf:~$ sudo service switchd start
Job for switchd.service failed. See 'systemctl status switchd.service' and 'journalctl -xn' for details.
admin@cumulus:mgmt-vrf:~$ cl-license
1508122225.507863 2017-10-16 02:50:25 license.c:318 CRIT No license file.
No license installed!
admin@cumulus:mgmt-vrf:~$ sudo cat /var/log/syslog
2017-10-16T02:53:04.661850+00:00 cumulus systemd[1]: Failed to start Cumulus Linux Switch Daemon.
2017-10-16T02:53:04.662375+00:00 cumulus systemd[1]: Dependency failed for Cumulus Linux Port Watch Event Daemon.
2017-10-16T02:53:04.663217+00:00 cumulus systemd[1]: Dependency failed for Cumulus Linux acltool.
2017-10-16T02:53:04.664004+00:00 cumulus systemd[1]: Unit switchd.service entered failed state.
2017-10-16T02:53:04.687029+00:00 cumulus Failure: Not running cl-support for portwd.service, failure #2 status: Result=success ExecMainCode=2 ExecMainStatus=15
2017-10-16T02:53:04.730186+00:00 cumulus Failure: Not running cl-support for switchd.service, failure #2 status: Result=exit-code ExecMainCode=1 ExecMainStatus=2
Add the license key.
admin@cumulus:mgmt-vrf:~$ sudo cl-license -i
Paste license text here, then hit ctrl-d
user@company.com|example_license_text/here
License file installed.
Service 'switchd' is not running. Run this command:
sudo systemctl restart switchd
Or reboot to enable functionality.
License file installed.
After the license is installed, restart the switchd service.
admin@cumulus:mgmt-vrf:~$ sudo service switchd stop
admin@cumulus:mgmt-vrf:~$ sudo service switchd start
Download Agent Installer
Apstra device agent installation files for Cumulus are available from the Apstra server at the URL https://aos-server/device_agent_images/aos_device_agent.run
A .md5 file is also available for validating the downloaded file. Copy the .run file to the Cumulus switch with the command vrf task exec mgmt wget -nv --no-check-certificate https://aos-server/device_agent_images/aos_device_agent.run
This command assumes bash is running under the 'mgmt' VRF context. If this is not the case, omit vrf task exec mgmt and download the file normally.
root@cumulus:mgmt-vrf:~# vrf task exec mgmt wget -nv --no-check-certificate https://aos-server/device_agent_images/aos_device_agent.run
WARNING: The certificate of ‘aos-server’ is not trusted.
WARNING: The certificate of ‘aos-server’ hasn't got a known issuer.
The certificate's owner does not match hostname ‘aos-server’
2017-10-15 23:32:55 URL:https://aos-server/device_agent_images/aos_device_agent.run [57245356/57245356] -> "aos_device_agent.run" [1]
root@cumulus:mgmt-vrf:~# vrf task exec mgmt wget -nv --no-check-certificate https://aos-server/device_agent_images/aos_device_agent.run.md5
WARNING: The certificate of ‘aos-server’ is not trusted.
WARNING: The certificate of ‘aos-server’ hasn't got a known issuer.
The certificate's owner does not match hostname ‘aos-server’
2017-10-15 23:34:50 URL:https://aos-server/device_agent_images/aos_device_agent.run.md5 [65/65] -> "aos_device_agent.run.md5" [1]
root@cumulus:mgmt-vrf:~#
root@cumulus:mgmt-vrf:~# cat aos_device_agent.run.md5
70d58a0aaa5ed4519b87ced74003476a aos_device_agent_2.0.0-210.run
root@cumulus:mgmt-vrf:~#
root@cumulus:mgmt-vrf:~# md5sum aos_device_agent.run
70d58a0aaa5ed4519b87ced74003476a aos_device_agent.run
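In the sample output, the .md5 file names the versioned file (aos_device_agent_2.0.0-210.run) while the download is saved as aos_device_agent.run, so a check keyed on the filename would likely look for a file that is not there. A minimal sketch that compares only the hash values follows; verify_md5 is a hypothetical helper name, and it assumes the hash is the first field on the first line of the .md5 file, as in the output above.

```shell
# Sketch only: verify_md5 is a hypothetical helper, not shipped with the agent.
# Compares the hash in the .md5 file with the hash of the downloaded file,
# ignoring the filename recorded in the .md5 file.
verify_md5() {
    file=$1
    sumfile=$2
    expected=$(awk 'NR == 1 { print $1 }' "$sumfile")   # hash field from the .md5 file
    actual=$(md5sum "$file" | awk '{ print $1 }')       # hash of the local download
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

A typical call would be verify_md5 aos_device_agent.run aos_device_agent.run.md5 && echo "checksum OK".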
Install Cumulus Device Agent
To install the Apstra device agent on Cumulus, run the .run file available from the Apstra server. Once the file is downloaded, run it as a shell command. Make sure that bash is under the mgmt-vrf routing context when you install the Apstra device agent.
Run sudo sh aos_device_agent.run
admin@cumulus:mgmt-vrf:~$ sudo sh aos_device_agent.run
Verifying archive integrity... All good.
Uncompressing AOS Device Agent installer 100%
+ set -o pipefail
+++ dirname ./agent_installer.sh
++ cd .
++ pwd
+ script_dir=/tmp/selfgz8796
++ date
+ echo 'Device Agent Installation : Mon' Oct 16 00:19:57 UTC 2017
Device Agent Installation : Mon Oct 16 00:19:57 UTC 2017
+ echo
+ UNKNOWN_PLATFORM=1
+ WRONG_PLATFORM=1
+ CANNOT_EXECUTE=126
+ '[' 0 -ne 0 ']'
+ arg_parse
+ start_aos=True
+ [[ 0 > 0 ]]
+ supported_platforms=(["centos"]="install_sysvinit_rpm" ["eos"]="install_on_arista" ["nxos"]="install_on_nxos" ["cumulus"]="install_sysvinit_deb" ["trusty"]="install_sysvinit_deb" ["icos"]="install_sysvinit_rpm" ["snaproute"]="install_sysvinit_deb" ["simulation"]="install_sysvinit_deb")
+ declare -A supported_platforms
++ /tmp/selfgz8796/aos_get_platform
+ current_platform=cumulus
+ installer=install_sysvinit_deb
+ [[ -z install_sysvinit_deb ]]
+ [[ -x /etc/init.d/aos ]]
+ install_sysvinit_deb
++ pwd
+ local pkg_dir=/tmp/selfgz8796/sysvinit_deb
+ dpkg -s aos-device-agent
+ dpkg --purge aos-device-agent
(Reading database ... 25364 files and directories currently installed.)
Removing aos-device-agent (2.0.0-210) ...
Purging configuration files for aos-device-agent (2.0.0-210) ...
+ dpkg -i /tmp/selfgz8796/sysvinit_deb/aos-device-agent-2.0.0-210.amd64.deb
Selecting previously unselected package aos-device-agent.
(Reading database ... 25364 files and directories currently installed.)
Preparing to unpack .../aos-device-agent-2.0.0-210.amd64.deb ...
Unpacking aos-device-agent (2.0.0-210) ...
Setting up aos-device-agent (2.0.0-210) ...
Processing triggers for systemd (215-17+deb8u4) ...
+ mkdir -p /opt/aos
+ cp aos_device_agent.img /opt/aos
+ post_install_common
+ /etc/init.d/aos config_gen
grep: /etc/aos/aos.conf: No such file or directory
+ [[ True == \T\r\u\e ]]
+++ readlink /sbin/init
++ basename /lib/systemd/systemd
+ [[ systemd == systemd ]]
+ systemctl start aos
If an aos.conf file doesn't exist at first agent startup, the Apstra software creates one.
Device Agent Configuration File
You manage device agent configuration by editing the device agent configuration file directly. The Cumulus device agent configuration file is located at /etc/aos/aos.conf. See Apstra device agent configuration file for parameters. After updating the file, restart the Apstra device agent.
service aos stop
service aos start
Device Agent Management
Bootstrap Configuration
The bootstrap configuration is the portion of /etc/network/interfaces that applies to the management IP address. Bootstrap configuration is prepended to configuration jobs that Apstra pushes. This helps ensure that Apstra does not overwrite network configuration that the agent needs to reach the controller.
The bootstrap configuration is typically applied manually by the user, or automated with software such as Apstra's Aeon ZTP server or the Apstra device installer project.
Bootstrap configuration contains the minimum network settings necessary for the agent to connect to the Apstra controller.
auto mgmt
iface mgmt
    address 127.0.0.1/8
    vrf-table auto
    post-up sudo service aos start
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
auto eth0
iface eth0 inet dhcp
    vrf mgmt
Cumulus Device Configuration Management
The Cumulus device agent manages the following files on the filesystem:
- /etc/cumulus/ports.conf - specifies how port breakouts are consumed on the Cumulus platform
- /etc/frr/frr.conf - contains all routing information for BGP on the device
- /etc/network/interfaces - handles all Layer 2 and Layer 3 configuration on the device, including CLAG, VLANs, VXLAN, IP Routing
- /etc/hostname - the file through which Apstra manages the device hostname
- /etc/default/isc-dhcp-relay - specifies interfaces that participate in DHCP Relay
Deploy Device
Once the agent is up and running, it appears under Managed Devices and can be acknowledged and assigned to a blueprint using the GUI per the standard procedure.
Uninstall Apstra Device Agent
To remove the Apstra device agent, you stop the agent, remove the device from Apstra, and then uninstall the agent.
You can uninstall the Apstra device agent on Cumulus with the steps below. Because the Apstra agent itself is mostly a compressed (squashfs) image, it can be uninstalled in a few steps.
When uninstalling, the Apstra configuration applied to the various files (frr.conf, /etc/network/interfaces, etc.) is removed.
Stop Apstra Service
To prevent the agent from immediately re-registering with the Apstra server, stop the Apstra service on the device before uninstalling the agent.
admin@cumulus:mgmt-vrf:~$ sudo service aos stop
admin@cumulus:mgmt-vrf:~$ sudo service aos status
aos.service - LSB: Start AOS device agents
   Loaded: loaded (/etc/init.d/aos)
   Active: inactive (dead) since Fri 2017-05-26 17:27:37 UTC; 5s ago
  Process: 32086 ExecStop=/etc/init.d/aos stop (code=exited, status=0/SUCCESS)
  Process: 31268 ExecStart=/etc/init.d/aos start (code=exited, status=0/SUCCESS)
Remove Device
Using the Apstra GUI, undeploy and unassign the device from the blueprint per standard procedures. You can also delete it entirely from the Managed Devices page.
If you uninstall the agent before removing it from the Apstra GUI, existing configuration can no longer be erased.
Remove Apstra Package from Cumulus
Remove the Apstra package with the following command:
sudo dpkg -r aos-device-agent
admin@cumulus:mgmt-vrf:~$ sudo dpkg -r aos-device-agent
(Reading database ... 25366 files and directories currently installed.)
Removing aos-device-agent (2.0.0-210) ...
Processing triggers for systemd (215-17+deb8u4) ...
Clean up any other lingering files left on the filesystem.
sudo rm -rf /opt/aos/
sudo rm -rf /etc/aos
sudo rm -rf /var/log/aos
sudo rm -rf /run/aos/*
sudo rm -rf /mnt/persist/.aos
sudo rm -rf /run/aos
sudo rm -rf /run/lock/aos
sudo rm -rf /tmp/aos_show_tech
sudo rm -rf /usr/sbin/aos*
Check if any other Apstra files remain on the filesystem.
admin@cumulus:mgmt-vrf:~$ sudo find / -iname '*aos*'
find: File system loop detected; `/.snapshots/1/snapshot' is part of the same file system loop as `/'.
/home/admin/aos_device_agent_1.2.0-137_cumulus.run
/var/lib/dpkg/info/aos-device-agent.postrm
/var/lib/dpkg/info/aos-device-agent.list
Optionally, remove the management VRF from /etc/network/interfaces.
Cumulus Agent Troubleshooting
- Check Apstra Status
- List Running Processes
- Display and Read Apstra Log Files
- Configuration not Pushed
- Can't Ping Apstra or Use Other Network Tools
- Avg State is CRITICAL
Check Apstra Status
Run service aos status to display the status of the Apstra service.
root@cumulus:mgmt-vrf:~# service aos status
aos.service - LSB: Start AOS device agents
   Loaded: loaded (/etc/init.d/aos)
   Active: active (running) since Fri 2017-05-26 17:42:39 UTC; 1min 59s ago
  Process: 32086 ExecStop=/etc/init.d/aos stop (code=exited, status=0/SUCCESS)
  Process: 32381 ExecStart=/etc/init.d/aos start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/aos.service
           ├─32468 tacspawner --daemonize=/var/log/aos/aos.log --pidfile=/var/run/aos.pid --name=000C29CBF3A8 --hostname=000C29CBF3A8 --domainSocket=aos_spawner_sock --hostSysdbAddress=tbl://aos_localtasks_sock --jsonConfig=/etc/aos/cumulus/agent.json --eventLogSev=TaccSpawner/error,Mounter/error,Mountee/error,Nb...
           ├─32470 tacleafsysdb --agentName=000C29CBF3A8-LocalTasks-000C29CBF3A8-0 --partition= --storage-mode=persistent --eventLogDir=. --eventLogSev=TaccSpawner/error,Mounter/error,Mountee/error,NboAttrLog/error
           ├─32471 /usr/bin/python /usr/bin/aos_agent --class=aos.device.common.ProxyDeploymentAgent.ProxyDeploymentAgent --name=DeploymentProxyAgent device_type=Cumulus serial_number=@(SYSTEM_UNIQUE_ID)
           ├─32474 /usr/bin/python /usr/bin/aos_agent --class=aos.device.common.DeviceKeeperAgent.DeviceKeeperAgent --name=DeviceKeeperAgent serial_number=@(SYSTEM_UNIQUE_ID)
           ├─32484 /usr/bin/python /usr/bin/aos_agent --class=aos.device.common.ProxyCountersAgent.ProxyCountersAgent --name=CounterProxyAgent device_type=Cumulus serial_number=@(SYSTEM_UNIQUE_ID)
           ├─32511 /usr/bin/python /usr/bin/aos_agent --class=aos.device.cumulus.CumulusTelemetryAgent.CumulusTelemetryAgent --name=DeviceTelemetryAgent serial_number=@(SYSTEM_UNIQUE_ID)
           ├─32748 sh -c curl -k -f -sS -H "Content-Type: application/json" -X POST -d '{"version":"1.2.0-137","serial_number":"000C29CBF3A8","platform":"cumulus"}' https://aos-server/api/versions/device 2>&1
           └─32749 curl -k -f -sS -H Content-Type: application/json -X POST -d {"version":"1.2.0-137","serial_number":"000C29CBF3A8","platform":"cumulus"} https://aos-server/api/versions/device
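When the status check is part of a script, the Active line is the part that matters. A minimal sketch, assuming the status text has the same layout as the sample output above; aos_is_running is a hypothetical helper name that reads the status text on standard input.

```shell
# Sketch only: aos_is_running is a hypothetical helper name.
# Reads `service aos status` output on stdin and succeeds only when
# the Active line reports "active (running)".
aos_is_running() {
    grep -q 'Active: active (running)'
}
```

For example, service aos status | aos_is_running || echo "Apstra agent is not running" prints a message only when the agent is stopped or failed.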
List Running Processes
You can attach to the Apstra container with the service aos attach command, then run normal Linux commands within the container. The container is a read-only instance, so any changes made within it are lost when the container restarts.
root@cumulus:mgmt-vrf:/var/log/aos# service aos attach
aos@mclag-compute-1-leaf1:/# ps wax
PID TTY STAT TIME COMMAND
1 ? Ss 0:21 /sbin/init
2 ? S 0:00 [kthreadd]
3 ? S 0:04 [ksoftirqd/0]
5 ? S< 0:00 [kworker/0:0H]
7 ? S 1:28 [rcu_sched]
8 ? S 0:00 [rcu_bh]
9 ? S 0:00 [migration/0]
10 ? S 0:00 [watchdog/0]
11 ? S 0:00 [watchdog/1]
12 ? S 0:00 [migration/1]
13 ? S 0:04 [ksoftirqd/1]
15 ? S< 0:00 [kworker/1:0H]
16 ? S 0:00 [watchdog/2]
17 ? S 0:00 [migration/2]
18 ? S 0:08 [ksoftirqd/2]
20 ? S< 0:00 [kworker/2:0H]
21 ? S 0:00 [watchdog/3]
22 ? S 0:00 [migration/3]
23 ? S 0:07 [ksoftirqd/3]
25 ? S< 0:00 [kworker/3:0H]
26 ? S< 0:00 [khelper]
27 ? S 0:00 [kdevtmpfs]
28 ? S< 0:00 [netns]
29 ? S< 0:00 [perf]
30 ? S 0:00 [khungtaskd]
31 ? S< 0:00 [writeback]
33 ? SN 0:00 [ksmd]
34 ? SN 0:00 [khugepaged]
35 ? S< 0:00 [crypto]
36 ? S< 0:00 [kintegrityd]
37 ? S< 0:00 [bioset]
38 ? S< 0:00 [kblockd]
39 ? S< 0:00 [ata_sff]
40 ? S< 0:00 [edac-poller]
42 ? S< 0:00 [rpciod]
43 ? S 0:00 [kswapd0]
44 ? S 0:00 [fsnotify_mark]
45 ? S< 0:00 [nfsiod]
54 ? S< 0:00 [kthrotld]
57 ? S 0:00 [scsi_eh_0]
58 ? S< 0:00 [scsi_tmf_0]
59 ? S 0:00 [scsi_eh_1]
60 ? S< 0:00 [scsi_tmf_1]
61 ? S 0:00 [scsi_eh_2]
62 ? S< 0:00 [scsi_tmf_2]
63 ? S 0:00 [scsi_eh_3]
64 ? S< 0:00 [scsi_tmf_3]
69 ? S 0:00 [scsi_eh_4]
70 ? S< 0:00 [scsi_tmf_4]
71 ? S 0:00 [scsi_eh_5]
72 ? S< 0:00 [scsi_tmf_5]
76 ? S< 0:00 [ipv6_addrconf]
77 ? S< 0:00 [deferwq]
128 ? S< 0:00 [bioset]
133 ? S 0:00 [scsi_eh_6]
134 ? S< 0:00 [scsi_tmf_6]
135 ? S 0:03 [usb-storage]
136 ? S 0:00 [scsi_eh_7]
137 ? S< 0:00 [scsi_tmf_7]
138 ? S 0:01 [usb-storage]
146 ? S< 0:00 [kworker/0:1H]
147 ? S< 0:00 [kworker/1:1H]
148 ? S< 0:00 [kworker/3:1H]
149 ? S< 0:00 [kworker/2:1H]
166 ? S< 0:00 [btrfs-worker]
168 ? S< 0:00 [btrfs-worker-hi]
169 ? S< 0:00 [btrfs-delalloc]
170 ? S< 0:00 [btrfs-flush_del]
171 ? S< 0:00 [btrfs-cache]
172 ? S< 0:00 [btrfs-submit]
173 ? S< 0:00 [btrfs-fixup]
174 ? S< 0:00 [btrfs-endio]
175 ? S< 0:00 [btrfs-endio-met]
176 ? S< 0:00 [btrfs-endio-met]
177 ? S< 0:00 [btrfs-endio-rai]
178 ? S< 0:00 [btrfs-endio-rep]
179 ? S< 0:00 [btrfs-rmw]
180 ? S< 0:00 [btrfs-endio-wri]
181 ? S< 0:00 [btrfs-freespace]
182 ? S< 0:00 [btrfs-delayed-m]
183 ? S< 0:00 [btrfs-readahead]
184 ? S< 0:00 [btrfs-qgroup-re]
185 ? S< 0:00 [btrfs-extent-re]
186 ? S 0:00 [btrfs-cleaner]
187 ? S 0:06 [btrfs-transacti]
262 ? S 0:00 [kauditd]
266 ? Ss 0:05 /lib/systemd/systemd-journald
269 ? Ss 0:00 /lib/systemd/systemd-udevd
337 ? S< 0:00 [kvm-irqfd-clean]
542 ? S<sl 0:00 /sbin/auditd -n
694 ? SNs 0:00 /usr/sbin/cron -f -L 38
697 ? Ss 0:00 /lib/systemd/systemd-logind
702 ? Ss 0:02 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
708 ? Ss 0:00 /usr/sbin/mcelog --daemon
710 ? Ss 0:04 /usr/sbin/irqbalance --pid=/var/run/irqbalance.pid
715 ? Ss 0:00 /usr/sbin/acpid
719 ? Ssl 0:01 /usr/sbin/rsyslogd -n
733 ? Ss 0:00 /usr/sbin/wd_keepalive
740 ? Ss 9:17 /usr/bin/python /usr/sbin/smond
741 ? Ss 0:24 /usr/bin/python /usr/sbin/ledmgrd
743 ? S 0:00 /usr/sbin/dnsmasq -x /var/run/dnsmasq/dnsmasq.pid -u dnsmasq -7 /etc/dnsmasq.d,.dpkg-dist,.dpkg-old,.dpkg
744 ? Ss 0:43 /usr/bin/python /usr/sbin/pwmd
745 ? S<s 0:05 /sbin/mstpd -d -v2
1053 ? Ss 0:00 /usr/sbin/uuidd --socket-activation
1196 ? Ssl 1:34 /usr/bin/python /usr/bin/arp_refresh
1200 ? Ss 0:00 /usr/bin/python /usr/lib/python2.7/dist-packages/clcmd_server.py > /dev/null 2>&1
1240 ? Ss 0:02 /usr/sbin/ntpd -n -u ntp:ntp -g
1242 ? Ss 0:00 /usr/sbin/ptmd -l INFO
1255 ? Ss 0:00 /usr/sbin/sshd -D
1978 ? Ss 0:00 /sbin/dhclient -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0
2310 ? S 0:00 [kworker/2:2]
4633 ? Ss 0:00 sshd: admin [priv]
4661 ? S 0:00 sshd: admin@pts/2
4662 ? Ss 0:00 -bash
4699 ? S 0:00 sudo su -
4704 ? S 0:00 su -
4726 ? S 0:00 -su
5853 ? S 0:00 [kworker/3:2]
8515 ? S 0:00 [kworker/u8:1]
9264 ? S< 0:00 [kworker/u9:6]
11195 ? S 0:00 [kworker/2:1]
13477 ? S< 0:00 [kworker/u9:0]
13480 ? S< 0:00 [kworker/u9:1]
13481 ? S< 0:00 [kworker/u9:2]
13482 ? S< 0:00 [kworker/u9:3]
13810 ? S 0:00 /bin/bash /etc/init.d/aos attach
13824 ? S 0:00 /lib/systemd/systemd-udevd
13832 ? S 0:00 /bin/bash
13835 ? R+ 0:00 ps wax
14541 ? S 0:00 sudo su -
14547 ? S 0:00 su -
14575 ? S 0:00 -su
14645 ? S 0:00 [kworker/u8:2]
14675 ? S+ 0:00 tail -f DeploymentProxyAgent.err
15057 ? S< 0:00 [kloopd0]
15097 ? S< 0:00 [kworker/u9:7]
15099 ? Sl 0:00 tacspawner --daemonize=/var/log/aos/aos.log --pidfile=/var/run/aos.pid --name=571254X1448071 --hostname=5
15103 ? S 0:00 tacleafsysdb --agentName=571254X1448071-LocalTasks-571254X1448071-0 --partition= --storage-mode=persisten
15107 ? Sl 0:02 /usr/bin/python /usr/bin/aos_agent --class=aos.device.common.DeviceKeeperAgent.DeviceKeeperAgent --name=D
15117 ? Sl 0:03 /usr/bin/python /usr/bin/aos_agent --class=aos.device.common.ProxyCountersAgent.ProxyCountersAgent --name
15138 ? Sl 0:06 /usr/bin/python /usr/bin/aos_agent --class=aos.device.cumulus.CumulusTelemetryAgent.CumulusTelemetryAgent
15139 ? Sl 0:02 /usr/bin/python /usr/bin/aos_agent --class=aos.device.common.ProxyDeploymentAgent.ProxyDeploymentAgent --
15373 ttyS1 Ss 0:00 /bin/login --
15403 ttyS1 S 0:00 -bash
15440 ttyS1 S 0:00 sudo su
15445 ttyS1 S 0:00 su
15467 ttyS1 S+ 0:00 bash
15756 ? Ss 0:00 dhclient -lf /var/lib/dhcp/dhclient.eth0.leases eth0
16287 ? S<sl 5:55 /usr/sbin/switchd
16583 ? Ds 0:29 /usr/bin/python /usr/sbin/portwd
16586 ? S 0:00 [kworker/1:0]
16763 ? S<s 0:00 /usr/lib/quagga/zebra -s 90000000 --daemon -A 127.0.0.1
16770 ? S<s 0:00 /usr/lib/quagga/bgpd --daemon -A 127.0.0.1
16777 ? S<s 0:00 /usr/lib/quagga/watchquagga -adz -r /usr/sbin/servicebBquaggabBrestartbB%s -s /usr/sbin/servicebBquaggabB
16787 ? Ss 0:00 sshd: admin [priv]
16815 ? S 0:00 sshd: admin@pts/1
16816 ? Ss 0:00 -bash
16853 ? S 0:00 sudo su -
16858 ? S 0:00 su -
16880 ? S 0:00 -su
17070 ? S+ 0:00 tail -f DeploymentProxyAgent.err
18757 ? Ss 0:00 lldpd: monitor .
18760 ? S 0:01 lldpd: connected to dutmgmtsw2-eth0.dc1.apstra.com
21990 ? S 0:00 [kworker/2:0]
22942 ? S 0:00 [kworker/3:0]
24846 ? S 0:00 [kworker/0:0]
27138 ? S 0:00 [kworker/3:1]
27389 ? S< 0:00 [bond41]
27584 ? Ssl 0:06 /usr/bin/python /usr/sbin/clagd --daemon 10.0.0.63 bond41.2999 44:38:39:ff:00:01 --priority 32768 --backu
27665 ? S 0:00 /sbin/bridge monitor fdb
27959 ? S 0:00 [kworker/1:1]
28342 ? S 0:00 [kworker/u8:0]
31882 ? Ss 0:00 sshd: admin [priv]
31912 ? S 0:00 sshd: admin@pts/0
31913 ? Ss 0:00 -bash
32063 ? S 0:00 [kworker/0:2]
Display and Read Apstra Log Files
Apstra log files are stored in the /var/log/aos folder. There is typically one log file for each Apstra agent that runs. The Apstra device agent spawns a series of other device agents for various purposes: telemetry, configuration rendering, counters, and agent health. You can read the plain-text .err files with any text editor.
root@cumulus:mgmt-vrf:/var/log/aos# ls -lah /var/log/aos
total 876K
drwxr-xr-x 1 root root 4.9K Oct 16 01:42 .
drwxr-xr-x 1 root root 360 Oct 16 01:44 ..
-rw------- 1 root root 644 Oct 16 01:42 _571254X1448071-0000000059e40e8c-000ea27f-checkpoint
-rw-r--r-- 1 root root 0 Oct 16 01:42 _571254X1448071-0000000059e40e8c-000ea27f-checkpoint-valid
-rw------- 1 root root 0 Oct 16 01:42 _571254X1448071-0000000059e40e8c-000ea27f-log
-rw-r--r-- 1 root root 0 Oct 16 01:42 _571254X1448071-0000000059e40e8c-000ea27f-log-valid
-rw-r--r-- 1 root root 0 Oct 16 01:40 571254X1448071-0.14640.1508118008.log
-rw-r--r-- 1 root root 0 Oct 16 01:42 571254X1448071-0.15103.1508118156.log
-rw-r--r-- 1 root root 0 Oct 16 01:40 571254X1448071-0.err
-rw-r--r-- 1 root root 0 Oct 16 01:40 571254X1448071-0.out
-rw------- 1 root root 61K Oct 16 01:40 571254X1448071-LocalTasks-571254X1448071-0_2017-10-16--01-40-08_14640-2017-10-16--01-40-08.tel
-rw------- 1 root root 105K Oct 16 02:16 571254X1448071-LocalTasks-571254X1448071-0_2017-10-16--01-42-36_15103-2017-10-16--01-42-36.tel
-rw-r--r-- 1 root root 0 Oct 16 01:42 aos.log
-rw-r--r-- 1 root root 0 Oct 16 01:40 CounterProxyAgent.14642.1508118012.log
-rw-r--r-- 1 root root 0 Oct 16 01:40 CounterProxyAgent.14665.1508118023.log
-rw-r--r-- 1 root root 0 Oct 16 01:42 CounterProxyAgent.15105.1508118159.log
-rw-r--r-- 1 root root 0 Oct 16 01:42 CounterProxyAgent.15117.1508118171.log
-rw------- 1 root root 33K Oct 16 01:40 CounterProxyAgent571254X1448071_2017-10-16--01-40-14_14642-2017-10-16--01-40-14.tel
-rw------- 1 root root 42K Oct 16 01:40 CounterProxyAgent571254X1448071_2017-10-16--01-40-24_14665-2017-10-16--01-40-24.tel
-rw------- 1 root root 33K Oct 16 01:42 CounterProxyAgent571254X1448071_2017-10-16--01-42-41_15105-2017-10-16--01-42-41.tel
-rw------- 1 root root 42K Oct 16 01:42 CounterProxyAgent571254X1448071_2017-10-16--01-42-52_15117-2017-10-16--01-42-52.tel
-rw-r--r-- 1 root root 3.0K Oct 16 01:42 CounterProxyAgent.err
-rw-r--r-- 1 root root 0 Oct 16 01:40 CounterProxyAgent.out
-rw-r--r-- 1 root root 0 Oct 16 01:40 DeploymentProxyAgent.14641.1508118018.log
-rw-r--r-- 1 root root 0 Oct 16 01:42 DeploymentProxyAgent.15104.1508118166.log
-rw-r--r-- 1 root root 0 Oct 16 01:42 DeploymentProxyAgent.15139.1508118176.log
-rw------- 1 root root 39K Oct 16 01:40 DeploymentProxyAgent571254X1448071_2017-10-16--01-40-19_14641-2017-10-16--01-40-19.tel
-rw------- 1 root root 33K Oct 16 01:42 DeploymentProxyAgent571254X1448071_2017-10-16--01-42-47_15104-2017-10-16--01-42-47.tel
-rw------- 1 root root 39K Oct 16 01:43 DeploymentProxyAgent571254X1448071_2017-10-16--01-42-58_15139-2017-10-16--01-42-58.tel
-rw-r--r-- 1 root root 31K Oct 16 02:03 DeploymentProxyAgent.err
-rw-r--r-- 1 root root 0 Oct 16 01:40 DeploymentProxyAgent.out
-rw-r--r-- 1 root root 0 Oct 16 01:40 DeviceKeeperAgent.14644.1508118014.log
-rw-r--r-- 1 root root 0 Oct 16 01:42 DeviceKeeperAgent.15107.1508118166.log
-rw------- 1 root root 38K Oct 16 01:40 DeviceKeeperAgent571254X1448071_2017-10-16--01-40-15_14644-2017-10-16--01-40-15.tel
-rw------- 1 root root 38K Oct 16 01:43 DeviceKeeperAgent571254X1448071_2017-10-16--01-42-48_15107-2017-10-16--01-42-48.tel
-rw-r--r-- 1 root root 1.3K Oct 16 01:42 DeviceKeeperAgent.err
-rw-r--r-- 1 root root 0 Oct 16 01:40 DeviceKeeperAgent.out
-rw-r--r-- 1 root root 0 Oct 16 01:40 DeviceTelemetryAgent.14643.1508118012.log
-rw-r--r-- 1 root root 0 Oct 16 01:40 DeviceTelemetryAgent.14674.1508118021.log
-rw-r--r-- 1 root root 0 Oct 16 01:42 DeviceTelemetryAgent.15106.1508118165.log
-rw-r--r-- 1 root root 0 Oct 16 01:42 DeviceTelemetryAgent.15138.1508118177.log
-rw------- 1 root root 33K Oct 16 01:40 DeviceTelemetryAgent571254X1448071_2017-10-16--01-40-15_14643-2017-10-16--01-40-15.tel
-rw------- 1 root root 56K Oct 16 01:40 DeviceTelemetryAgent571254X1448071_2017-10-16--01-40-23_14674-2017-10-16--01-40-23.tel
-rw------- 1 root root 33K Oct 16 01:42 DeviceTelemetryAgent571254X1448071_2017-10-16--01-42-47_15106-2017-10-16--01-42-47.tel
-rw------- 1 root root 56K Oct 16 01:43 DeviceTelemetryAgent571254X1448071_2017-10-16--01-42-59_15138-2017-10-16--01-42-59.tel
-rw-r--r-- 1 root root 8.8K Oct 16 01:49 DeviceTelemetryAgent.err
-rw-r--r-- 1 root root 0 Oct 16 01:40 DeviceTelemetryAgent.out
-rw------- 1 root root 50K Oct 16 01:40 Spawner-571254X1448071_2017-10-16--01-40-08_14635-2017-10-16--01-40-08.tel
-rw------- 1 root root 51K Oct 16 01:43 Spawner-571254X1448071_2017-10-16--01-42-36_15096-2017-10-16--01-42-36.tel
- DeviceTelemetryAgent.err - contains all diagnostic output as it relates to device telemetry.
- DeviceKeeperAgent.err - tracks the health of the Apstra agent itself and how it mounts Apstra Graph Datastore to the Apstra Server
- DeploymentProxyAgent.err - logs all configuration options managed by Apstra and indicates whether each configuration job is a complete or partial apply. All configuration changes made by Apstra appear here.
- CounterProxyAgent.err - captures any custom counter collectors deployed on the switch
- aos.log - Unused
- XXXXXXXXXXXXX-0.err - Unused
The .tel files are tacc event log output, used for internal support at Apstra.
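Because many files in the log folder are empty, a quick way to find the ones worth reading is to list only the .err files that contain data. A minimal sketch follows; nonempty_err_logs is a hypothetical helper name, and the -maxdepth option assumes GNU or BSD find.

```shell
# Sketch only: nonempty_err_logs is a hypothetical helper name.
# Lists the .err files in the Apstra log folder that are not empty,
# sorted by name. The folder defaults to /var/log/aos.
nonempty_err_logs() {
    dir=${1:-/var/log/aos}
    find "$dir" -maxdepth 1 -type f -name '*.err' -size +0 | sort
}
```

You can then follow a file of interest, for example with tail -f /var/log/aos/DeploymentProxyAgent.err.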
Configuration not Pushed
If configuration did not get pushed, the device agent could be in 'telemetry-only' mode. Check /etc/aos/aos.conf for enable_configuration_service = 0.
You may also observe the following log line in /var/log/aos/DeploymentProxyAgent.err: Configuration service disabled. Not setting mount timer.
After you correct this setting, handle device deployment config appears in the log file.
2017-10-16 01:42:58,571 15139:INFO:aos.device.common.ProxyDeploymentSm:Device init sanity check completed at 1508118178.571466
2017-10-16 01:42:58,571 15139:INFO:aos.device.common.ProxyDeploymentSm:Device rebooted: False
2017-10-16 01:42:58,572 15139:INFO:aos.device.common.ProxyDeploymentSm:Device undergo ztp: True
2017-10-16 01:42:58,630 15139:INFO:aos.device.common.ProxyDeploymentSm:handle device deployment config: 571254X1448071 ddc: 1 dds: 1
2017-10-16 01:42:58,630 15139:INFO:aos.device.common.ProxyDeploymentSm:deploy ddc: 1 dds: 1
2017-10-16 01:42:58,635 15139:INFO:aos.device.common.ProxyDeploymentSm:Config up-to-date after restart, not re-applying
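The telemetry-only check can be scripted. A minimal sketch, assuming aos.conf holds the setting as a plain key = value line (the exact file layout is an assumption here); config_service_disabled is a hypothetical helper name.

```shell
# Sketch only: config_service_disabled is a hypothetical helper name.
# Succeeds when the file contains enable_configuration_service = 0,
# ignoring whitespace around "=" and skipping lines that start with "#".
config_service_disabled() {
    conf=${1:-/etc/aos/aos.conf}
    awk -F= '
        /^[ \t]*#/ { next }                                  # skip comment lines
        {
            key = $1; val = $2
            gsub(/[ \t]/, "", key); gsub(/[ \t]/, "", val)   # strip whitespace
        }
        key == "enable_configuration_service" && val == "0" { found = 1 }
        END { exit !found }
    ' "$conf"
}
```

For example, config_service_disabled && echo "agent is in telemetry-only mode" reports the condition without opening the file in an editor.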
Can't Ping Apstra or Use Other Network Tools
If you ping the Apstra server from the CLI while the device is configured in a VRF, you may get confusing error messages. Use ping -I mgmt so that the ICMP echo is sourced from the management VRF.
For other commands, use vrf task exec mgmt.
admin@cumulus:mgmt-vrf:~$ ping aos-server
sudo: unable to resolve host mclag-compute-1-leaf1
connect: Network is unreachable
admin@cumulus:mgmt-vrf:~$ ping -I mgmt aos-server
sudo: unable to resolve host mclag-compute-1-leaf1
ping: Warning: source address might be selected on device other than mgmt.
PING aos-server (172.20.85.3) from 172.20.85.6 mgmt: 56(84) bytes of data.
64 bytes from aos-server (172.20.85.3): icmp_seq=1 ttl=64 time=0.245 ms
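The rule above (ping takes -I mgmt, everything else is prefixed with vrf task exec mgmt) can be captured in a small helper. This is only an illustrative sketch; mgmt_cmdline is a hypothetical name, and the function just prints the command line rather than running it.

```shell
# Sketch only: mgmt_cmdline is a hypothetical helper name.
# Prints the form of a command that sources traffic from the management VRF.
mgmt_cmdline() {
    if [ "$1" = "ping" ]; then
        shift
        echo "ping -I mgmt $*"            # ping selects the VRF device with -I
    else
        echo "vrf task exec mgmt $*"      # other tools run inside the VRF context
    fi
}
```

For example, mgmt_cmdline ping aos-server prints ping -I mgmt aos-server, and mgmt_cmdline wget -nv https://aos-server/ prints the wget command prefixed with vrf task exec mgmt.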
Avg State is CRITICAL
When the Cumulus VM runs out of RAM, the kernel panics and the system repeatedly restarts itself. The default Cumulus OVA VM ships with 512 MB of RAM.
The root cause is that the VM does not have enough RAM to run.
Broadcast message from root@cumulus (somewhere) (Mon Jan 15 20:44:51 2018):
Avg state is CRITICAL. Last 10 values: [100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0]. System will shutdown in 26 secs
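To confirm whether a VM is still running with the small default allocation, you can read the MemTotal value from /proc/meminfo. The following is a minimal sketch; mem_total_mb and low_memory are hypothetical helper names, and the 1800 MB default threshold is an assumption chosen to sit a little below 2 GB, since the kernel typically reports slightly less than the configured RAM.

```shell
# Sketch only: mem_total_mb and low_memory are hypothetical helper names.
# Prints total memory in MB, read from a /proc/meminfo-style file.
mem_total_mb() {
    awk '$1 == "MemTotal:" { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Succeeds when total memory is below the threshold in MB.
# The 1800 MB default is an assumed margin, not a documented Apstra value.
low_memory() {
    [ "$(mem_total_mb "$1")" -lt "${2:-1800}" ]
}
```

For example, low_memory && echo "Increase the VM RAM to at least 2 GB" prints a reminder on an undersized VM.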