AOS Device Configuration Lifecycle

The device lifecycle consists of various configuration stages and device states as described below.

Important

A good understanding of the AOS device configuration lifecycle (from the moment a device is on-boarded to the moment it is decommissioned) is essential. Apstra strongly recommends reading the following content in full.

Terminology

In AOS, the following terminology is used to identify the various configuration stages:

  • Pristine Config: Consists of pre-existing config plus config added during agent installation. Normally, the pristine config does not change throughout the device’s lifecycle.
  • Discovery 1 Config: Initial basic configuration is added to the device when it is Acknowledged. This includes enabling of LLDP on all interfaces.
  • Discovery 2 Config: Additional basic configuration is added to the device when it is assigned to a blueprint and deploy mode is “Ready”. This includes device hostnames, interface descriptions and port speed / breakout config.
  • Service Config: Additional configuration required by AOS is added when the device’s deploy mode is set to “Deploy”. The term Service Config refers to the accumulation of Discovery 1, Discovery 2, and this additional config.
  • Rendered Config: Complete AOS rendered configuration for the device, as per the AOS Reference Design.
  • Incremental Config: Staged changes. The configuration that will be applied when the staged changes are committed.
  • Golden Config: After AOS successfully commits a change, it collects the running config again. This collection is called the Golden Config and serves as Intent: the running configuration is continuously matched against it. Note that after a Deploy failure, the Golden Config is unset.
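The layering of the stages above can be sketched in a few lines of Python. This is an illustrative model only: the stage identifiers and the placeholder config entries are assumptions taken from the descriptions in this list, not AOS data structures or an AOS API.

```python
from typing import List

# Placeholder entries summarizing what each stage adds (not real device config).
PRISTINE = ["pre-existing config", "agent install config"]
DISCOVERY_1_ADDITIONS = ["enable LLDP on all interfaces"]
DISCOVERY_2_ADDITIONS = ["hostname", "interface descriptions", "port speed / breakout"]
SERVICE_ADDITIONS = ["additional AOS-required config"]


def build_stage(stage: str) -> List[str]:
    """Return the accumulated config for a stage; each stage builds on the previous one."""
    layers = [
        ("pristine", PRISTINE),
        ("discovery_1", DISCOVERY_1_ADDITIONS),
        ("discovery_2", DISCOVERY_2_ADDITIONS),
        ("service", SERVICE_ADDITIONS),
    ]
    config: List[str] = []
    for name, additions in layers:
        config.extend(additions)
        if name == stage:
            return config
    raise ValueError(f"unknown stage: {stage}")
```

Calling build_stage("service") returns the pristine entries followed by the Discovery 1, Discovery 2, and service additions, which mirrors the definition of Service Config as an accumulation.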

Configuration stages: Overview

The following table describes the config events, the resulting device configuration and AOS managed device state, and the deployment mode of the device within a blueprint:

Event | Resulting Device Configuration | Resulting AOS Managed Device State | AOS Blueprint Deployment Mode
New Device | Factory Default Configuration | N/A | Not Assigned
Add Pre-AOS [mgmt] Configuration to device | Factory + Pre-AOS | N/A | Not Assigned
Install AOS Device System Agent | Pristine Config: Factory + Pre-AOS + Agent Install config | OOS-QUARANTINED | Not Assigned
Acknowledge Device | Discovery 1: Pristine, plus Interfaces Enabled | OOS-READY | Not Assigned
Assign Device to blueprint (no deploy) | Discovery 2: Discovery 1, plus various basic config | IS-READY | Ready
Deploy Device | Service Config: Discovery 2, plus full AOS Rendered config | IS-ACTIVE | Deploy
Add/Commit Incremental Configuration | Delta of resulting config changes from blueprint modifications | IS-ACTIVE | Deploy
Drain Device | “Drain” Configuration is added | IS-READY | Drain
Undeploy Device | AOS rendered config is removed | IS-READY | Undeploy
Unassign Device | Discovery 1 config is re-applied | OOS-READY | Not Assigned
_images/AOS_Device_Lifecycle.png
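
For quick reference, the state and deploy-mode columns of the table above can be captured as a lookup. The sketch below is a reading aid under stated assumptions: the event keys are hypothetical identifiers invented for this example, and the function does not validate that events occur in a legal order.

```python
from typing import Dict, Iterable, NamedTuple


class Outcome(NamedTuple):
    device_state: str  # resulting AOS managed device state
    deploy_mode: str   # resulting blueprint deployment mode


# Keys are hypothetical event names; values are copied from the table above.
LIFECYCLE: Dict[str, Outcome] = {
    "install_agent": Outcome("OOS-QUARANTINED", "Not Assigned"),
    "acknowledge": Outcome("OOS-READY", "Not Assigned"),
    "assign": Outcome("IS-READY", "Ready"),
    "deploy": Outcome("IS-ACTIVE", "Deploy"),
    "commit_incremental": Outcome("IS-ACTIVE", "Deploy"),
    "drain": Outcome("IS-READY", "Drain"),
    "undeploy": Outcome("IS-READY", "Undeploy"),
    "unassign": Outcome("OOS-READY", "Not Assigned"),
}


def replay(events: Iterable[str]) -> Outcome:
    """Return the state and deploy mode after applying the events in order."""
    outcome = Outcome("N/A", "Not Assigned")  # new device, before agent install
    for event in events:
        outcome = LIFECYCLE[event]
    return outcome
```

For example, replay(["install_agent", "acknowledge", "assign", "deploy"]) yields the IS-ACTIVE state with the Deploy mode.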

Warning

Any configuration present on the device before agent installation becomes part of the Pristine Config and therefore part of the device’s entire configuration lifecycle. Correcting it later is service impacting. See below for more detail on Pristine Config.

Configuration stages: Detail

New Device (Factory Default)

The lifecycle of a device begins with the factory default configuration stage.

Add Pre-AOS Config (User-required)

Minimum base configuration, such as for connectivity and agent installation, must be included in the entire config lifecycle. This User-required config can be bootstrapped with Apstra ZTP. The minimum config can also be added with scripts or other methods.

Important

Adding configuration to the device at the pre-AOS stage should be limited to changes required for connectivity or AOS Agent installation, or any config that is known to be required throughout the device’s lifecycle, for example Banners or NTP / SNMP / syslog server IP addresses. Required configuration that is not rendered by AOS can be added using Configlets.

Install Agent (Pristine)

When an AOS device agent is installed on a device (or in the case of an offbox agent, installed server-side) the device connects and registers to AOS in the Quarantined state. A partial config is applied and any “Pre-AOS Config” that has already been added will become part of the Pristine configuration. This pristine configuration is the basis for all subsequent device configuration.

Warning

For Cumulus Linux, enabling configuration services will replace all config in /etc/network/interfaces.

Acknowledge Device (Discovery 1 / Ready)

Acknowledging a device puts it in the Ready state and signals the intent to have AOS manage the device. In this Discovery 1 stage minimal base configuration essential to AOS Agent operation is added to the pristine config. Discovery 1 applies a complete configuration (a.k.a. “Full config push”), overwriting all existing configuration to ensure config integrity.

  • All interfaces are rendered with interface speeds for the assigned Device Profile.
  • All interfaces are set to no shutdown to allow you to view LLDP neighbor information.
  • All interfaces are moved to L3 mode (default) to prevent the device from participating in the fabric.
  • Cumulus: DHCP relay configuration is removed and wiped.
  • Cumulus: Quagga configuration is removed and wiped.
  • Cumulus: Hostname is learned when agent first starts up, or re-used if agent ran previously. Hostnames can also be learned via DHCP before the device is acknowledged.

Important

Devices that have been acknowledged cannot simply be deleted. Because there is still an active agent on the device talking to the AOS server, they would reappear within seconds. For details on removing a device from AOS, see the device decommissioning guide.

Assign Device (Discovery 2 / Ready)

Assigning a device to a blueprint and setting its Deploy Mode to Ready puts it in the Discovery 2 configuration stage. The device has been staged, but not yet committed (deployed) to the active blueprint. Discovery 2 applies a complete configuration (a.k.a. “Full config push”) to ensure config integrity. The Discovery 2 configuration brings up network interfaces, configures interface descriptions, and validates telemetry, such as LLDP, to ensure the device is properly wired and configured. This configuration is non-disruptive to other services in the fabric. Links are up, but they are configured in L3 mode to prevent STP/L2 operations.

  • Hostname is configured per blueprint intent.
  • All interface descriptions are changed per blueprint intent.
  • Interfaces are rendered with blueprint interface speeds.
  • No routing or BGP is configured.
  • No L3 information is configured on interfaces.
  • Fabric MTU is modified for spines to 9050 bytes.
  • Cumulus: Quagga configuration is still empty (/etc/quagga/Quagga.conf), but the file is created if it does not exist.
  • Cumulus: DHCP configuration is not modified.

Deploy Device (Rendered / Active)

Warning

The first time a device is assigned, its Deploy Mode is set to “Deploy”, and the blueprint is committed, AOS triggers a full config push for the device. This effectively overwrites the complete running config with the Pristine Configuration and then adds the full rendered AOS Configuration. Any config that is not part of the AOS rendered config is discarded.

When a device is committed, it becomes Active, and AOS deploys the service configuration, moving the device into the Rendered configuration stage. Rendered config contents are derived from the pristine config, selected reference design/topology, NOS, and device model. The first rendered config applies a complete configuration (removing all existing configuration from the AOS server per Jinja) to ensure configuration integrity. This is the full end-state of AOS. A full configuration has been pushed, all interfaces are running, and routing within IP fabric is configured. Full configuration rendering, intent-based telemetry, and standard service operations occur here.

  • Hostname is configured per blueprint intent.
  • All interface descriptions are changed per blueprint intent.
  • Interfaces are rendered with blueprint interface speeds.
  • Interface VLANs, LAGs, MLAG, VXLAN, etc. are managed.
  • All L3 information is rendered.
  • BGP configuration is fully rendered for all BGP peering information.
  • DHCP configuration is rendered for any required DHCP relay agents.
  • Cumulus: FRR configuration is fully rendered (/etc/frr/frr.conf), including all BGP peering information.
  • The device is added to the graph database.

After the full configuration is successfully deployed to the device, AOS takes a snapshot of the device configuration (e.g. show running-config) and stores it as the Golden Configuration.

Warning

Adding extra configuration to the device outside of AOS at this time results in a configuration deviation anomaly, that is, a difference between the current device configuration and the stored Golden Configuration. AOS fails subsequent deployment tasks until the deviation is resolved. Correct the anomaly to proceed.

To see the rendered config file after committing the blueprint, select the device in the Active blueprint and click Config (right-side).

A running configuration can be modified in multiple ways. To modify a config that is not part of the reference design, use Configlets.

Stage Device Update (Incremental / Active)

Staging changes to a running blueprint creates an Incremental configuration. You can preview the incremental config (including Cumulus as of AOS version 3.3.0) before committing the changes to the network. From the staged blueprint, select the device, and click Incremental in the Config section (lower-right). Config previews for Rendered and Pristine (as of AOS version 3.2.1) are also accessible from here. After the changes have been committed, the Incremental Config is empty.

_images/config_view_321.png
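
Conceptually, an incremental preview is the delta between the config currently rendered for the device and the config that would be rendered from the staged blueprint. The Python sketch below illustrates that idea with the standard difflib module. It is an assumption-laden illustration, not how AOS computes the Incremental Config; the function name and the sample config lines are hypothetical.

```python
import difflib
from typing import List


def incremental_preview(active_rendered: str, staged_rendered: str) -> List[str]:
    """Return only the removed ('-') and added ('+') lines between two rendered configs."""
    diff = list(difflib.unified_diff(
        active_rendered.splitlines(),
        staged_rendered.splitlines(),
        fromfile="active",
        tofile="staged",
        lineterm="",
        n=0,  # no context lines, only the changes themselves
    ))
    # The first two lines are the '---' / '+++' file headers; '@@' lines mark hunks.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


active = "hostname leaf1\ninterface swp1\n mtu 9000"
staged = "hostname leaf1\ninterface swp1\n mtu 9216"
changes = incremental_preview(active, staged)
```

When the two inputs are identical, as is the case right after a commit in this model, the function returns an empty list, which corresponds to the empty Incremental Config described above.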

Commit Device Again (Rendered-Updated / Active)

Whenever a change is committed to a blueprint that affects the device’s configuration, a partial config updates the rendered config.

Configuration Deviations

After each successful config deploy, AOS collects the running config and stores it internally as the Golden configuration. Intent is the cornerstone of AOS. As such, any difference between the actual running config and this Golden config results in a Config Deviation Anomaly on the blueprint’s Dashboard. The Golden config is updated every time config is successfully applied to a device.

Some important points to remember:

  • Golden Config is updated upon each successful configuration deployment.
  • When config deployment fails, Golden config is unset. As a result, both a config deviation anomaly and a deployment failure anomaly are raised.
  • Running configuration telemetry is continuously collected and matched against the Golden config. Any difference results in a Deviation anomaly.
  • Configuration Anomalies can be ‘suppressed’ using the “Accept Changes” feature. This does NOT mean the change is added to Golden config or Intent.

See Configuration Deviation for details.
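
The Golden config rules above can be summarized in a toy model. The class and method names below are hypothetical and exist only for this illustration; they are not an AOS API, and real comparison logic in AOS may differ from a plain string comparison.

```python
from typing import Optional


class DeviceConfigTracker:
    """Toy model of Golden config handling as described in this section."""

    def __init__(self) -> None:
        self.golden: Optional[str] = None

    def record_deploy(self, success: bool, running_config: str) -> None:
        # Golden is refreshed after a successful deploy and unset after a failure.
        self.golden = running_config if success else None

    def has_deviation(self, running_config: str) -> bool:
        # With no Golden config (failed deploy), the device counts as deviating.
        return self.golden is None or running_config != self.golden


tracker = DeviceConfigTracker()
tracker.record_deploy(True, "hostname leaf1\nrouter bgp 65001")
in_sync = not tracker.has_deviation("hostname leaf1\nrouter bgp 65001")
drifted = tracker.has_deviation("hostname leaf1\nrouter bgp 65001\nntp server 192.0.2.1")
```

Note that the model has no equivalent of “Accept Changes”: suppressing an anomaly does not alter the stored Golden config.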

Device Offline (Unavailable)

A managed device (one that has been acknowledged) that is not connected to the AOS server is in the unavailable state. A device could be offline if the device agent interface is offline, if the service is not running, or if a network connectivity error occurs.

Manually Applying Full Config

The Discovery 1 and Deploy Device configuration stages initiate full config pushes. In rare cases, you may need to manually apply a full config push. For example, if the required config is not in place for a blueprint with NX-OS devices that require TCAM carving, the config deployment will fail. The TCAM config error must be corrected first, and then a full config must be pushed manually.

Important

A full configuration push should be done with the utmost caution, as it is very likely to impact all services running on the box. Exact impact depends on changes being pushed. Also note all Out of Band changes are overwritten upon a full push.

Deploy Modes

Managed devices in blueprints can be in one of several modes. See Changing Deploy Mode on One Device for steps for changing deploy modes.

Not Set
The initial state of a device. The device is not active in the fabric.
Deploy
A deployed device is an active device in the fabric.
Ready
When a device is assigned to a blueprint, it is in Ready mode; Discovery 2 configuration is added (hostnames, interface descriptions, port speed / breakout configuration). It is not yet active in the fabric. Changing from Deploy to Ready removes AOS-rendered configuration.
Drain

Draining a device for physical maintenance enables it to be taken out of service gracefully without impacting existing TCP flows. Depending on the device being drained, AOS uses one of two methods:

For L2 Servers

  • MLAG peer-link port channels and bond interfaces on any NOS are not changed.
  • For Arista EOS and Cisco NX-OS, all interfaces towards L2 servers in the blueprint are shut down.
  • For Cumulus, all bond interfaces towards L2 servers in the blueprint are deleted and member ports are removed from /etc/network/interfaces. As a result LAG/MLAG anomalies are generated.

For Network L3 Switches

AOS uses ‘deny’ statements in inbound and outbound route-maps to block all advertisements matching 0.0.0.0/0 le 32.
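
The match expression 0.0.0.0/0 le 32 follows common prefix-list semantics: it covers every IPv4 prefix inside 0.0.0.0/0 whose length is at most 32, which means every IPv4 route. The short Python sketch below illustrates why, using the standard ipaddress module. The function name is hypothetical, and the sketch only models the match, not the route-map itself.

```python
import ipaddress


def matches_le(prefix: str, base: str = "0.0.0.0/0", le: int = 32) -> bool:
    """True if prefix lies inside base and its length is between base's length and le."""
    net = ipaddress.ip_network(prefix)
    base_net = ipaddress.ip_network(base)
    return net.subnet_of(base_net) and base_net.prefixlen <= net.prefixlen <= le
```

With the defaults, matches_le returns True for a default route, a /8, or a /32 host route alike, so a ‘deny’ on this match blocks all advertisements to and from the draining device.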

This allows existing L3 TCP flows to continue without interruption. After a second or two, the TCP sessions should be re-established by the src/dst devices, or they should negotiate a new TCP port. The new TCP port forces the devices to be hashed onto a new ECMP path from the list of available links. Since no ECMP routes to the destination are available in the presence of a route map, the traffic does not flow through the device that is in maintenance mode. The device is effectively drained of traffic and can be removed from the fabric (by changing Deploy mode to Undeploy).

While TCP sessions drain (which could take some time, especially for EVPN blueprints) BGP anomalies are expected. When configuration deployment is complete, the temporary anomalies are resolved. See the device draining guide for more information.
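
The ECMP behavior described above can be illustrated with a small sketch. It assumes a hash over the flow 5-tuple selects the next hop, which is a common approach; CRC32 is only a stand-in for vendor-specific hash functions, and the device names, addresses, and ports are made up for the example.

```python
import zlib
from typing import List, Tuple

Flow = Tuple[str, str, int, int, int]  # src IP, dst IP, protocol, src port, dst port


def pick_path(flow: Flow, paths: List[str]) -> str:
    """Pick one next hop by hashing the flow 5-tuple."""
    key = "|".join(str(field) for field in flow).encode()
    return paths[zlib.crc32(key) % len(paths)]


spines = ["spine1", "spine2", "spine3"]
old_flow: Flow = ("10.0.0.1", "10.0.1.1", 6, 49152, 443)

# Draining spine2 blocks its advertisements, so it is no longer a candidate next hop.
remaining = [s for s in spines if s != "spine2"]

# A re-established session uses a new source port and is hashed over the remaining paths.
new_flow: Flow = ("10.0.0.1", "10.0.1.1", 6, 49153, 443)
new_path = pick_path(new_flow, remaining)
```

Whatever the hash value, new_path can only be one of the remaining spines, which is why new sessions avoid the device being drained.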

Undeploy
Undeploying a device removes the complete service configuration. If a device is carrying traffic it is best to put it in drain mode first (and commit the change) before undeploying the device.
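
The recommended ordering can be expressed as a small helper. This is a sketch of the guidance above, not an AOS API: the function name is hypothetical, and each returned mode is assumed to be set and committed in the blueprint before moving to the next one.

```python
from typing import List


def undeploy_plan(current_mode: str, carrying_traffic: bool) -> List[str]:
    """Return the deploy modes to set and commit, in order, to undeploy a device."""
    if current_mode == "Deploy" and carrying_traffic:
        # Drain first (and commit) so traffic moves away before the config is removed.
        return ["Drain", "Undeploy"]
    return ["Undeploy"]
```

For a deployed device that is carrying traffic, undeploy_plan("Deploy", True) returns ["Drain", "Undeploy"]; a device that is already drained goes straight to Undeploy.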