Telemetry

Telemetry Overview

Telemetry in the Web Interface

To see a summary of the number of devices running each telemetry service, navigate to Devices > Telemetry in the web interface. To see collection statistics for a specific service across all relevant devices, click the service name.

The collection statistics screen shows any service errors generated during the telemetry collection process (in the Error message column). Click the Show error link to see error details.

From this screen you can also go to all telemetry services for a specific device by clicking the device name.

To go to collection statistics for that device for all services, click Collection Statistics.

Telemetry Services

Telemetry services include the following:

ARP
ARP telemetry shows an ARP table. This information can be queried via API. No anomalies are generated.
BGP
BGP telemetry shows role(s), VRF name, address family, source and destination information, expected and actual states, intent status, last fetched/modified, and (as of version 3.3.0) BGP peer state.
Config

Devices with deviations between the rendered discovered/service config and the actual config are flagged with a config deviation error. When configuration changes are made outside of Apstra management, alarms are generated immediately. The risk with a configuration deviation is that it is possible for Apstra to overwrite the deviated configuration with a configuration re-write.

The correct way to deal with a config deviation alarm is to understand the configuration change being made, and consider setting it up as a configlet instead.

Counters
Counter telemetry provides information about interface in/out packets, interface errors, statistics, and so on. This feature is consumed by other advanced downstream features like telemetry streaming. No anomalies are generated.
Hostname
When you assign a device with deploy mode Ready to a blueprint, the device enters the Discovery 2 Config stage. Hostname telemetry is collected that validates the device hostname against intent. Mismatches result in anomalies.
Interface
When you assign a device with deploy mode Ready to a blueprint, the device enters the Discovery 2 Config stage. Interface telemetry is collected that compares intent with the up/down state of physical interfaces. It does not include LLDP, LAG or any other attachment information.
LAG
LAG telemetry shows the health of all the LACP bonds facing servers and between MLAG switches.
LLDP (Cabling)
When you assign a device with deploy mode Ready to a blueprint, the device enters the Discovery 2 Config stage. Every node is part of intent. On each link, there are expected neighbor hostnames, interfaces and connections. Physical cabling and links must match the specified intent. Any deviations result in anomalies that must be corrected by either recabling to match the blueprint or by modifying the blueprint to match cabling already in place. Apstra knows what should be connected on each link, what its neighbor hostname should be, and what its neighbor interface should be.
MAC
MAC address-table telemetry shows which MAC addresses appear on which interfaces and in which VLANs.
MLAG

MLAG telemetry tracks the health of the MLAG domain itself: the control protocols that the two leaf switches use to communicate with each other must be working properly for the MLAG domain to be healthy. Implementation details differ between vendors, but the intent is the same: the two switches should be healthy peers. MLAG telemetry is only available for L2 blueprints that have at least one virtual network assigned to an MLAG pair.

If an MLAG-attached server is not fully connected, the state changes from ‘active_full’ to ‘active_partial’.

Note

Cisco MLAG (VPC) commands cannot derive the status of the LAG on the VPC peer switch. Accordingly, the dual-active state cannot be determined from the command output. This is a Cisco limitation.

Route
Routing telemetry analyzes the routing table on every managed spine and leaf. Since the entire IP fabric is managed, you can derive and predict full IP table information from the network topology. Deviations in the network routing telemetry (for example, a missing next-hop IP address for a default route) cause an alarm.
Transceivers
Transceiver telemetry gives the network operator statistics on optical interfaces, showing DOM statistics, light levels, lossy interfaces, and other optical statistics. No anomalies are generated.
Utilization
Utilization telemetry allows the network operator to view some vital statistics on the device - CPU and Memory utilization. No anomalies are generated.

Telemetry Collection Statistics

Telemetry collection statistics include the following details:

Device
The device key
Service Started?
Has the service started?
Interval
How frequently the service is configured to run on the device (in seconds)
Input
The input that is provided to the service for its processing
Run Count
The number of times the collector is scheduled to run
Success Count
The number of times the collector successfully executed
Failure Count
The number of times the collector failed execution
Max Run Count
User-specified maximum number of times for the collector to run
Execution Time
The time it took for collection during the last iteration (in milliseconds)
Waiting Time
A device runs multiple collectors. If some collectors monopolize CPU, other collector executions are deferred. Waiting time is the amount of time that the collector was deferred (in milliseconds).
Last Run Timestamp
Timestamp at which the collector was scheduled to run
Last Error Timestamp
Timestamp at which the collector last reported an error
Error message
Error message from the last collector iteration.

L2 Server Telemetry

You can collect telemetry data from L2 servers with onbox agents. For agent installation instructions, see Agents.

The following telemetry data is available on L2 servers by default. Data is collected using subprocess calls over SSH. This is unintended data, meaning it is not matched against intent, so no anomalies are raised for it.

LLDP
Displays LLDP neighbor information
CPU
Collects five-second average CPU utilization every five seconds for individual Apstra processes as well as for the total
RAM
Collects five-second average memory footprint every five seconds for individual Apstra processes as well as for the total
Interface Status
Lists Interface names and status

With IBA probes, you can greatly expand the collected data and define the exact conditions under which anomalies are raised.

External Router Telemetry

Apstra includes telemetry expectations for external routing - a default route is expected to be received from each external router in the blueprint. These routes are load-balanced with ECMP, and each switch (and server, if L3) expects to receive multiple routes.

Telemetry Streaming

The term telemetry streaming refers to the Apstra server transmitting the following content to user-defined end hosts so that you can further process the data and use it within your own internal systems:

Counter Data
Performance monitoring (PM) data consists of time-series numerical values such as interface counters, memory utilization, and CPU usage. This information is typically stored and graphed for visual analysis. Typical tools used for this purpose include Graphite and Cacti.
Event Data
Event data is a collection of status information that you may need to refer back to in order to troubleshoot your network. The best reference for example event data is syslog. You need a general amount of event history so that you can perform troubleshooting activities over a period of time. While this is an undefined amount of time, you generally want as much time as possible, because you don’t get to troubleshoot a problem the instant that it occurs.
Alert Data
Alert data is a collection of information that requires your attention to resolve an issue. In the best cases, alerts tell you what is wrong relative to the network service, and provide the necessary data to allow you to identify root-cause and resolve the issue as fast as possible.

The format of the data streams is defined and implemented using Google Protocol Buffers (GPB). GPB allows software developers to use a language-agnostic definition of events and data types.

GPB supports C++, Python, Go, and other languages, so streaming clients can be written in whichever of these fits your environment: C++ integrates with Apstra's own infrastructure, while Python or Go are convenient choices for infrastructure engineers writing a client. Apstra provides example Python code, AOSOM Streaming, that consumes these GPB streams. The AOSOM Streaming demo software is open source and can be downloaded from GitHub: https://github.com/Apstra/aosom-streaming.
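
As a rough illustration of what a receiver looks like, the following minimal Python sketch accepts a stream connection and reads length-prefixed protobuf messages. It assumes a 2-byte big-endian length header and a compiled aos_streaming_pb2 module generated from the .proto file in the aosom-streaming repository; neither detail is defined in this document, so treat the repository as the authoritative reference for the actual framing and message types.

import socket
import struct

# import aos_streaming_pb2  # assumption: module generated by protoc from the
#                           # .proto file shipped with aosom-streaming

def receive_stream(listen_port=7777):
    # Accept a single connection from the Apstra server (the streaming
    # receiver host/port is what you configure in Apstra).
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(('0.0.0.0', listen_port))
    server.listen(1)
    conn, _addr = server.accept()
    try:
        while True:
            header = conn.recv(2)
            if len(header) < 2:
                break                                    # stream closed
            (length,) = struct.unpack('>H', header)      # assumed 2-byte length prefix
            payload = b''
            while len(payload) < length:
                chunk = conn.recv(length - len(payload))
                if not chunk:
                    return
                payload += chunk
            # message = aos_streaming_pb2.AosMessage()   # assumed message name
            # message.ParseFromString(payload)
            print('received %d-byte protobuf message' % length)
    finally:
        conn.close()
        server.close()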

Route Anomalies for a Host - Example

HTTP GET https://aos-server/api/blueprints/{blueprint_id}/anomalies (the output below is truncated; the actual GET response returns anomalies for the entire routing table)

{
  "items": [
    {
      "actual": {
        "value": "missing"
      },
      "anomaly_type": "route",
      "expected": {
        "value": "up"
      },
      "id": "547bcbc9-963f-4477-904b-712482aa6428",
      "identity": {
        "anomaly_type": "route",
        "destination_ip": "0.0.0.0/0",
        "system_id": "000C29202526"
      },
      "last_modified_at": "2017-06-09T17:28:13.773324Z",
      "role": "unknown",
      "severity": "critical"
    },
    {
      "actual": {
        "value": "partial"
      },
      "anomaly_type": "route",
      "expected": {
        "value": "up"
      },
      "id": "92a6804a-42ff-4cbd-a52b-5c6acadc1d23",
      "identity": {
        "anomaly_type": "route",
        "destination_ip": "0.0.0.0/0",
        "system_id": "000C29EA59A7"
      },
      "last_modified_at": "2017-06-09T17:28:44.787604Z",
      "role": "unknown",
      "severity": "critical"
    },
    {
      "actual": {
        "value": "partial"
      },
      "anomaly_type": "route",
      "expected": {
        "value": "up"
      },
      "id": "25886eb7-e629-4f56-9479-686fe1e53c64",
      "identity": {
        "anomaly_type": "route",
        "destination_ip": "0.0.0.0/0",
        "system_id": "000C29E808A1"
      },
      "last_modified_at": "2017-06-09T17:28:13.773423Z",
      "role": "unknown",
      "severity": "critical"
    },
    {
      "actual": {
        "value": "partial"
      },
      "anomaly_type": "route",
      "expected": {
        "value": "up"
      },
      "id": "2b7a77ac-fd12-41fe-acfc-a53678b177ed",
      "identity": {
        "anomaly_type": "route",
        "destination_ip": "0.0.0.0/0",
        "system_id": "000C2982786A"
      },
      "last_modified_at": "2017-06-09T17:28:13.773389Z",
      "role": "unknown",
      "severity": "critical"
    },
    {
      "actual": {
        "value": "partial"
      },
      "anomaly_type": "route",
      "expected": {
        "value": "up"
      },
      "id": "50a1e0d6-e483-4bc4-bed8-cbc5666569f8",
      "identity": {
        "anomaly_type": "route",
        "destination_ip": "0.0.0.0/0",
        "system_id": "000C2998C7E7"
      },
      "last_modified_at": "2017-06-09T17:28:13.773453Z",
      "role": "unknown",
      "severity": "critical"
    },
    {
      "actual": {
        "value": "down"
      },
      "anomaly_type": "bgp",
      "expected": {
        "value": "up"
      },
      "id": "ab9f4273-e86f-456c-8cc7-7115f3aafa45",
      "identity": {
        "anomaly_type": "bgp",
        "destination_asn": "1",
        "destination_ip": "1.1.1.1",
        "source_asn": "65417",
        "source_ip": "10.0.0.5",
        "system_id": "000C29202526"
      },
      "last_modified_at": "2017-06-09T17:28:13.727949Z",
      "role": "to_external_router",
      "severity": "critical"
    }
  ],
  "count": 6
}
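
The same endpoint can be queried from a script. The following minimal sketch assumes the standard Apstra token-based authentication (POST /api/user/login returning a token that is passed in the AuthToken header); the server address, credentials, and blueprint ID are placeholders.

import requests

AOS_SERVER = 'https://aos-server'
BLUEPRINT_ID = '<blueprint_id>'

session = requests.Session()
session.verify = False  # lab installations often use self-signed certificates

# Authenticate and store the token for subsequent requests.
login = session.post(AOS_SERVER + '/api/user/login',
                     json={'username': 'admin', 'password': 'admin'})
login.raise_for_status()
session.headers['AuthToken'] = login.json()['token']

# Fetch and summarize route anomalies for the blueprint.
anomalies = session.get('%s/api/blueprints/%s/anomalies'
                        % (AOS_SERVER, BLUEPRINT_ID)).json()
for item in anomalies.get('items', []):
    if item['anomaly_type'] == 'route':
        print(item['identity']['system_id'],
              item['identity']['destination_ip'],
              item['actual']['value'])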

Telemetry Command Reference

The purpose of this section is to assist network administrators in understanding why telemetry alarms exist, and how they are generated. This is not an exhaustive list of interface commands.

Cisco

Cisco telemetry is derived from the NX-API with ‘show’ commands and embedded event manager applets that provide context data to the device agent while it is running. Most commands are run as their CLI version wrapped into JSON output.

Cisco Telemetry
Interface counters
show interface counters | json
Interface error counters
show interface counters errors | json
Interface status
show interface status | json
LLDP neighbors
show lldp neighbors detail | json
BGP Sessions
show bgp session | json
Hostname
show hostname | json and show hosts | json
ARP
show ip arp vrf default | json
MAC Table
show mac address-table | json
Routing table
show ip route | json
Port-channel
show port-channel summary | json
MLAG
show vpc | json

Arista

For Arista EOS, Apstra uses the EOS SDK API to subscribe directly to event notifications from the switch, for example ‘interface down’ or ‘new route’ notifications. With event-based notifications, Apstra does not have to continually run ‘show’ commands every few seconds; the EOS SDK provides the information as soon as the switch has the status.

Warning

Event-based subscription requires the EOSProxySDK agent. For details please refer to Arista Device Agent.

When the Arista API does not provide the information (for example, LLDP statistics), Apstra runs CLI commands at a regular interval to derive telemetry expectations.

Arista telemetry commands
Interface counters
show interface counters
Interface error counters
show interfaces counters errors
Interface status
show interfaces status
LLDP neighbors
show lldp neighbors detail
BGP Sessions
show ip bgp summary
Hostname
show hostname
ARP
ARP collection is done using an event-monitor for performance. show event-monitor arp and show ip arp
MAC Table
MAC address collection is done using an event-monitor for performance. show event-monitor mac and show mac address-table
Routing table
show ip route
Port-channel
show port-channel summary
MLAG
show mlag and show mlag interfaces

Cumulus

Cumulus switches use a combination of the Cumulus netshow command and standard Linux sockets.

Cumulus Telemetry commands
Interface counters
ethtool -m
Interface error counters
ethtool -m
Interface status
Interface status is collected using the netlink api (AF_INET)
LLDP neighbors
lldpctl -f json
BGP Sessions
vtysh -c 'show ip bgp summary json'
Hostname
hostname
ARP
ip -4 neigh
MAC Table
MAC address collection is done using an event-monitor for performance. show event-monitor mac and show mac address-table
Routing table
show ip route and the AF_INET linux socket
Port-channel
netshow bondmems --json
MLAG
clagctl -j

Linux Servers

Linux Servers use simple CLI commands and standard Linux sockets for most of the telemetry collection.

Interface counters
ethtool -m
Interface error counters
ethtool -m
Interface status
Interface status is collected using the netlink api (AF_INET)
LLDP neighbors
lldpctl -f xml
BGP Sessions
vtysh -c 'show ip bgp summary json'
Hostname
hostname
ARP
ip -4 neigh
MAC Table
brctl showmacs
Routing table
show ip route and the AF_INET linux socket
Port-channel
netshow bondmems --json
MLAG
clagctl -j

Extensible Telemetry

To collect additional telemetry, you need Apstra device drivers and telemetry collectors. You can use collected telemetry information in IBA probes.

AOS Device Drivers

AOS device drivers enable Apstra to connect to a Device Operating System (DOS) and collect telemetry. Apstra ships with drivers for EOS, Cumulus, NX-OS, Ubuntu, and CentOS. To add a driver for an operating system not listed here, contact Juniper JTAC Apstra Support.

Telemetry Collectors

Telemetry collectors are Python modules that help collect extended telemetry information. The following sections describe the pipeline for creating telemetry collectors and extending Apstra with the new collectors. You are expected to be familiar with Python for collector development.

Setting Up The Development Environment

Please contact Juniper JTAC Apstra Support for access to the telemetry collectors, which are housed in the aos_developer_sdk repository. Please contribute new collectors to the repository.

To keep your system environment intact, we recommend that you use a virtual environment to isolate the required Python packages (for development and testing). You can download the base development environment, aos_developer_sdk.run, from https://support.juniper.net/support/downloads/?p=apstra-fabric-conductor. To load the environment, execute:

aos_developer_sdk$ bash aos_development_sdk.run
4d8bbfb90ba8: Loading layer [==================================================>]  217.6kB/217.6kB
7d54ea05a373: Loading layer [==================================================>]  4.096kB/4.096kB
e2e40f457231: Loading layer [==================================================>]  1.771MB/1.771MB
Loaded image: aos-developer-sdk:2.3.1-129

================================================================================
Loaded AOS Developer SDK Environment Container Image
aos-developer-sdk:2.3.1-129.

Container can be run by
    docker run -it \
        -v <path to aos developer_sdk cloned repo>:/aos_developer_sdk \
        --name <container name> \
        aos-developer-sdk:2.3.1-129

================================================================================

This command loads the aos_developer_sdk Docker image. After the image load is complete, the command to start the environment is printed. Start the container environment as specified by the command. To install the dependencies, execute:

root@f2ece48bb2f1:/# cd /aos_developer_sdk/
root@f2ece48bb2f1:/aos_developer_sdk# make setup_env
...

The environment is now set up for developing and testing collectors. Apstra SDK packages, such as the device drivers and REST client, are also installed in the environment.

Developing a Telemetry Collector

To develop a telemetry collector, determine the following, in order.

  1. Service for which the collector is developed

    Identify what the service is. For example, the service could be to collect received and transmitted bytes from the switch interfaces. Identify a name for the service. Using service names that are reserved for built-in services (ARP, BGP, interface, hostname, route, MAC, XCVR, LAG, MLAG) is prohibited.

  2. The schema of the data provided to Apstra

    Identify how the collector output is to be structured. A collection of key-value pairs should be posted to Apstra. Identify what each item is, that is, what the key and value are syntactically and semantically. For the example above, the key is a string that identifies the interface name, and the value is a JSON string with two keys, ‘rx’ and ‘tx’, both with integer values.

  3. Device Operating System (DOS) for which the collector is developed

    The collector plugins are DOS-specific. Before writing a collector, identify the DOS(s) for which collector(s) are required.

  4. How the required data can be obtained from the device

    Identify the commands that can be used on the device to retrieve the required information. For example, the ‘show interfaces’ command returns received and transmitted bytes on an Arista EOS device.

  5. Storage Schema Path

    The storage schema defines the high-level structure of the data returned by the service. The storage schema path for your collector is determined by the type of key and value in each item, as shown in the following table:

    Determining Storage Schema Path
    Key Type Value Type Storage Schema Path
    String String aos.sdk.telemetry.schemas.generic
    String Dict aos.sdk.telemetry.schemas.generic
    Dict String aos.sdk.telemetry.schemas.iba_string_data
    Dict Integer aos.sdk.telemetry.schemas.iba_integer_data
  6. Application Schema

    The application schema defines the schema for each item posted to the framework. It is expressed using draft 4 of the JSON Schema specification. Each item consists of a key and a value. The following table shows two sample items.

    Sample item with its storage schema path
    Storage Schema Path Sample Item
    aos.sdk.telemetry.schemas.generic
    {
        "identity": "eth0",
        "value": "up",
    }
    
    aos.sdk.telemetry.schemas.iba_string_data
    {
        "key": {
            "source_ip": "1.1.1.1",
            "dest_ip": "1.1.1.2",
        },
        "value": "up",
    }
    

    Note

    • An item returned by collectors with generic storage schema should specify the key value using the key ‘identity’ and the value using the key ‘value’.
    • An item returned by collectors with IBA-based schemas should specify the key value using the key ‘key’ and the value using the key ‘value’.

    Using this information, you can write the JSON schema. The following table maps the sample item specified above to its corresponding JSON schema.

    Sample Application Schema
    Sample Item Application Schema
    {
        "identity": "eth0",
        "value": "up",
    }
    
    {
        "type": "object",
        "properties": {
            "identity": {
                "type": "string"
            },
            "value": {
                "type": "string"
            }
        }
    }
    
    {
        "key": {
            "source_ip": "1.1.1.1",
            "dest_ip": "1.1.1.2",
        },
        "value": "up",
    }
    
    {
        "type": "object",
        "properties": {
            "key": {
                "type": "object",
                "properties": {
                    "source_ip": {
                        "type": "string",
                        "format": "ipv4"
                    },
                    "dest_ip": {
                        "type": "string",
                        "format": "ipv4"
                    }
                },
                "required": ["source_ip", "dest_ip"]
            },
            "value": {
                "type": "string"
            }
        }
    }
    

    You can specify more complex schemas using the constructs available in JSON Schema. Update the schema in the file aos_developer_sdk/aosstdcollectors/aosstdcollectors/json_schemas/<service_name>.json.

Writing A Collector

A collector is a class that derives from aos.sdk.system_agent.base_telemetry_collector.BaseTelemetryCollector. Override the collector's collect method with the logic to:

Collect the data from the device

A device driver instance is available inside the collector. The device driver provides methods to execute commands against the devices. For example, most Apstra device drivers provide methods get_json and get_text to execute commands and return the output.

Note

The device drivers for the aos_developer_sdk environment are already installed, so you can explore the methods available for collecting data. For example:

>>> import pprint
>>> from aos.sdk.driver.eos import Device
>>> device = Device('172.20.180.10', 'admin', 'admin')
>>> device.open()
>>> pprint.pprint(device.get_json('show version'))
{u'architecture': u'i386',
 u'bootupTimestamp': 1548302664.0,
 u'hardwareRevision': u'',
 u'internalBuildId': u'68f3ae78-65cb-4ed3-8675-0ff2219bf118',
 u'internalVersion': u'4.20.10M-10040268.42010M',
 u'isIntlVersion': False,
 u'memFree': 3003648,
 u'memTotal': 4011060,
 u'modelName': u'vEOS',
 u'serialNumber': u'',
 u'systemMacAddress': u'52:54:00:ce:87:37',
 u'uptime': 62620.55,
 u'version': u'4.20.10M'}
>>> dir(device)
['AOS_VERSION_FILE', '__class__', '__delattr__', '__dict__', '__doc__',
'__format__', '__getattribute__', '__hash__', '__init__', '__module__',
'__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__',
'__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'close',
'device_info', 'driver', 'execute', 'get_aos_server_ip',
'get_aos_version_related_info', 'get_device_aos_version',
'get_device_aos_version_number', 'get_device_info', 'get_json',
'get_text', 'ip_address', 'onbox', 'open', 'open_options', 'password',
'probe', 'set_device_info', 'upload_file', 'username']
Parse the data

The collected data needs to be parsed and re-formatted per the Apstra framework and the service schema identified above. Collectors with the generic storage schema post data with the following structure:

{
    "items": [
        {
            "identity": <key goes here>,
            "value": <value goes here>,
        },
        {
            "identity": <key goes here>,
            "value": <value goes here>,
        },
        ...
    ]
}

Collectors with an IBA-based schema post data with the following structure:

[
    {
        "key": <key goes here>,
        "value": <value goes here>,
    },
    {
        "key": <key goes here>,
        "value": <value goes here>,
    },
    ...
]

In the structures above, the data posted has multiple items. Each item has a key and a value. For example, to post interface specific information, there would be an identity/key-value pair for each interface you want to post to the framework.

Note

If you want to use a third-party package to parse data obtained from a device, list the Python package and version in <aos_developer_sdk>/aosstdcollectors/requirements_<DOS>.txt. Make sure the packages installed for the dependency do not conflict with packages used by Apstra. The Apstra-installed packages are listed in /etc/aos/python_dependency.txt in the development environment.

Post the data to the framework
When the data is collected and parsed per the required schema, post it to the framework using the post_data method available in the collector. It accepts one argument: the data to be posted to the framework.

The folder aos_developer_sdk/aosstdcollectors/aosstdcollectors in the repository contains folders for each DOS. Add your collector to the folder that matches the DOS. For example, to write a collector for Cumulus, add the collector to aos_developer_sdk/aosstdcollectors/aosstdcollectors/cumulus, and name the file after the service name. For example, if the service name is interface_in_out_bytes, then name the file interface_in_out_bytes.py.

In addition to defining the collector class, define the function collector_plugin in the collector file. The function takes one argument and returns the collector class that is implemented.

For example, a generic storage schema based collector looks like:

"""
    Service Name: interface_in_out_bytes
    Schema:
        Key: String, represents interface name.
        Value: Json String with two possible keys:
            rx: integer value, represents received bytes.
            tx: integer value, represents transmitted bytes.
    DOS: eos
    Data collected using command: 'show interfaces'
    Type of Collector: BaseTelemetryCollector
    Storage Schema Path: aos.sdk.telemetry.schemas.generic
    Application Schema: {
        'type': 'object',
        'properties': {
            'identity': {
                'type': 'string',
            },
            'value': {
                'type': 'object',
                'properties': {
                    'rx': {
                        'type': 'number',
                    },
                    'tx': {
                        'type': 'number',
                    }
                },
                'required': ['rx', 'tx'],
            }
        }
    }

"""
import json
from aos.sdk.system_agent.base_telemetry_collector import BaseTelemetryCollector


# Inheriting from BaseTelemetryCollector
class InterfaceRxTxCollector(BaseTelemetryCollector):

    # Overriding collect method
    def collect(self):

        # Obtaining the command output using the device instance.
        collected_data = self.device.get_json('show interfaces')

        # Data is in the format
        # "interfaces": {
        #     "<interface_name>": {
        #         ....
        #         "interfaceCounters": {
        #         ....
        #         "inOctets": int
        #         "outOctets": int
        #         ....
        #         }
        #     }
        #     ...
        # }

        # Parse the data as per the schema and structure required.
        parsed_data = json.dumps({
            'items': [
                {
                    'identity': intf_name,
                    'value': json.dumps({
                        'rx': intf_stats['interfaceCounters'].get('inOctets'),
                        'tx': intf_stats['interfaceCounters'].get('outOctets'),
                    })
                } for intf_name, intf_stats in collected_data['interfaces'].iteritems()
                if 'interfaceCounters' in intf_stats
            ]
        })

        # Post the data to the framework
        self.post_data(parsed_data)


# Define collector_plugin class to return the Collector
def collector_plugin(_device):
    return InterfaceRxTxCollector

An IBA storage schema based collector looks like:

"""
    Service Name: iba_bgp
    Schema:
        Key: JSON String, specifies local IP and peer IP.
        Value: String. ‘1’ if state is established ‘2’ otherwise
    DOS: eos
    Data collected using command: 'show ip bgp summary vrf all'
    Storage Schema Path: aos.sdk.telemetry.schemas.iba_string_data
    Application Schema: {
        'type': 'object',
        'properties': {
            'key': {
                'type': 'object',
                'properties': {
                    'local_ip': {
                        'type': 'string',
                    },
                    'peer_ip': {
                        'type': 'string',
                    }
                },
                'required': ['local_ip', 'peer_ip'],
            },
            'value': {
                'type': 'string',
            }
        }
    }
"""

from aos.sdk.system_agent.base_telemetry_collector import BaseTelemetryCollector

def parse_text_output(collected):
    result = [
        {'key': {'local_ip': str(vrf_info['routerId']), 'peer_ip': str(peer_ip)},
         'value': str(
             1 if session_info['peerState'] == 'Established' else 2)}
        for vrf_info in collected['vrfs'].itervalues()
        for peer_ip, session_info in vrf_info['peers'].iteritems()]
    return result

# Inheriting from BaseTelemetryCollector
class IbaBgpCollector(BaseTelemetryCollector):
    # Overriding collect method
    def collect(self):
        # Obtaining the command output using the device instance.
        collected_data = self.device.get_json('show ip bgp summary vrf all')
        # Parse the data as per the schema and structure required and
        # post to framework.
        self.post_data(parse_text_output(collected_data))

# Define collector_plugin class to return the Collector
def collector_plugin(device):
    return IbaBgpCollector

Unit Testing The Collector

The folder aos_developer_sdk/aosstdcollectors/test in the repository contains folders based on the DOS. Add your test to the folder that matches the DOS. For example, a test to a collector for Cumulus is added to aos_developer_sdk/aosstdcollectors/test/cumulus. We recommend that you name the unit test with the prefix test_.

The existing infrastructure implements a Pytest fixture collector_factory that is used to mock the device driver command response. The general flow for test development is as follows.

  1. Use the collector factory to get a collector instance and mocked Apstra framework. The collector factory takes the collector class that you have written as input.
  2. Mock the device response.
  3. Invoke collect method.
  4. Validate the data posted to the mocked Apstra framework.

For example, a test looks like:

import json
from aosstdcollectors.eos.interface_in_out_bytes import InterfaceRxTxCollector


# Test method with prefix 'test_'
def test_sanity(collector_factory):

    # Using collector factory to retrieve the collector instance and mocked AOS
    # framework.
    collector, mock_framework = collector_factory(InterfaceRxTxCollector)

    command_response = {
        'interfaces': {
            'Ethernet1': {
                'interfaceCounters': {
                    'inOctets': 10,
                    'outOctets': 20,
                }
            },
            'Ethernet2': {
                'interfaceCounters': {
                    'inOctets': 30,
                    'outOctets': 40,
                }
            }
        }
    }
    # Set the device get_json method to retrieve the command response.
    collector.device.get_json.side_effect = lambda _: command_response

    # Invoke the collect method
    collector.collect()

    expected_data = [
        {
            'identity': 'Ethernet1',
            'value': json.dumps({
                'rx': 10,
                'tx': 20,
            }),
        },
        {
            'identity': 'Ethernet2',
            'value': json.dumps({
                'rx': 30,
                'tx': 40,
            })
        }
    ]
    # validate the data posted by the collector
    data_posted_by_collector = json.loads(mock_framework.post_data.call_args[0][0])
    assert sorted(expected_data) == sorted(data_posted_by_collector["items"])

To run the test, execute:

root@1df9bf89aeaf:/aos_developer_sdk# make test

This command executes all the tests in the repository.

Packaging A Collector

All the collectors are packaged based on the DOS. To generate all packages, execute make at the root of aos_developer_sdk. The built packages can be found in aos_developer_sdk/dist. The packages can be broadly classified as:

Built-In Collector Packages
These packages have the prefix aosstdcollectors_builtin_. To collect telemetry from a device per the reference design, Apstra requires services as listed in the Device Telemetry section. Built-In collector packages contain collectors for these services. The packages are generated on a per DOS basis.
Custom Collector Packages
These packages have the prefix aosstdcollectors_custom_ in their names. The packages are generated on a per DOS basis. The package named aosstdcollectors_custom_<DOS>-0.1.0-py2-none-any.whl contains the developed collector.
AOS SDK Device Driver Packages
These packages have a prefix apstra_devicedriver_. These packages are generated on a per DOS basis. Packages are generated for DOS that are not available by default in Apstra.

Uploading Packages

If Apstra did not ship with the built-in collector packages and the AOS SDK Device Driver for your Device Operating System (DOS), you must upload them to Apstra.

If you are using an offbox solution and your DOS is not EOS or Cumulus, you must upload the built-in collector package.

Upload the package containing your collector(s) and assign them to a System Agent or System Agent Profile.

Using The Telemetry Collector

To use the collector, set up the telemetry service registry.

Telemetry Service Registry

The registry maps the service to its application schema and the storage schema path. You can manage the telemetry service registry with the REST endpoint /api/telemetry-service-registry. The collector for a service cannot be enabled without adding a registry entry for the particular service. The registry entry for a service cannot be modified while the service is in use.

Note

When executing make, all application schemas are packaged together into a tar file (json_schemas.tgz) in the dist folder. With the AOS CLI, you have the option of importing all the schemas in the .tgz file.
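
For example, a registry entry for the interface_in_out_bytes service could be created with a sketch like the following. The field names in the request body (service_name, application_schema, storage_schema_path) are illustrative assumptions, not confirmed by this document; check the /api/telemetry-service-registry endpoint for the exact payload.

import requests

session = requests.Session()
session.verify = False
session.headers['AuthToken'] = '<token from /api/user/login>'

registry_entry = {
    'service_name': 'interface_in_out_bytes',                     # assumed field name
    'storage_schema_path': 'aos.sdk.telemetry.schemas.generic',   # assumed field name
    'application_schema': {                                       # simplified illustrative schema
        'type': 'object',
        'properties': {
            'identity': {'type': 'string'},
            'value': {'type': 'string'},
        },
    },
}
response = session.post('https://aos-server/api/telemetry-service-registry',
                        json=registry_entry)
response.raise_for_status()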

Starting A Collector

You can start a service using the POST API /api/systems/<system_id>/services with the following three arguments:

Input_data
The data provided as input to the collector. Defaults to None.
Interval
Interval at which to run the service. Defaults to 120 seconds.
Name
Name of the service.
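
For example, the service could be started with a request like the following sketch. The JSON key names mirror the three arguments above but are assumptions; the server address, system ID, service name, and token are placeholders.

import requests

session = requests.Session()
session.verify = False
session.headers['AuthToken'] = '<token from /api/user/login>'

response = session.post('https://aos-server/api/systems/<system_id>/services',
                        json={'name': 'interface_in_out_bytes',   # service name
                              'interval': 120,                    # seconds
                              'input_data': None})                # no input required
response.raise_for_status()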

Note

You can also manage collectors via AOS CLI.

Deleting a Collector
You can delete a service with the DELETE API /api/systems/<system_id>/services/<service_name>.
Getting the Collected Data
You can retrieve collected data with the GET API /api/systems/<system_id>/services/<service_name>/data. Only the data collected in the last iteration is saved. Data does not persist over Apstra restart.
List all Running Collector Services
You can retrieve the list of services enabled on a device with the GET API /api/systems/<system_id>/services.
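
The same pattern applies to these endpoints; a brief sketch (again with placeholder server, system ID, service name, and token):

import requests

session = requests.Session()
session.verify = False
session.headers['AuthToken'] = '<token from /api/user/login>'
base = 'https://aos-server/api/systems/<system_id>/services'

print(session.get(base).json())                                    # list running services
print(session.get(base + '/interface_in_out_bytes/data').json())   # last collected data
session.delete(base + '/interface_in_out_bytes')                   # delete the service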

Debugging Telemetry

Enable trace options to debug telemetry output. On the device agent, set these options in /etc/aos.conf (the usual location), then restart the agent.

[DeviceTelemetryAgent]
log_config = aos.infra.core.entity_util:DEBUG,aos.device.DeviceTelemetryAgent:DEBUG
trace_config = MountFacility/0-8,DHT,AgentHeartbeat,TelemetryProxy

Log files containing trace information for telemetry agents will then be viewable in /var/log/aos/DeviceTelemetryAgent.<pid>.<timestamp>.log. These log files are verbose, but they may point to various rendering and parsing issues in the environment. When you finish troubleshooting, be sure to disable logging.