Guidelines for Aggregating Junos Telemetry Data

One important feature of the Junos Telemetry Interface is that data processing occurs at the collector that receives the streamed data, rather than on the device. Data is not aggregated automatically, but you can aggregate it for analysis.

Data aggregation is useful in the following scenarios:

Data for the same metric over fixed spans of time, such as the average number of physical interface ingress errors over a 30-second interval.
Data from different sources (such as multiple line cards) for the same metric, such as label-switched path (LSP) statistics or filter counter statistics.
Data for multiple metrics, such as input and output statistics for aggregated Ethernet interfaces.

The following sections describe how to perform data aggregation for various scenarios. The examples in these sections use the InfluxDB time-series database to accept queries on telemetry data. InfluxDB is an open source database written in the Go programming language that is designed specifically to handle time-series data.

Aggregating Data Over Fixed Time Span

Aggregating data for the same metric over a fixed time span is a common and useful way to detect trends. Metrics can include gauges, that is, single values, or cumulative counters. You might also want to aggregate data continuously.

Example: Aggregating Data for Gauge Metrics
Example: Aggregating Data for Cumulative Statistics

Example: Aggregating Data for Gauge Metrics

In this example, data for JuniperNetworksSensors.jnpr_interface_ext.interface_stats.egress_queue_info.current_buffer_occupancy from port.proto is written to the InfluxDB database as a measurement called current_buffer_occupancy, with tags that identify the host name, the interface name, and the corresponding queue number. See Table 1 for the specific values used in this example.

Table 1: Telemetry Data Values
Time Stamp (seconds)	Value	Tags
1458704133	1547	queue_number=0,interface_name='xe-1/0/0',host='sjc-a'
1458704143	3221	queue_number=0,interface_name='xe-1/0/0',host='sjc-a'
1458704155	4860	queue_number=0,interface_name='xe-1/0/0',host='sjc-a'
1458704166	6550	queue_number=0,interface_name='xe-1/0/0',host='sjc-a'

Each measurement data point has a timestamp and recorded value. In this example, the tag queue_number is the numerical identifier of the interface queue.

To aggregate this data over 30-second intervals, use the following InfluxDB query:

For $time_start and $time_end, specify the actual range of time.
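
As a rough illustration of what a 30-second mean aggregation does to the values in Table 1, the following Python sketch groups the data points into fixed windows and averages each window. It is not the InfluxDB query itself. The function name mean_per_window is hypothetical, and the sketch assumes that windows are aligned to multiples of 30 seconds since the epoch.

```python
from collections import defaultdict

# Data points from Table 1: (timestamp in seconds, current_buffer_occupancy value).
# All four points share the same tags (queue 0, xe-1/0/0, sjc-a), so they form one series.
POINTS = [
    (1458704133, 1547),
    (1458704143, 3221),
    (1458704155, 4860),
    (1458704166, 6550),
]


def mean_per_window(points, window=30):
    """Return {window_start: mean value} for fixed, epoch-aligned windows.

    Assumption: a window starts at a multiple of `window` seconds since the epoch.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % window].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


if __name__ == "__main__":
    for start, mean in mean_per_window(POINTS).items():
        print(start, round(mean, 1))
```

Under that alignment assumption, the first three data points fall into the window that starts at 1458704130 and the fourth falls into the window that starts at 1458704160, so the sketch returns two aggregated values for this series.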

Example: Aggregating Data for Cumulative Statistics

Some Junos telemetry interface sensors report cumulative counter values, such as the number of ingress packets, defined as JuniperNetworksSensors.jnpr_interface_ext.interface_stats.ingress_stats.packets.

It is common to derive traffic rates from packet or byte counters. Unlike with gauge metrics, the initial data point in the series for cumulative counters is used only to set the baseline.

Use the following guidelines to create a database query for cumulative statistics:

Calculate the cumulative value for a specific time interval. You can either average the data points recorded during the interval or interpolate a value. All data points must belong to the same series. If a counter reset occurred between two data points, do not use both data points in the same calculation.
Determine the appropriate value for the previous time interval. If the counter has been reset since the last update, treat that value as unavailable.
If the value for the previous interval is available, calculate the difference between the two values and derive the traffic rate from it.
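
The guidelines above can be sketched in Python as follows. This is a minimal illustration, not the InfluxDB query: the function name interval_rates and the sample values are hypothetical, and the sketch assumes that each data point carries the counter initialization time, so that a counter reset shows up as a new initialization time and therefore a new series.

```python
from collections import defaultdict


def interval_rates(points, window=30):
    """Derive per-interval rates from cumulative counter samples.

    points: iterable of (timestamp_s, init_time, counter_value).
    Returns a list of (window_start, init_time, rate), where rate is None
    when no value with the same init_time exists for the previous window.
    """
    # Guideline 1: average the counter per window, keeping each init_time separate.
    sums = defaultdict(lambda: [0, 0])
    for ts, init_time, value in points:
        entry = sums[(ts - ts % window, init_time)]
        entry[0] += value
        entry[1] += 1
    averages = {key: total / count for key, (total, count) in sums.items()}

    results = []
    for (start, init_time), average in sorted(averages.items()):
        # Guideline 2: the previous value is usable only if it has the same init_time.
        previous = averages.get((start - window, init_time))
        if previous is None:
            results.append((start, init_time, None))  # baseline or counter reset
        else:
            # Guideline 3: difference between the intervals, divided by the interval length.
            results.append((start, init_time, (average - previous) / window))
    return results


# Hypothetical samples: the counter is reinitialized (init_time 100 -> 200) before t=60.
SAMPLES = [
    (0, 100, 1000), (10, 100, 1300), (20, 100, 1600),
    (30, 100, 1900), (40, 100, 2200), (50, 100, 2500),
    (60, 200, 50), (70, 200, 350), (80, 200, 650),
]

if __name__ == "__main__":
    for row in interval_rates(SAMPLES):
        print(row)
```

The first interval only sets the baseline, and the interval that follows the reset has no usable predecessor, so both are reported without a rate rather than with a misleading negative value.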

These guidelines are summarized in the following InfluxDB query. This query assumes that data is stored in the measurement ingress_packets. The query uses the same tags as the gauge metric example, as well as the tag for counter initialization time, init_time. The query uses average values over a 30-second time interval and calculates the rate only between values that have the same counter initialization time.

Use the following query to calculate the number of packets received over an interval of time, without deriving the rate.

In some cases, more than one aggregated data point is returned by the query for a particular time interval. For example, four data points are available for a time interval. Two data points have init_time t0, and the other two have init_time t1. You can run a query that uses the last change timestamp tag, last_change, instead of init_time, to calculate the difference and to derive the rate between the two data points with the same last change timestamp.

Tip:

These queries can all be run as continuous queries and can periodically populate new time-series measurements.

Aggregating Data From Multiple Sources

Certain metrics are reported from multiple line cards or packet forwarding engines. It is useful to aggregate data derived from different sources in the following scenarios:

Packet and byte counts for label-switched paths (LSPs) are reported separately by each line card. However, a view of LSP paths for the entire device is required for path computation element controllers.
For Juniper Networks devices that support virtual output queues, the tail drop or random early detection drop statistics for each queue are reported separately by each line card for every physical interface. It is useful to be able to aggregate the statistics for all the line cards for an interface.
Filter counters for a firewall filter attached to a forwarding table or to an aggregated Ethernet interface are reported separately by each line card. It is useful to aggregate the statistics for all the line cards.

To aggregate data from multiple sources, perform the following:

1. Aggregate data for a specific period of time for each source, for example, each line card.
2. Aggregate the data you derived for each source in step 1.

For data stored in an InfluxDB database, you can complete step 1 in the procedure by running a continuous query and populating a new measurement. We strongly recommend that you group the data points according to each source. For example, for LSP statistics, the component_id in the gpb message identifies the line card sending the data. Group the data points based on each unique component_id.
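
The two-step procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the continuous query: the function names and the sample counters are hypothetical, and step 1 is simplified to the difference between the first and last sample that each line card reports in the period.

```python
def per_source_rate(samples):
    """Step 1: packet rate for one source (one component_id).

    samples: time-ordered list of (timestamp_s, counter_value).
    Returns None when fewer than two samples are available.
    """
    if len(samples) < 2:
        return None
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 == t0:
        return None
    return (v1 - v0) / (t1 - t0)


def device_rate(samples_by_component):
    """Step 2: sum the per-source rates to get a device-wide rate."""
    rates = (per_source_rate(s) for s in samples_by_component.values())
    return sum(r for r in rates if r is not None)


# Hypothetical LSP packet counters reported by two line cards for the same LSP.
SAMPLES_BY_COMPONENT = {
    0: [(0, 1000), (30, 4000)],  # component_id 0
    1: [(0, 500), (30, 2000)],   # component_id 1
}

if __name__ == "__main__":
    print(device_rate(SAMPLES_BY_COMPONENT))
```

Keeping the sources separate in step 1 matters because each line card maintains its own cumulative counter. Differencing values that come from different line cards would produce deltas that do not correspond to any real traffic.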

Example: Aggregating Data from Multiple Sources

In this example, you run two queries to derive the LSP packet rate for data from all line cards.

First, you run the following continuous query on the measurement named lsp_packet_count for each component_id tag and the counter_name tag. Each unique component_id tag corresponds to a different line card. This query populates a new measurement, lsp_packet_rate.

Note:

The LSP statistics sensor does not report counter initialization time.

Use the new measurement derived from this continuous query, lsp_packet_rate, to run the following query, which aggregates packet rates from all line cards for an LSP named lsp-sjc-den-1.

Note:

Because this query does not group data according to the component_id tag, or line card, the LSP packet rates from all components, or line cards, are returned.

Aggregating Data for Multiple Metrics

It can be useful to aggregate metrics for multiple values. For example, for aggregated Ethernet interfaces, you would typically want to track packet and byte rates for each interface member as well as interface utilization for the aggregated link.

Example: Aggregating Multiple Metric Values

In this example, you run the following two queries:

Continuous query to derive ingress packet counts for each member link in an aggregated Ethernet interface
Query to aggregate packet count data for all the member links that belong to the same aggregated Ethernet interface

The following continuous query derives a measurement, ingress_packets, for each member link in an aggregated Ethernet interface. The interface_name tag identifies each member interface. You also use the parent-ae-name tag to identify membership in a specific aggregated Ethernet interface. Grouping each member link with the parent-ae-name tag ensures that data is collected only for current member links. For example, an interface might change its membership during the reporting interval. Grouping member interfaces with the specific aggregated Ethernet interface means that data for the member link will not be transferred to the new aggregated Ethernet interface of which it is now a member.

The following query aggregates ingress packet data across all member links of the aggregated Ethernet interface.

Note:

This query aggregates data for aggregated Ethernet interface ae0. The parent-ae-name tag does not verify the actual member links.
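
The effect of grouping member links by the parent-ae-name tag can be sketched in Python as follows. This is a minimal illustration rather than the InfluxDB queries: the function name packets_per_bundle, the interface names, and the packet counts are hypothetical, and the input is assumed to be per-interval ingress packet counts that have already been derived for each member link.

```python
from collections import defaultdict


def packets_per_bundle(member_counts):
    """Sum per-member ingress packet counts for each aggregated Ethernet interface.

    member_counts: iterable of
        (window_start, interface_name, parent_ae_name, packets).
    Returns {(window_start, parent_ae_name): total_packets}.
    """
    totals = defaultdict(int)
    for window_start, _interface_name, parent_ae_name, packets in member_counts:
        totals[(window_start, parent_ae_name)] += packets
    return dict(totals)


# Hypothetical data: xe-1/0/1 moves from ae0 to ae1 between the two intervals.
MEMBER_COUNTS = [
    (0, "xe-1/0/0", "ae0", 1200),
    (0, "xe-1/0/1", "ae0", 800),
    (30, "xe-1/0/0", "ae0", 1100),
    (30, "xe-1/0/1", "ae1", 900),
]

if __name__ == "__main__":
    for key, total in sorted(packets_per_bundle(MEMBER_COUNTS).items()):
        print(key, total)
```

Because each count is keyed by the bundle the link belonged to when the data was reported, the packets counted for xe-1/0/1 while it was a member of ae0 stay with ae0 and are not carried over to ae1.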
