Guidelines for Aggregating Junos Telemetry Interface Data
One important feature of the Junos Telemetry Interface is that data processing occurs at the collector that receives the streamed data, rather than on the device. Data is not aggregated automatically, but you can aggregate it at the collector for analysis.
Data aggregation is useful in the following scenarios:
- Data for the same metric over fixed spans of time, such as the average number of physical interface ingress errors over a 30-second interval.
- Data for the same metric from different sources, such as multiple line cards. Examples include label-switched path (LSP) statistics and filter counter statistics.
- Data for multiple metrics, such as input and output statistics for aggregated Ethernet interfaces.
The following sections describe how to perform data aggregation for various scenarios. The examples in these sections use the InfluxDB time-series database to accept queries on telemetry data. InfluxDB is an open source database written in Go specifically to handle time-series data.
Aggregating Data Over Fixed Time Spans
Aggregating data for the same metric over fixed spans of time is a common and useful way to detect trends. Metrics can be gauges, which report a single current value, or cumulative counters. You might also want to aggregate data continuously.
Example: Aggregating Data for Gauge Metrics
In this example, data defined in port.proto is written to the InfluxDB database in a measurement called current_buffer_occupancy, with tags that identify the host name, the interface name, and the corresponding queue number. See Table 1 for the specific values used in this example.
Table 1: Telemetry Data Values
Time Stamp (seconds)
Each measurement data point has a timestamp and a recorded value. In this example, the queue_number tag is the numerical identifier of the interface queue.
To aggregate this data over 30-second intervals, use the following InfluxDB query:
select mean(value) from current_buffer_occupancy where time >= $time_start and time <= $time_end and queue_number='0' and interface_name='xe-1/0/0' and host='sjc-a' group by time(30s)
Replace $time_start and $time_end with the start and end of the time range that you want to query.
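The following Python sketch approximates what this query computes for a single series, to make the grouping behavior concrete. It assigns each data point to the 30-second window that contains its timestamp (windows start at multiples of 30 seconds, matching InfluxDB's default round-number boundaries) and averages the values in each window. The function name and sample values are hypothetical, not the values from Table 1. Unlike InfluxDB, which by default reports empty intervals as null, the sketch simply omits windows that contain no data.

```python
from collections import defaultdict


def mean_by_window(points, window_s=30):
    """Average (timestamp_s, value) points per fixed time window.

    Sketch only: windows start at multiples of window_s, and windows
    without data are omitted (InfluxDB would return null for them).
    """
    buckets = defaultdict(list)
    for ts, value in points:
        start = ts - ts % window_s  # start of the window that contains ts
        buckets[start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Hypothetical current_buffer_occupancy samples for one queue on one interface.
samples = [(0, 100), (10, 200), (20, 300), (30, 50), (45, 150)]
print(mean_by_window(samples))  # {0: 200.0, 30: 100.0}
```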
Example: Aggregating Data for Cumulative Statistics
Some Junos Telemetry Interface sensors report cumulative counter values, such as the number of ingress packets, defined as JuniperNetworksSensors.jnpr_interface_ext.interface_stats.ingress_stats.packets.
It is common to derive traffic rates from packet or byte counters. Unlike with gauge metrics, the initial data point in a cumulative counter series is used only to set the baseline, because a difference or rate requires a previous value to subtract from.
Use the following guidelines to create a database query for cumulative statistics:
1. Calculate the cumulative value for a specific time interval. You can either average several data points recorded during the interval or interpolate a value. All data points should belong to the same series. If a counter reset occurred between two data points reported at different times, do not combine those data points.
2. Determine the value for the previous time interval. If the counter has been reset since that value was reported, treat the previous value as unavailable.
3. If the previous value is available, calculate the difference between the two values and derive the traffic rate by dividing that difference by the length of the interval.
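The three guidelines can be sketched in Python as follows. The sketch assumes that each interval has already been reduced to one averaged counter value and is paired with the counter's initialization time, as an init_time tag would provide. A change in initialization time is treated as a counter reset, so the previous value is unavailable. The function name and data format are hypothetical.

```python
def interval_rates(intervals, interval_s=30):
    """Apply the guidelines to consecutive intervals of one counter.

    intervals: list of (init_time, avg_value), one entry per interval
    (hypothetical format). Returns one entry per interval after the first:
    (delta, rate_per_second), or None when the previous value is unavailable.
    """
    results = []
    # The first interval has no predecessor, so it only sets the baseline.
    for (prev_init, prev_val), (init, val) in zip(intervals, intervals[1:]):
        if init != prev_init:
            # Counter was reinitialized: do not combine values across the reset.
            results.append(None)
            continue
        delta = val - prev_val
        results.append((delta, delta / interval_s))
    return results


# Hypothetical data: the counter is reset between the 2nd and 3rd intervals.
intervals = [(1000, 600.0), (1000, 1200.0), (2000, 30.0), (2000, 330.0)]
print(interval_rates(intervals))  # [(600.0, 20.0), None, (300.0, 10.0)]
```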
These guidelines are summarized in the following InfluxDB query. The query assumes that data is stored in the measurement ingress_packets and uses the same host and interface_name tags as the gauge metric example, plus an init_time tag that records the counter initialization time. The query averages values over 30-second intervals and calculates the rate only between data points that share the same counter initialization time, so a counter reset does not produce a spurious rate. By default, non_negative_derivative() reports the rate per second.
select non_negative_derivative(mean(value)) from ingress_packets where time >= $time_start and time <= $time_end and interface_name='xe-1/0/0' and host='sjc-a' group by time(30s), init_time
Use the following query to calculate the number of packets received over an interval of time, without deriving the rate.
select difference(mean(value)) from ingress_packets where time >= $time_start and time <= $time_end and interface_name='xe-1/0/0' and host='sjc-a' group by time(30s), init_time
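For one series (that is, one init_time value), the two queries differ only in the last step. The Python sketch below illustrates this, assuming that the 30-second averages have already been computed and that every interval contains data. difference() subtracts consecutive averages and can return negative values, whereas non_negative_derivative() divides each change by the elapsed time, using InfluxDB's default unit of one second, and drops negative results. The function names mirror the InfluxQL functions for readability only; they are not an InfluxDB API.

```python
def difference(means):
    """Change between consecutive (time_s, mean) points, stamped with the later time."""
    return [(t2, v2 - v1) for (_, v1), (t2, v2) in zip(means, means[1:])]


def non_negative_derivative(means, unit_s=1):
    """Rate of change per unit_s seconds; negative rates (such as after a
    counter reset) are dropped rather than reported."""
    out = []
    for (t1, v1), (t2, v2) in zip(means, means[1:]):
        rate = (v2 - v1) / ((t2 - t1) / unit_s)
        if rate >= 0:
            out.append((t2, rate))
    return out


# Hypothetical 30-second averages of an ingress packet counter.
means = [(0, 1000.0), (30, 1600.0), (60, 1900.0), (90, 100.0)]
print(difference(means))               # [(30, 600.0), (60, 300.0), (90, -1800.0)]
print(non_negative_derivative(means))  # [(30, 20.0), (60, 10.0)]
```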
In some cases, the query returns more than one aggregated data point for a particular time interval. For example, suppose four data points are available for a time interval: two have init_time t0, and the other two have init_time t1. Because the query groups by init_time, it returns a separate aggregated value for each init_time in that interval. You can instead group by the last change timestamp, last_change, to calculate the difference and derive the rate between data points that have the same last change timestamp.
select difference(mean(value)) from ingress_packets where time >= $time_start and time <= $time_end and interface_name='xe-1/0/0' and host='sjc-a' group by time(30s), last_change
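The effect of grouping by a tag such as init_time or last_change can be sketched in Python. Each tag value becomes its own series, so one 30-second interval can yield several aggregated points, and differences are computed only within a series. The tag values t0 and t1, the function names, and the sample counts are hypothetical. Without the grouping, the values from before and after the reset would be averaged together into a single meaningless value.

```python
from collections import defaultdict


def mean_by_window_and_tag(points, window_s=30):
    """Average (timestamp_s, tag, value) points per tag value and window.

    Returns {tag: [(window_start, mean), ...]}, one series per tag value,
    similar in spirit to GROUP BY time(30s), <tag>.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for ts, tag, value in points:
        groups[tag][ts - ts % window_s].append(value)
    return {
        tag: [(start, sum(vals) / len(vals)) for start, vals in sorted(windows.items())]
        for tag, windows in groups.items()
    }


def difference(series):
    """Change between consecutive points of a single series."""
    return [(t2, v2 - v1) for (_, v1), (t2, v2) in zip(series, series[1:])]


# Hypothetical counter samples: the counter restarts (t0 -> t1) at 15 s,
# so the window starting at 0 s holds two points for each init_time.
points = [
    (0, "t0", 100), (10, "t0", 200),
    (15, "t1", 5), (25, "t1", 15),
    (35, "t1", 45), (50, "t1", 65),
]
series = mean_by_window_and_tag(points)
print(series)  # {'t0': [(0, 150.0)], 't1': [(0, 10.0), (30, 55.0)]}
for tag, s in series.items():
    print(tag, difference(s))
# t0 []
# t1 [(30, 45.0)]
```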
These queries can all be run as continuous queries and can periodically populate new time-series measurements.
Aggregating Data From Multiple Sources
Certain metrics are reported by multiple line cards or Packet Forwarding Engines. It is useful to aggregate data from these different sources in the following scenarios:
- Packet and byte counts for label-switched paths (LSPs) are reported separately by each line card. However, path computation element (PCE) controllers require a view of LSP statistics for the entire device.
- For Juniper Networks devices that support virtual output queues, the tail drop and random early detection (RED) drop statistics for each queue are reported separately by each line card for every physical interface. It is useful to aggregate these statistics across all line cards for each interface.
- Filter counters for a firewall filter attached to a forwarding table or to an aggregated Ethernet interface are reported separately by each line card. It is useful to aggregate these counters across all line cards.
To aggregate data from multiple sources, complete the following steps:
1. Aggregate the data over a specific period of time for each source, for example, each line card.
2. Aggregate the per-source data that you derived in step 1.
For data stored in an InfluxDB database, you can complete step 1 by running a continuous query that populates a new measurement. We strongly recommend that you group the data points by source. For example, for LSP statistics, component_id in the GPB message identifies the line card that sent the data, so group the data points so that each line card's data forms a separate series.
Example: Aggregating Data from Multiple Sources
In this example, you run two queries to derive the LSP packet rate for data from all line cards.
First, you run the following continuous query on the measurement lsp_packet_count, grouping by the component_id and counter_name tags. Each unique component_id value corresponds to a different line card. This query populates a new measurement:
select non_negative_derivative(mean(value)) as value into lsp_packet_rate from lsp_packet_count group by time(30s), component_id, counter_name, host
The LSP statistics sensor does not report counter initialization time, so this query does not group by init_time.
Use the new measurement derived from this continuous query, lsp_packet_rate, to run the following query, which aggregates the packet rates from all line cards for an LSP named lsp-sjc-den-1.
select sum(value) from lsp_packet_rate where time >= $time_start and time <= $time_end and counter_name='lsp-sjc-den-1' and host='sjc-a' group by time(30s)
Because this query does not group data by the component_id tag, it sums the LSP packet rates from all components, that is, from all line cards.
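The two-step procedure (per-line-card rates, then a sum across line cards) can be sketched in Python. The sketch assumes raw cumulative packet counts for a single LSP, each tagged with the component_id of the reporting line card. It computes each line card's rate separately, as the continuous query does, and then sums the rates for each interval, as the second query does. The function names, component IDs, and counts are hypothetical.

```python
from collections import defaultdict


def per_card_rates(samples, window_s=30):
    """Step 1: per-line-card packet rates for one LSP.

    samples: (timestamp_s, component_id, packet_count) tuples.
    Returns {component_id: {window_start: packets_per_second}}.
    """
    windows = defaultdict(lambda: defaultdict(list))
    for ts, comp, count in samples:
        windows[comp][ts - ts % window_s].append(count)
    rates = {}
    for comp, buckets in windows.items():
        means = sorted((start, sum(v) / len(v)) for start, v in buckets.items())
        comp_rates = {}
        for (t1, v1), (t2, v2) in zip(means, means[1:]):
            rate = (v2 - v1) / (t2 - t1)
            if rate >= 0:  # like non_negative_derivative: skip decreases
                comp_rates[t2] = rate
        rates[comp] = comp_rates
    return rates


def device_rates(rates):
    """Step 2: sum the per-line-card rates for each interval."""
    total = defaultdict(float)
    for comp_rates in rates.values():
        for start, rate in comp_rates.items():
            total[start] += rate
    return dict(sorted(total.items()))


# Hypothetical cumulative packet counts reported by two line cards (0 and 1).
samples = [
    (0, 0, 0), (30, 0, 300), (60, 0, 900),
    (0, 1, 1000), (30, 1, 1600), (60, 1, 1900),
]
rates = per_card_rates(samples)
print(rates)                # {0: {30: 10.0, 60: 20.0}, 1: {30: 20.0, 60: 10.0}}
print(device_rates(rates))  # {30: 30.0, 60: 30.0}
```

Computing rates per line card before summing matters because each line card keeps an independent counter: subtracting one card's count from another card's count would produce meaningless deltas.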
Aggregating Data for Multiple Metrics
It can be useful to aggregate several related metrics. For example, for aggregated Ethernet interfaces, you typically want to track packet and byte rates for each member link as well as utilization of the aggregated link.
Example: Aggregating Multiple Metric Values
In this example, you run the following two queries:
- A continuous query that derives ingress packet counts for each member link in an aggregated Ethernet interface
- A query that aggregates the packet count data for all member links that belong to the same aggregated Ethernet interface
The following continuous query derives a new measurement, ingress_packets_difference, from the ingress_packets measurement for each member link in an aggregated Ethernet interface. The identifies each member interface. You also use the parent_ae_name tag to identify membership in a specific aggregated Ethernet interface.
Grouping each member link by the parent_ae_name tag ensures that data is attributed only to the aggregated Ethernet interface that the link belonged to when the data was reported. For example, an interface might change its membership during the reporting interval. Because each parent_ae_name value defines a separate series, data that the link reported under its previous aggregated Ethernet interface is not transferred to the new aggregated Ethernet interface of which it is now a member.
select difference(mean(value)) as value into ingress_packets_difference from ingress_packets group by time(30s), component_id, interface_name, host, parent_ae_name
The following query aggregates the ingress packet data for the aggregated Ethernet interface, that is, for all of its member links.
select sum(value) from ingress_packets_difference where time >= $time_start and time <= $time_end and parent_ae_name='ae0' and host='sjc-a' group by time(30s)
This query aggregates data for the aggregated Ethernet interface based only on the parent_ae_name tag. It does not verify the actual member links.
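The following Python sketch illustrates why grouping by parent_ae_name keeps a membership change from distorting the totals. It treats each (interface_name, parent_ae_name) pair as its own series, computes differences of 30-second averages within each series, and sums those differences per aggregated Ethernet interface and interval. Like the query, it trusts the parent_ae_name tag and does not verify actual membership. The function name, the interface names xe-0/0/1, xe-0/0/2, and ae1, and all counts are hypothetical.

```python
from collections import defaultdict


def ae_ingress_totals(samples, window_s=30):
    """Per-interval ingress packet totals for each aggregated Ethernet interface.

    samples: (timestamp_s, interface_name, parent_ae_name, packet_count).
    Each (interface_name, parent_ae_name) pair is a separate series, so a
    link that moves to another parent starts a new series.
    """
    series = defaultdict(lambda: defaultdict(list))
    for ts, ifname, parent, count in samples:
        series[(ifname, parent)][ts - ts % window_s].append(count)

    totals = defaultdict(lambda: defaultdict(float))
    for (_ifname, parent), buckets in series.items():
        means = sorted((start, sum(v) / len(v)) for start, v in buckets.items())
        # Differences are taken only within a series (first query), then
        # summed per parent interface and interval (second query).
        for (_, v1), (t2, v2) in zip(means, means[1:]):
            totals[parent][t2] += v2 - v1
    return {parent: dict(sorted(t.items())) for parent, t in totals.items()}


# Hypothetical data: xe-0/0/1 moves from ae0 to ae1 after the 30 s interval.
samples = [
    (0, "xe-0/0/1", "ae0", 100), (30, "xe-0/0/1", "ae0", 400),
    (60, "xe-0/0/1", "ae1", 700), (90, "xe-0/0/1", "ae1", 800),
    (0, "xe-0/0/2", "ae0", 50), (30, "xe-0/0/2", "ae0", 150),
    (60, "xe-0/0/2", "ae0", 350),
]
print(ae_ingress_totals(samples))
# {'ae0': {30: 400.0, 60: 200.0}, 'ae1': {90: 100.0}}
```

The jump in the xe-0/0/1 counter between 30 s and 60 s spans the membership change, so it is attributed to neither ae0 nor ae1.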