    Guidelines for Aggregating Junos Telemetry Interface Data

    One important feature of the Junos Telemetry Interface is that data processing occurs at the collector that receives the streamed data, rather than on the device. Data is not automatically aggregated, but it can be aggregated for analysis.

    Data aggregation is useful in the following scenarios:

    • Data for the same metric over fixed spans of time, such as the average number of physical interface ingress errors over a 30-second interval.
    • Data from different sources (such as multiple line cards) for the same metric, such as label-switched path (LSP) statistics or filter counter statistics.
    • Data from multiple sources, such as input and output statistics for aggregated Ethernet interfaces.

    The following sections describe how to perform data aggregation for various scenarios. The examples in these sections use the InfluxDB time-series database to accept queries on telemetry data. InfluxDB is an open-source database written in Go specifically to handle time-series data.

    Aggregating Data Over Fixed Time Spans

    Aggregating data for the same metric over fixed spans of time is a common and useful way to detect trends. Metrics can include gauges, that is, single values, or cumulative counters. You might also want to aggregate data continuously.

    Example: Aggregating Data for Gauge Metrics

    In this example, data for JuniperNetworksSensors.jnpr_interface_ext.interface_stats.egress_queue_info.current_buffer_occupancy from port.proto is written to the InfluxDB database as a measurement called current_buffer_occupancy, with tags that identify the host name, the interface name, and the corresponding queue number. See Table 1 for the specific values used in this example.

    Table 1: Telemetry Data Values

    Time Stamp (seconds)

    Each measurement data point has a timestamp and recorded value. In this example, the tag queue_number is the numerical identifier of the interface queue.

    To aggregate this data over 30-second intervals, use the following InfluxDB query:

    select mean(value) from current_buffer_occupancy
           where time >= $time_start and time <= $time_end and
                 queue_number='0' and interface_name='xe-1/0/0' and host='sjc-a'
           group by time(30s)

    For $time_start and $time_end, specify the actual range of time.
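
    To make the grouping concrete, the following Python sketch mimics what a mean over 30-second buckets does to a series of gauge samples. It is an illustration only, not part of InfluxDB: the function name mean_per_bucket and the sample values are hypothetical, and buckets are assumed to start at multiples of the interval.

```python
from collections import defaultdict

def mean_per_bucket(points, bucket_seconds=30):
    """Average gauge samples that fall into the same fixed time bucket.

    points: iterable of (timestamp_seconds, value) pairs.
    Returns a dict mapping each bucket start time to the mean value.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Assumption: buckets are aligned to multiples of bucket_seconds.
        start = ts - (ts % bucket_seconds)
        buckets[start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}

# Hypothetical buffer-occupancy samples: (timestamp in seconds, value).
samples = [(0, 100), (10, 200), (20, 300), (30, 400), (45, 600)]
print(mean_per_bucket(samples))  # {0: 200.0, 30: 500.0}
```

    Each gauge sample contributes directly to the average of its bucket, so no baseline value is needed.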

    Example: Aggregating Data for Cumulative Statistics

    Some Junos Telemetry Interface sensors report cumulative counter values, such as the number of ingress packets, defined as JuniperNetworksSensors.jnpr_interface_ext.interface_stats.ingress_stats.packets.

    It is common to derive traffic rates from packet or byte counters. Unlike with gauge metrics, the initial data point in the series for cumulative counters is used only to set the baseline.

    Use the following guidelines to create a database query for cumulative statistics:

    • Calculate the cumulative value for a specific time interval. You can calculate either an average among several data points recorded during the time interval, or you can interpolate a value. All data points should belong to the same series. If a counter reset has occurred between the two data points reported at different times, do not use both data points.
    • Determine the appropriate value for the previous time interval. If a counter has been reset since the last update, declare that value as unavailable.
    • If the previous interval is available, calculate the difference between the data points and derive the traffic rate.
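
    The guidelines above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name interval_rates and the sample values are hypothetical, each input row is assumed to be an already-averaged counter value for one time interval, and a change in the counter initialization time is taken as the sign of a counter reset.

```python
def interval_rates(buckets):
    """Derive per-second rates from averaged cumulative counter values.

    buckets: list of (interval_start_seconds, init_time, mean_counter_value),
             ordered by time.
    Returns a list of (interval_start_seconds, rate), where rate is None
    when the previous interval cannot be used.
    """
    rates = []
    prev = None
    for start, init_time, value in buckets:
        if prev is None or prev[1] != init_time:
            # First point of a series, or counter reset: baseline only.
            rates.append((start, None))
        else:
            delta = value - prev[2]
            elapsed = start - prev[0]
            # Guard against a decrease, which is not expected within one series.
            rates.append((start, delta / elapsed if delta >= 0 else None))
        prev = (start, init_time, value)
    return rates

# Hypothetical data: the counter is reinitialized before the interval at 90 s.
data = [(0, 1000, 3000.0), (30, 1000, 6000.0), (60, 1000, 9600.0),
        (90, 2000, 150.0), (120, 2000, 750.0)]
print(interval_rates(data))
# [(0, None), (30, 100.0), (60, 120.0), (90, None), (120, 20.0)]
```

    The first point after a reset only sets a new baseline, so no rate is reported for that interval.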

    These guidelines are summarized in the following InfluxDB query. This query assumes that data is stored in the measurement ingress_packets. The query uses the same tags as the gauge metric example as well as the tag for counter initialization time, init_time. The query uses average values over a 30-second time interval. It calculates the rate for the metrics that have the same counter initialization time.

    select non_negative_derivative(mean(value)) from ingress_packets
           where time >= $time_start and time <= $time_end and
                 interface_name='xe-1/0/0' and host='sjc-a'
           group by time(30s), init_time

    Use the following query to calculate the number of packets received over an interval of time, without deriving the rate.

    select difference(mean(value)) from ingress_packets
           where time >= $time_start and time <= $time_end and
                 interface_name='xe-1/0/0' and host='sjc-a'
           group by time(30s), init_time

    In some cases, more than one aggregated data point is returned by the query for a particular time interval. For example, four data points are available for a time interval. Two data points have init_time t0, and the other two have init_time t1. You can run a query that uses the last change timestamp tag, last_change, instead of init_time, to calculate the difference and to derive the rate between the two data points with the same last change timestamp.

    select difference(mean(value)) from ingress_packets
           where time >= $time_start and time <= $time_end and
                 interface_name='xe-1/0/0' and host='sjc-a'
           group by time(30s), last_change

    Tip: These queries can all be run as continuous queries and can periodically populate new time-series measurements.

    Aggregating Data From Multiple Sources

    Certain metrics are reported from multiple line cards or packet forwarding engines. It is useful to aggregate data derived from different sources in the following scenarios:

    • Packet and byte counts for label-switched paths (LSPs) are reported separately by each line card. However, path computation element controllers require a view of LSPs for the entire device.
    • For Juniper Networks devices that support virtual output queues, the tail drop or random early detection drop statistics for each queue are reported separately by each line card for every physical interface. It is useful to be able to aggregate the statistics for all the line cards for an interface.
    • Filter counters for a firewall filter attached to a forwarding table or to an aggregated Ethernet interface are reported separately by each line card. It is useful to aggregate the statistics for all the line cards.

    To aggregate data from multiple sources, perform the following:

    1. Aggregate data for a specific period of time for each source, for example, each line card.
    2. Aggregate the data you derive for each source in step 1.

    For data stored in an InfluxDB database, you can complete step 1 in the procedure by running a continuous query and populating a new measurement. We strongly recommend that you group the data points according to each source. For example, for LSP statistics, the component_id in the gpb message identifies the line card sending the data. Group the data points based on each unique component_id.

    Example: Aggregating Data from Multiple Sources

    In this example, you run two queries to derive the LSP packet rate for data from all line cards.

    First, you run the following continuous query on the measurement named lsp_packet_count for each component_id tag and the counter_name tag. Each unique component_id tag corresponds to a different line card. This query populates a new measurement, lsp_packet_rate.

    select non_negative_derivative(mean(value)) as value
           into lsp_packet_rate
           from lsp_packet_count
           group by time(30s), component_id, counter_name, host

    Note: The LSP statistics sensor does not report counter initialization time.

    Use the new measurement derived from this continuous query, lsp_packet_rate, to run the following query, which aggregates data from all line cards for packet rates for an LSP named lsp-sjc-den-1.

    select sum(value) from lsp_packet_rate
           where counter_name='lsp-sjc-den-1' and host='sjc-a'

    Note: Because this query does not group data according to the component_id tag, or line card, the LSP packet rates from all components, or line cards, are included in the sum.
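
    The second step of this procedure can be sketched in Python as follows. The sketch assumes that per-line-card rates have already been derived, as in the first step. The function name sum_across_sources, the component names, and the values are hypothetical. Unlike the query above, the sketch sums per time interval, a choice made here only to show how values from different line cards combine.

```python
from collections import defaultdict

def sum_across_sources(rates):
    """Add up per-source rates that belong to the same time interval.

    rates: iterable of (interval_start_seconds, component_id, rate).
    Returns a dict mapping each interval start to the device-wide total.
    """
    totals = defaultdict(float)
    for start, _component_id, rate in rates:
        # The component_id is deliberately ignored so all line cards are combined.
        totals[start] += rate
    return dict(sorted(totals.items()))

# Hypothetical per-line-card packet rates for one LSP.
lsp_rates = [(0, "fpc0", 100.0), (0, "fpc1", 50.0),
             (30, "fpc0", 120.0), (30, "fpc1", 30.0)]
print(sum_across_sources(lsp_rates))  # {0: 150.0, 30: 150.0}
```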

    Aggregating Data for Multiple Metrics

    It can be useful to aggregate metrics for multiple values. For example, for aggregated Ethernet interfaces, you would typically want to track packet and byte rates for each interface member as well as interface utilization for the aggregated link.

    Example: Aggregating Multiple Metric Values

    In this example, you run the following two queries:

    • Continuous query to derive ingress packet counts for each member link in an aggregated Ethernet interface
    • Query to aggregate packet count data for all the member links that belong to the same aggregated Ethernet interface

    The following continuous query derives a new measurement, ingress_packets_difference, from the measurement ingress_packets for each member link in an aggregated Ethernet interface. The interface_name tag identifies each member interface. You also use the parent_ae_name tag to identify membership in a specific aggregated Ethernet interface. Grouping each member link with the parent_ae_name tag ensures that data is collected only for current member links. For example, an interface might change its membership during the reporting interval. Grouping member interfaces with the specific aggregated Ethernet interface means that data for the member link will not be transferred to the new aggregated Ethernet interface of which it is now a member.

    select difference(mean(value)) as value
           into ingress_packets_difference
           from ingress_packets
           group by time(30s), component_id, interface_name, host, parent_ae_name

    The following query aggregates data for the ingress packets for the aggregated Ethernet interface, that is, for all member links.

    select sum(value) from ingress_packets_difference
           where parent_ae_name='ae0' and host='sjc-a'

    Note: This query aggregates data for aggregated Ethernet interface ae0. The parent_ae_name tag does not verify the actual member links.
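
    The effect of the parent_ae_name tag can be sketched in Python as follows. This is an illustration under assumptions, not a description of how InfluxDB evaluates the query: the function name packets_for_bundle and the data are hypothetical, and values are summed per time interval for readability. In the sample data, the member link xe-1/0/1 moves from ae0 to ae1 between the two intervals.

```python
def packets_for_bundle(rows, parent_ae_name):
    """Sum per-member packet differences for one aggregated Ethernet interface.

    rows: iterable of dicts with keys "time", "interface_name",
          "parent_ae_name", and "value".
    Returns a dict mapping each interval start to the bundle total.
    """
    totals = {}
    for row in rows:
        # Only rows tagged with the requested bundle are counted.
        if row["parent_ae_name"] != parent_ae_name:
            continue
        totals[row["time"]] = totals.get(row["time"], 0) + row["value"]
    return totals

# Hypothetical per-member packet differences.
rows = [
    {"time": 0, "interface_name": "xe-1/0/0", "parent_ae_name": "ae0", "value": 1000},
    {"time": 0, "interface_name": "xe-1/0/1", "parent_ae_name": "ae0", "value": 500},
    {"time": 30, "interface_name": "xe-1/0/0", "parent_ae_name": "ae0", "value": 1200},
    {"time": 30, "interface_name": "xe-1/0/1", "parent_ae_name": "ae1", "value": 700},
]
print(packets_for_bundle(rows, "ae0"))  # {0: 1500, 30: 1200}
print(packets_for_bundle(rows, "ae1"))  # {30: 700}
```

    Because each row carries the bundle it belonged to when it was recorded, the packets counted for xe-1/0/1 while it was a member of ae0 stay with ae0.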

    Modified: 2017-08-15