Supported KPIs in Observability
Key Performance Indicators (KPIs) are metrics used to monitor and evaluate the health, performance, and quality of your network.
Starting with Release 2.8.0, Routing Director uses dial-in gNMI connections in the Sample subscription mode to collect KPIs for analysis and visualization. These KPIs include interface status and usage, routing protocol states, route counts, and system CPU and memory usage. Earlier releases used dial-out gNMI connections to collect KPIs. KPIs help you proactively identify issues, ensure service quality, and maintain optimal network operations. Routing Director collects the following types of KPIs:
- Default KPIs: These are system-generated KPIs that Routing Director automatically monitors and analyzes to assess device and network health. Routing Director collects these KPIs based on predefined rules.
- Custom KPIs: These are user-defined KPIs that you create in Routing Director and monitor according to your specific network needs. Routing Director collects these KPIs based on custom rules.
For Routing Director to establish a dial-in connection for collecting device telemetry, the device must meet at least one of the following conditions:
- The device is managed by a network implementation plan with the Observability use case selected.
- An OpenConfig custom KPI rule is instantiated on the device.
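The eligibility rule is a logical OR, so meeting either condition is enough. The following minimal Python sketch illustrates this. The `Device` type and its field names are hypothetical and are not a Routing Director API.

```python
from dataclasses import dataclass


@dataclass
class Device:
    # Hypothetical fields for illustration only; not a Routing Director type.
    name: str
    in_observability_plan: bool  # managed by a plan with the Observability use case selected
    has_openconfig_custom_kpi: bool  # an OpenConfig custom KPI rule is instantiated on it


def eligible_for_dial_in(device: Device) -> bool:
    """Return True if at least one of the two dial-in conditions holds."""
    return device.in_observability_plan or device.has_openconfig_custom_kpi


devices = [
    Device("r1", in_observability_plan=True, has_openconfig_custom_kpi=False),
    Device("r2", in_observability_plan=False, has_openconfig_custom_kpi=True),
    Device("r3", in_observability_plan=False, has_openconfig_custom_kpi=False),
]
print([d.name for d in devices if eligible_for_dial_in(d)])  # ['r1', 'r2']
```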
Routing Director opens the gNMI dial-in connection to the device on port 32767 to collect device telemetry. When you upgrade from an earlier release to Routing Director Release 2.8.0, all dial-out connections automatically change to dial-in connections without any data loss.
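To check whether port 32767 on a device is reachable from a controller node, you can use a plain TCP connection test. The following Python sketch uses only the standard library. The function name and the 3-second timeout are illustrative choices. A successful result proves only that the TCP port accepts connections (for example, that no firewall blocks it). It does not prove that the gNMI or TLS handshake succeeds.

```python
import socket

GNMI_DIAL_IN_PORT = 32767  # port that Routing Director uses for gNMI dial-in


def port_reachable(host: str, port: int = GNMI_DIAL_IN_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Checks TCP reachability only; no TLS or gNMI exchange is attempted.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False


if __name__ == "__main__":
    import sys

    # Usage: python check_port.py <device-address> [<device-address> ...]
    for device in sys.argv[1:]:
        status = "reachable" if port_reachable(device) else "NOT reachable"
        print(f"{device}:{GNMI_DIAL_IN_PORT} {status}")
```

Run the script from each controller node, because the firewall requirement applies to every node's IP address.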
- Dial-in gNMI connections are not supported on devices running Junos OS or Junos OS Evolved Releases 24.2R1, 24.2R2, and 24.4R1. On these releases, gNMI connections fail when certificate verification is enabled. As a result, starting with Juniper Routing Director Release 2.8.0, certificate verification is disabled by default on Juniper devices.

  For all other Junos OS and Junos OS Evolved releases, enable client certificate validation for the gNMI connection by running the following curl command:

  ```
  curl -X PUT -H "Content-Type: application/json" \
    -u test@test.com:Test-Password \
    "https://VIP-1/api/v1/orgs/{org-id}/gnmi/options" \
    --data '{"device": {"client-certificate-request": "require-certificate-and-verify"}}' \
    --insecure
  ```

- Ensure that your network firewall rules allow connections from all JRD Controller node IP addresses to the devices on port 32767.
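If you script this change instead of using curl, you can build the same request with the Python standard library. This is a minimal sketch: the endpoint, JSON payload, and Basic authentication come from the curl command, and the function names are hypothetical. The helper that mirrors curl's `--insecure` option turns off server certificate checks, so use it only where the curl example would also be acceptable.

```python
import base64
import json
import ssl
import urllib.request


def build_gnmi_options_request(vip: str, org_id: str, user: str, password: str) -> urllib.request.Request:
    """Build the same PUT request as the curl command (the request is not sent here)."""
    url = f"https://{vip}/api/v1/orgs/{org_id}/gnmi/options"
    body = {"device": {"client-certificate-request": "require-certificate-and-verify"}}
    # curl -u user:password sends HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        method="PUT",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


def insecure_context() -> ssl.SSLContext:
    """Rough equivalent of curl --insecure: skip server certificate checks."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


# Usage (values are the placeholders from the curl example; replace them first):
# req = build_gnmi_options_request("VIP-1", "{org-id}", "test@test.com", "Test-Password")
# with urllib.request.urlopen(req, context=insecure_context()) as resp:
#     print(resp.status)
```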
Routing Director generates alerts based on the anomalies in the KPIs (both system-generated and custom KPIs) in your network. You can view a graphical representation of the performance and alerts generated for all KPIs associated with a device.
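Several rules in Table 1 carry paired threshold fields, such as re-cpu-utilization-high-threshold and re-cpu-utilization-low-threshold. The following Python sketch shows how a sample could be classified against such a band. It is a hypothetical illustration only. This page does not describe the anomaly-detection logic that Routing Director uses, and the 10 to 80 percent band below is an assumed value, not a product default.

```python
from typing import Optional


def threshold_alert(value: float, low: float, high: float) -> Optional[str]:
    """Classify one sample against a low/high threshold band.

    Hypothetical illustration: returns "high" or "low" when the value leaves
    the band, and None when it is within [low, high].
    """
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if value > high:
        return "high"
    if value < low:
        return "low"
    return None


# CPU utilization samples (percent) against an assumed 10-80 band.
samples = [35.0, 82.5, 5.0]
print([threshold_alert(v, low=10.0, high=80.0) for v in samples])  # [None, 'high', 'low']
```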
Routing Director uses Junos Telemetry to collect KPIs from Junos devices. The collection interval depends on the data model (sensor model):
- OpenConfig: 60 seconds
- NETCONF: 180 seconds
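Dial-in collection uses the gNMI Sample subscription mode. In the gNMI specification, a sampled subscription states its interval in nanoseconds, so the 60-second OpenConfig interval corresponds to a `sample_interval` of 60,000,000,000. The following Python sketch builds a dictionary that mirrors the field names of a gNMI `SubscribeRequest`. Paths appear as strings for readability, whereas the real protobuf uses a structured `Path` message. The example path comes from the sample output later on this page. The exact request that Routing Director sends is not documented here.

```python
import json

NS_PER_SECOND = 1_000_000_000  # gNMI sample_interval is expressed in nanoseconds


def sample_subscribe_request(paths, interval_s):
    """Return a dict shaped like a gNMI SubscribeRequest using SAMPLE mode.

    Paths are plain strings for readability; the protobuf message uses a
    structured gNMI Path instead.
    """
    return {
        "subscribe": {
            "mode": "STREAM",  # long-lived subscription that keeps sending updates
            "subscription": [
                {"path": p, "mode": "SAMPLE", "sample_interval": interval_s * NS_PER_SECOND}
                for p in paths
            ],
        }
    }


# OpenConfig sensors are collected every 60 seconds according to this page.
request = sample_subscribe_request(["/interfaces/interface/state/counters"], 60)
print(json.dumps(request, indent=2))
```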
Table 1 lists the KPIs that are supported in the Observability use case.
| Domain | Rule Name | Sensor Name | Sensor Type | Field Name | Field Count (Total number of fields evaluated per rule) | Number of metrics stored in TSDB per rule |
|---|---|---|---|---|---|---|
| bfd | check-bfd-session-state | bfd-sensor | iAgent | remote-state, session-neighbor, session-state | 3 | 4 |
| vpn | check-evpn-instance-state | evpn | iAgent | evpn-instance-name, evpn-interface-mode, evpn-interface-name, evpn-interface-status | 4 | 4 |
| vpn | check-evpn-neighbor | evpnNeighbors | iAgent | evpn-instance-name, evpn-num-neighbors | 2 | 3 |
| vpn | evpn-check-mac-count-netconf | evpn-mac-count | iAgent | instance-name, learn-vlan, mac-count, threshold | 4 | 5 |
| chassis | check-chassis-power-fan-temperature | chassis-netconf | iAgent | class, comment, name, status, temperature | 5 | 3 |
| chassis | check-psm-temperature | psm-temperature-netconf | iAgent | psm, psm-temperature-high-threshold, psm-temperature-low-threshold, temperature | 4 | 5 |
| chassis | check-fan-state-rpm | fan-netconf | iAgent | critical-rpm-threshold, fan-name, high-rpm-threshold, high-threshold, low-threshold, measurement, measurements, rpm-high-threshold, rpm-low-threshold, rpm-percent, rpm-percent-anomaly, rpm-percents, status | 13 | 12 |
| chassis | check-psm-power-usage-state | components-oc | open-config | power-dc-output, psm, psm-power-capacity-maximum, psm-power-usage, psm-power-usage-anomaly, psm-power-usage-high-threshold, psm-power-usage-low-threshold, psm-state, psm-temperature, psm-temperature-degrees, psm-temperature-high-threshold, psm-temperature-low-threshold | 12 | 12 |
| chassis | check-routing-engine-temperature | components-oc | open-config | high-threshold, low-threshold, routing-engine, routing-engine-cpu-temperature, routing-engine-cpu-temperature-anomaly, routing-engine-temperature, routing-engine-temperature-anomaly | 7 | 12 |
| chassis | check-system-power-usage-temp-state | components-oc | open-config | chassis, chassis-temperature, high-threshold, low-threshold, power-system-maximum, power-system-remaining, system-power-remaining-in-percentage, system-power-usage-high-threshold, system-power-usage-low-threshold | 9 | 8 |
| fpc | check-fpc-cpu-memory-state-temp | components-oc | open-config | cpu-high-threshold, cpu-low-threshold, fpc, fpc-cpu-utilization, fpc-cpu-utilization-anomaly, fpc-memory-buffer, fpc-memory-buffer-anomaly, fpc-memory-heap, fpc-utilization-idle, memory-high-threshold, memory-low-threshold, state, temp-high-threshold, temp-low-threshold, temperature, temperature-anomaly | 16 | 22 |
| fpc | check-pfe-discards | pfe-sensor-netconf | iAgent | bad-route-discard, bad-route-discard-rate, bits-to-test-discard, bits-to-test-discard-rate, data-error-discard, data-error-discard-rate, drop-threshold, fabric-discard, fabric-discard-rate, info-cell-discard, info-cell-discard-rate, invalid-iif-discard, invalid-iif-discard-rate, nexthop-discard, nexthop-discard-rate, stack-overflow-discard, stack-overflow-discard-rate, stack-underflow-discard, stack-underflow-discard-rate, tcp-header-error-discard, tcp-header-error-discard-rate | 21 | 23 |
| system | check-ntp-synchronization-status | ntp-status | iAgent | clock-jitter, offset, peer, precision, reference-id, reference-time, root-delay, root-dispersion, status-info, stratum | 10 | 5 |
| system | check-system-cpu-memory | components-oc | open-config | re-cpu-utilization, re-cpu-utilization-anomaly, re-cpu-utilization-high-threshold, re-cpu-utilization-low-threshold, re-memory-buffer, re-memory-buffer-anomaly, re-memory-buffer-high-threshold, re-memory-buffer-low-threshold, routing-engine | 9 | 14 |
| interface | check-physical-interface-traffic | ifd | | egress-stats-if-bps, egress-stats-if-octets, egress-stats-if-pkts, egress-stats-if-pps, elapsed-time, if-name, ingress-stats-if-bps, ingress-stats-if-octets, ingress-stats-if-pkts, ingress-stats-if-pps, stats_received_count | 11 | 3 |
| interface | check-ifl-state | interfaces-oc | open-config | high-threshold, ifl-oper-status, in-bandwidth, in-mbps, in-octets, in-util, interface-name, low-threshold, out-bandwidth, out-mbps, out-octets, out-util, sub-interface-index | 13 | 9 |
| interface | check-interface-fec-crc-framing-errors | errorinfo-netconf | iAgent | drop-threshold, fec-uncorrected, framing-errors, input-crc-errors, interface-name, optical-fec-corrected, output-crc-errors | 7 | 7 |
| interface | check-interface-in-out-errors-traffic-state-flaps | interfaces-oc | open-config | admin-state, flaps, flaps-threshold, high-threshold, in-errors-count, in-errors-threshold, in-mbps, in-mbps-anomaly, in-octets, in-util, interface-name, link-state, low-threshold, out-errors-count, out-errors-threshold, out-mbps, out-mbps-anomaly, out-octets, out-util, speed | 20 | 23 |
| interface | check-optical-signal-loss-fec-tx-rx-power | optical-sensor-oc | open-config | fec-uncorrected, interface-name, lane-index, optics-current, optics-rx-power, optics-rx-power-anomaly, optics-tx-power, optics-tx-power-anomaly, rx-high-alarm-threshold, rx-high-warning-threshold, rx-loss-of-signal-alarm, rx-low-alarm-threshold, rx-low-warning-threshold, tx-high-alarm-threshold, tx-high-warning-threshold, tx-laser-disabled-alarm, tx-loss-of-signal-functionality-alarm, tx-low-alarm-threshold, tx-low-warning-threshold | 19 | 22 |
| interface | check-optical-temp-thresholds | temperature-thresholds-oc | open-config | high-alarm-threshold, high-warning-threshold, interface-name, optical-temp, optical-temp-anomaly | 5 | 8 |
| lldp | check-lldp-session | lldp-sensor | iAgent | interface-name, lldp-neighbor-count | 2 | 3 |
| oam | get-lfm-information | link-fault-management-information | iAgent | lfm-discovery-state, lfm-interface-name, lfm-status | 3 | 4 |
| bgp | check-bgp-neighbor-prefixes | bgp-netconf | iAgent | address-family, advertised-route-count-threshold, advertised-routes, instance-name, peer-address, received-route-count-threshold, received-routes | 7 | 6 |
| bgp | check-bgp-neighbor-stats | bgp-netconf | iAgent | flap-count, flap-count-threshold, instance-name, peer-address, peer-state | 5 | 5 |
| routes | collect-fib-stats | fib-sensor | iAgent | address-family, fib-route-count, route-table-type, table-name, threshold | 5 | 4 |
| isis | check-isis-adjacency-status | isis-netconf | iAgent | adjacency-state, interface-name, level, system-name | 4 | 3 |
| isis | check-isis-flap-detection | isis-netconf | iAgent | flap-threshold, interface-name, transition-count | 3 | 4 |
| isis | check-isis-statistics | isis-sensor | open-config | csnp-drops, esh-drops, iih-drops, interface-name, ish-drops, lsp-drops, psnp-drops, threshold, unknown-drops | 9 | 10 |
| mpls | check-te-rsvp-interface-errors | lsp | open-config | authentication-fail, bad-checksum, bad-packet-format, bad-packet-length, bad-packet-version, in-path-error-messages, in-reservation-error-messages, message-out-of-order, out-path-error-messages, out-reservation-error-messages, received-nack, recv-pkt-disabled-intf, send-failure, state-timeout, te-interface, unknown-ack, unknown-nack | 17 | 16 |
| mpls | check-te-rsvp-global-errors | lsp | open-config | authentication-fail, bad-checksum, bad-packet-format, bad-packet-length, error-threshold, in-path-error-messages, in-reservation-error-messages, instance-name, out-path-error-messages, out-reservation-error-messages, received-nack, unknown-ack, unknown-nack | 13 | 14 |
| mpls | check-ldp-session | ldp-oc | open-config | lsr-id, session-state | 2 | 3 |
| mpls | check-lsp-state | lsp | open-config | lsp-name, lsp-state-change-count, oper-status | 3 | 4 |
| mpls | check-rsvp-neighbor-state | rsvp | open-config | neighbor-address, neighbor-interface, neighbor-state | 3 | 4 |
| ospf | check-ospf-io-statistics | ospf-io-statistics-netconf | iAgent | error-threshold, ospf-error, packets-read | 3 | 4 |
| ospf | check-ospf-neighbor-state | ospf-neighbor-netconf | iAgent | dr-address, instance-name, interface-name, neighbor-address, neighbor-id, ospf-neighbor-state | 6 | 3 |
| ospf | check-ospf-statistics | ospf-statistics-netconf | iAgent | hello-count-threshold, hello-received, hello-sent, ospf-packet-type | 4 | 5 |
| ospf | check-ospf3-io-statistics | ospf-io-statistics-netconf | iAgent | error-threshold, ospf-error | 2 | 4 |
| ospf | check-ospf3-neighbor-state | ospf-neighbor-netconf | iAgent | instance-name, interface-name, neighbor-address, neighbor-id, ospf-neighbor-state | 5 | 3 |
| ospf | check-ospf3-statistics | ospf-statistics-netconf | iAgent | hello-count-threshold, hello-received, hello-sent, ospf-packet-type | 4 | 5 |
| routes | collect-rib-table-protocol-routes | route-protocol-summary | iAgent | active-route-count, protocol-name, protocol-total-route-count, table-name, threshold | 5 | 5 |
| routes | collect-rib-table-routes | route-summary | iAgent | active-route-count, destination-count, hidden-route-count, holddown-route-count, table-name, table-total-route-count, threshold | 7 | 5 |
| vpn | check-evpn-view | network-rule | | instance-ifl-no, instance-interface-name, instance-interface-status, pe-router-name, vpn-name, vpn-state | 6 | 8 |
| vpn | check-l2circuit-pw-state | l2ckt | iAgent | connection-id, connection-status, neighbor | 3 | 3 |
| vpn | check-l3vpn-bgp-state | network-rule | | instance-ifl-no, instance-interface-name, instance-interface-status, neighbor-session, pe-router-name, vpn-name, vpn-state | 7 | 9 |
| vpn | check-l3vpn-ospf-state | network-rule | | instance-ifl-no, instance-interface-name, instance-interface-status, neighbor-session, pe-router-name, vpn-name, vpn-state | 7 | 9 |
| vpn | check-l3vpn-ospf3-state | network-rule | | instance-ifl-no, instance-interface-name, instance-interface-status, neighbor-session, pe-router-name, vpn-name, vpn-state | 7 | 9 |
| vpn | check-l3vpn-static-state | network-rule | | instance-ifl-no, instance-interface-name, instance-interface-status, neighbor-address, pe-router-name, vpn-name | 6 | 7 |
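You can combine the "Number of metrics stored in TSDB per rule" column with the collection intervals to get a rough estimate of storage growth. The following Python sketch assumes that each stored metric is written once per collection interval for each monitored object, such as an interface or a peer. It also assumes that iAgent rules use the 180-second NETCONF interval, which is inferred from their NETCONF-based sensor names. Treat the result as an order-of-magnitude estimate, not a sizing guarantee. The `Rule` type and `points_per_day` function are hypothetical.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400
# Intervals from this page; mapping iAgent to the NETCONF interval is an assumption.
INTERVAL_S = {"open-config": 60, "iAgent": 180}


@dataclass
class Rule:
    name: str
    sensor_type: str
    metrics_per_rule: int  # "Number of metrics stored in TSDB per rule" from Table 1


def points_per_day(rule: Rule, instances: int = 1) -> int:
    """Rough TSDB data points per day on one device.

    instances is the number of monitored objects (interfaces, peers, ...)
    that the rule reports on; you supply it, it is not a table value.
    """
    samples = SECONDS_PER_DAY // INTERVAL_S[rule.sensor_type]
    return rule.metrics_per_rule * samples * instances


rules = [
    Rule("check-system-cpu-memory", "open-config", 14),
    Rule("check-bgp-neighbor-stats", "iAgent", 5),
]
for r in rules:
    print(r.name, points_per_day(r))
# check-system-cpu-memory 20160
# check-bgp-neighbor-stats 2400
```

Rules whose sensor type is blank in Table 1 are not covered by this mapping.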
Retrieve List of Sensors Streaming Telemetry Data
Use the `show network-agent statistics gnmi` command to list the sensors that are subscribed on the device and streaming data to Routing Director. The following is sample output of the command:
```
user@router> show network-agent statistics gnmi
Subscription ID: 101
Sensor Path: /interfaces/interface/state/counters
Reporting Interval: 10 seconds
Components: fpc0
Average Latency: 4 ms
Circular Buffer Used: 12%

Subscription ID: 102
Sensor Path: /components/component/state/temperature
Reporting Interval: on_change
Components: fpc0
Average Latency: 6 ms
Circular Buffer Used: 7%

Subscription ID: 110
Sensor Path: /interfaces/interface/subinterfaces/subinterface/state/oper-status
Reporting Interval: 30 seconds
Components: fpc1
Average Latency: 3 ms
Circular Buffer Used: 2%
```
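When a device has many subscriptions, you may want to convert this output into structured records, for example to confirm that a given sensor path is streaming. The following Python sketch infers the format from the sample output above. It assumes `Key: Value` lines and a new record at each `Subscription ID` line. The real output may contain other fields or layouts, and the function name is hypothetical.

```python
from typing import Dict, List


def parse_gnmi_stats(output: str) -> List[Dict[str, str]]:
    """Split 'show network-agent statistics gnmi' text into one dict per subscription.

    Assumes the 'Key: Value' layout of the sample output, with each record
    starting at a 'Subscription ID' line. Lines without a colon are skipped.
    """
    records: List[Dict[str, str]] = []
    for raw in output.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # blank lines and the CLI prompt line
        key, _, value = line.partition(":")  # split on the first colon only
        key, value = key.strip(), value.strip()
        if key == "Subscription ID":
            records.append({})
        if records:
            records[-1][key] = value
    return records


sample = """\
user@router> show network-agent statistics gnmi
Subscription ID: 101
Sensor Path: /interfaces/interface/state/counters
Reporting Interval: 10 seconds

Subscription ID: 102
Sensor Path: /components/component/state/temperature
Reporting Interval: on_change
"""
for rec in parse_gnmi_stats(sample):
    print(rec["Subscription ID"], rec["Sensor Path"], rec["Reporting Interval"])
```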