Supported KPIs in Observability
Key Performance Indicators (KPIs) are metrics used to monitor and evaluate the health, performance, and quality of your network.
Starting in Release 2.8.0, Routing Director uses a dial-in gNMI connection in Sample subscription mode to collect KPIs such as interface status and usage, routing protocol states, route counts, and system CPU and memory usage for analysis and visualization. Earlier releases used a dial-out gNMI connection to collect KPIs. KPIs help you proactively identify issues, ensure service quality, and maintain optimal network operations. Routing Director collects the following types of KPIs:
- Default KPIs: Default KPIs are system-generated KPIs. These KPIs are automatically monitored and analyzed by Routing Director to assess device and network health. Routing Director collects these KPIs based on predefined rules.
- Custom KPIs: Custom KPIs are user-defined KPIs. You can use Routing Director to define and monitor KPIs tailored to your specific network needs. Routing Director collects these KPIs based on custom rules.
For Routing Director to establish a dial‑in connection for collecting device telemetry, the device must meet at least one of the following conditions:
- The device must be managed by a network implementation plan with the Observability use case selected.
- An OpenConfig Custom KPI rule must be instantiated on the device.
Routing Director opens the gNMI dial-in connection to the device on port 32767 to collect device telemetry. When you upgrade from an earlier release to Routing Director Release 2.8.0, all dial-out connections automatically change to dial-in connections without any data loss.
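Because the dial-in connection depends on TCP port 32767 being reachable on the device, a quick reachability probe can help when telemetry does not arrive. The following is a minimal sketch, not a Routing Director tool: the function name `can_open_tcp` is hypothetical, and a successful TCP handshake only shows that the port is open, not that gNMI or TLS negotiation will succeed.

```python
import socket

GNMI_DIAL_IN_PORT = 32767  # dial-in port named in this section


def can_open_tcp(host: str, port: int = GNMI_DIAL_IN_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This checks only the TCP handshake; it does not speak gNMI or TLS.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

To be meaningful, the probe would need to run from a host that shares the network path of the controller nodes, for example `can_open_tcp("192.0.2.10")`, where the address is a documentation-range placeholder for a managed device.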
- Mutual TLS (m-TLS) for dial-in gNMI connections is not supported on devices running Junos OS and Junos OS Evolved versions from 24.2R1 to 24.4R1. This limitation occurs because gNMI connections fail when certificate verification is enabled on devices running these versions.
As a result, starting with Routing Director Release 2.8.0, m-TLS certificate verification is disabled by default on Juniper devices.
For all other Junos OS and Junos OS Evolved versions, to validate the controller node identity, use the following command to enable m-TLS certificate validation:

    curl -X PUT -H "Content-Type: application/json" \
      -u test@test.com:Test-Password \
      "https://VIP-1/api/v1/orgs/{org-id}/gnmi/options" \
      --data '{"device": {"client-certificate-request": "require-certificate-and-verify"}}' \
      --insecure

- You must ensure that the firewall rules on your network permit connections from all Routing Director controller node IP addresses to the devices over port 32767.
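If you script the m-TLS validation request rather than running curl by hand, the same call can be assembled with the Python standard library. This is a sketch only: the URL path and JSON payload are copied from the curl command in this section, while the function name `build_mtls_request` and its arguments are hypothetical. It builds the request without sending it.

```python
import base64
import json
import urllib.request


def build_mtls_request(vip: str, org_id: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request that enables certificate verification."""
    # Path and payload mirror the curl command in this section.
    url = f"https://{vip}/api/v1/orgs/{org_id}/gnmi/options"
    payload = {"device": {"client-certificate-request": "require-certificate-and-verify"}}
    # curl's -u option sends HTTP Basic credentials; reproduce that header here.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
```

Sending the request would be a call to `urllib.request.urlopen(req, context=ctx)`. The curl example passes `--insecure`, so an equivalent call would need an `ssl` context with certificate checks relaxed; whether that is acceptable depends on your environment.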
Routing Director generates alerts based on the anomalies in the KPIs (both system-generated and custom KPIs) in your network. You can view a graphical representation of the performance and alerts generated for all KPIs associated with a device.
Routing Director uses Junos Telemetry to collect KPIs from Junos devices. Data is collected at the following frequency for each data model (sensor model):
- OpenConfig—60 seconds
- NETCONF—180 seconds
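To put these intervals in perspective, the short calculation below shows how many collection cycles each data model completes per hour and per day. It is plain arithmetic on the two values stated above; the helper name `samples_per` is illustrative.

```python
# Collection intervals stated in this section, in seconds.
COLLECTION_INTERVAL_S = {"OpenConfig": 60, "NETCONF": 180}


def samples_per(period_s: int, interval_s: int) -> int:
    """Number of complete collection cycles that fit in period_s."""
    return period_s // interval_s


for model, interval in COLLECTION_INTERVAL_S.items():
    print(
        f"{model}: {samples_per(3600, interval)} samples/hour, "
        f"{samples_per(86400, interval)} samples/day"
    )
# OpenConfig: 60 samples/hour, 1440 samples/day
# NETCONF: 20 samples/hour, 480 samples/day
```

One practical consequence is that a state change on a NETCONF-collected KPI may take up to roughly 180 seconds to appear in the collected data, compared with roughly 60 seconds for an OpenConfig-collected KPI.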
Table 1 lists the KPIs that are supported in the Observability use case.
| Domain | Rule Name | Sensor Name | Sensor Type | Field Name | Field Count (Total number of fields evaluated per rule) | Number of metrics stored in TSDB per rule |
|---|---|---|---|---|---|---|
| bfd | check-bfd-session-state | bfd-sensor | iAgent | remote-state, session-neighbor, session-state | 3 | 4 |
| vpn | check-evpn-instance-state | evpn | iAgent | evpn-instance-name, evpn-interface-mode, evpn-interface-name, evpn-interface-status | 4 | 4 |
| vpn | check-evpn-neighbor | evpnNeighbors | iAgent | evpn-instance-name, evpn-num-neighbors | 2 | 3 |
| vpn | evpn-check-mac-count-netconf | evpn-mac-count | iAgent | instance-name, learn-vlan, mac-count, threshold | 4 | 5 |
| chassis | check-chassis-power-fan-temperature | chassis-netconf | iAgent | class, comment, name, status, temperature | 5 | 3 |
| chassis | check-psm-temperature | psm-temperature-netconf | iAgent | psm, psm-temperature-high-threshold, psm-temperature-low-threshold, temperature | 4 | 5 |
| chassis | check-fan-state-rpm | fan-netconf | iAgent | critical-rpm-threshold, fan-name, high-rpm-threshold, high-threshold, low-threshold, measurement, measurements, rpm-high-threshold, rpm-low-threshold, rpm-percent, rpm-percent-anomaly, rpm-percents, status | 13 | 12 |
| chassis | check-psm-power-usage-state | components-oc | open-config | power-dc-output, psm, psm-power-capacity-maximum, psm-power-usage, psm-power-usage-anomaly, psm-power-usage-high-threshold, psm-power-usage-low-threshold, psm-state, psm-temperature, psm-temperature-degrees, psm-temperature-high-threshold, psm-temperature-low-threshold | 12 | 12 |
| chassis | check-routing-engine-temperature | components-oc | open-config | high-threshold, low-threshold, routing-engine, routing-engine-cpu-temperature, routing-engine-cpu-temperature-anomaly, routing-engine-temperature, routing-engine-temperature-anomaly | 7 | 12 |
| chassis | check-system-power-usage-temp-state | components-oc | open-config | chassis, chassis-temperature, high-threshold, low-threshold, power-system-maximum, power-system-remaining, system-power-remaining-in-percentage, system-power-usage-high-threshold, system-power-usage-low-threshold | 9 | 8 |
| fpc | check-fpc-cpu-memory-state-temp | components-oc | open-config | cpu-high-threshold, cpu-low-threshold, fpc, fpc-cpu-utilization, fpc-cpu-utilization-anomaly, fpc-memory-buffer, fpc-memory-buffer-anomaly, fpc-memory-heap, fpc-utilization-idle, memory-high-threshold, memory-low-threshold, state, temp-high-threshold, temp-low-threshold, temperature, temperature-anomaly | 16 | 22 |
| fpc | check-pfe-discards | pfe-sensor-netconf | iAgent | bad-route-discard, bad-route-discard-rate, bits-to-test-discard, bits-to-test-discard-rate, data-error-discard, data-error-discard-rate, drop-threshold, fabric-discard, fabric-discard-rate, info-cell-discard, info-cell-discard-rate, invalid-iif-discard, invalid-iif-discard-rate, nexthop-discard, nexthop-discard-rate, stack-overflow-discard, stack-overflow-discard-rate, stack-underflow-discard, stack-underflow-discard-rate, tcp-header-error-discard, tcp-header-error-discard-rate | 21 | 23 |
| system | check-ntp-synchronization-status | ntp-status | iAgent | clock-jitter, offset, peer, precision, reference-id, reference-time, root-delay, root-dispersion, status-info, stratum | 10 | 5 |
| system | check-system-cpu-memory | components-oc | open-config | re-cpu-utilization, re-cpu-utilization-anomaly, re-cpu-utilization-high-threshold, re-cpu-utilization-low-threshold, re-memory-buffer, re-memory-buffer-anomaly, re-memory-buffer-high-threshold, re-memory-buffer-low-threshold, routing-engine | 9 | 14 |
| interface | check-physical-interface-traffic | ifd | | egress-stats-if-bps, egress-stats-if-octets, egress-stats-if-pkts, egress-stats-if-pps, elapsed-time, if-name, ingress-stats-if-bps, ingress-stats-if-octets, ingress-stats-if-pkts, ingress-stats-if-pps, stats_received_count | 11 | 3 |
| interface | check-ifl-state | interfaces-oc | open-config | high-threshold, ifl-oper-status, in-bandwidth, in-mbps, in-octets, in-util, interface-name, low-threshold, out-bandwidth, out-mbps, out-octets, out-util, sub-interface-index | 13 | 9 |
| interface | check-interface-fec-crc-framing-errors | errorinfo-netconf | iAgent | drop-threshold, fec-uncorrected, framing-errors, input-crc-errors, interface-name, optical-fec-corrected, output-crc-errors | 7 | 7 |
| interface | check-interface-in-out-errors-traffic-state-flaps | interfaces-oc | open-config | admin-state, flaps, flaps-threshold, high-threshold, in-errors-count, in-errors-threshold, in-mbps, in-mbps-anomaly, in-octets, in-util, interface-name, link-state, low-threshold, out-errors-count, out-errors-threshold, out-mbps, out-mbps-anomaly, out-octets, out-util, speed | 20 | 23 |
| interface | check-optical-signal-loss-fec-tx-rx-power | optical-sensor-oc | open-config | fec-uncorrected, interface-name, lane-index, optics-current, optics-rx-power, optics-rx-power-anomaly, optics-tx-power, optics-tx-power-anomaly, rx-high-alarm-threshold, rx-high-warning-threshold, rx-loss-of-signal-alarm, rx-low-alarm-threshold, rx-low-warning-threshold, tx-high-alarm-threshold, tx-high-warning-threshold, tx-laser-disabled-alarm, tx-loss-of-signal-functionality-alarm, tx-low-alarm-threshold, tx-low-warning-threshold | 19 | 22 |
| interface | check-optical-temp-thresholds | temperature-thresholds-oc | open-config | high-alarm-threshold, high-warning-threshold, interface-name, optical-temp, optical-temp-anomaly | 5 | 8 |
| lldp | check-lldp-session | lldp-sensor | iAgent | interface-name, lldp-neighbor-count | 2 | 3 |
| oam | get-lfm-information | link-fault-management-information | iAgent | lfm-discovery-state, lfm-interface-name, lfm-status | 3 | 4 |
| bgp | check-bgp-neighbor-prefixes | bgp-netconf | iAgent | address-family, advertised-route-count-threshold, advertised-routes, instance-name, peer-address, received-route-count-threshold, received-routes | 7 | 6 |
| bgp | check-bgp-neighbor-stats | bgp-netconf | iAgent | flap-count, flap-count-threshold, instance-name, peer-address, peer-state | 5 | 5 |
| routes | collect-fib-stats | fib-sensor | iAgent | address-family, fib-route-count, route-table-type, table-name, threshold | 5 | 4 |
| isis | check-isis-adjacency-status | isis-netconf | iAgent | adjacency-state, interface-name, level, system-name | 4 | 3 |
| isis | check-isis-flap-detection | isis-netconf | iAgent | flap-threshold, interface-name, transition-count | 3 | 4 |
| isis | check-isis-statistics | isis-sensor | open-config | csnp-drops, esh-drops, iih-drops, interface-name, ish-drops, lsp-drops, psnp-drops, threshold, unknown-drops | 9 | 10 |
| mpls | check-te-rsvp-interface-errors | lsp | open-config | authentication-fail, bad-checksum, bad-packet-format, bad-packet-length, bad-packet-version, in-path-error-messages, in-reservation-error-messages, message-out-of-order, out-path-error-messages, out-reservation-error-messages, received-nack, recv-pkt-disabled-intf, send-failure, state-timeout, te-interface, unknown-ack, unknown-nack | 17 | 16 |
| mpls | check-te-rsvp-global-errors | lsp | open-config | authentication-fail, bad-checksum, bad-packet-format, bad-packet-length, error-threshold, in-path-error-messages, in-reservation-error-messages, instance-name, out-path-error-messages, out-reservation-error-messages, received-nack, unknown-ack, unknown-nack | 13 | 14 |
| mpls | check-ldp-session | ldp-oc | open-config | lsr-id, session-state | 2 | 3 |
| mpls | check-lsp-state | lsp | open-config | lsp-name, lsp-state-change-count, oper-status | 3 | 4 |
| mpls | check-rsvp-neighbor-state | rsvp | open-config | neighbor-address, neighbor-interface, neighbor-state | 3 | 4 |
| ospf | check-ospf-io-statistics | ospf-io-statistics-netconf | iAgent | error-threshold, ospf-error, packets-read | 3 | 4 |
| ospf | check-ospf-neighbor-state | ospf-neighbor-netconf | iAgent | dr-address, instance-name, interface-name, neighbor-address, neighbor-id, ospf-neighbor-state | 6 | 3 |
| ospf | check-ospf-statistics | ospf-statistics-netconf | iAgent | hello-count-threshold, hello-received, hello-sent, ospf-packet-type | 4 | 5 |
| ospf | check-ospf3-io-statistics | ospf-io-statistics-netconf | iAgent | error-threshold, ospf-error | 2 | 4 |
| ospf | check-ospf3-neighbor-state | ospf-neighbor-netconf | iAgent | instance-name, interface-name, neighbor-address, neighbor-id, ospf-neighbor-state | 5 | 3 |
| ospf | check-ospf3-statistics | ospf-statistics-netconf | iAgent | hello-count-threshold, hello-received, hello-sent, ospf-packet-type | 4 | 5 |
| routes | collect-rib-table-protocol-routes | route-protocol-summary | iAgent | active-route-count, protocol-name, protocol-total-route-count, table-name, threshold | 5 | 5 |
| routes | collect-rib-table-routes | route-summary | iAgent | active-route-count, destination-count, hidden-route-count, holddown-route-count, table-name, table-total-route-count, threshold | 7 | 5 |
| vpn | check-evpn-view | network-rule | | instance-ifl-no, instance-interface-name, instance-interface-status, pe-router-name, vpn-name, vpn-state | 6 | 8 |
| vpn | check-l2circuit-pw-state | l2ckt | iAgent | connection-id, connection-status, neighbor | 3 | 3 |
| vpn | check-l3vpn-bgp-state | network-rule | | instance-ifl-no, instance-interface-name, instance-interface-status, neighbor-session, pe-router-name, vpn-name, vpn-state | 7 | 9 |
| vpn | check-l3vpn-ospf-state | network-rule | | instance-ifl-no, instance-interface-name, instance-interface-status, neighbor-session, pe-router-name, vpn-name, vpn-state | 7 | 9 |
| vpn | check-l3vpn-ospf3-state | network-rule | | instance-ifl-no, instance-interface-name, instance-interface-status, neighbor-session, pe-router-name, vpn-name, vpn-state | 7 | 9 |
| vpn | check-l3vpn-static-state | network-rule | | instance-ifl-no, instance-interface-name, instance-interface-status, neighbor-address, pe-router-name, vpn-name | 6 | 7 |
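
The Field Count column is defined as the total number of fields evaluated per rule, so it should equal the number of comma-separated entries in the Field Name column. The sketch below shows one way to check that relationship for rows copied from Table 1. The function and column key names are hypothetical, and the parser assumes each row has exactly seven pipe-delimited cells.

```python
COLUMNS = (
    "domain", "rule", "sensor_name", "sensor_type",
    "fields", "field_count", "tsdb_metrics",
)


def parse_kpi_row(row: str) -> dict:
    """Split one pipe-delimited row of Table 1 into named columns."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    record = dict(zip(COLUMNS, cells))
    # Turn the comma-separated Field Name cell into a list of field names.
    record["fields"] = [f.strip() for f in record["fields"].split(",") if f.strip()]
    record["field_count"] = int(record["field_count"])
    record["tsdb_metrics"] = int(record["tsdb_metrics"])
    return record


def field_count_matches(record: dict) -> bool:
    """True when the listed fields agree with the Field Count column."""
    return len(record["fields"]) == record["field_count"]


# Two rows copied verbatim from Table 1.
SAMPLE_ROWS = [
    "| bfd | check-bfd-session-state | bfd-sensor | iAgent | remote-state, session-neighbor, session-state | 3 | 4 |",
    "| mpls | check-ldp-session | ldp-oc | open-config | lsr-id, session-state | 2 | 3 |",
]

for row in SAMPLE_ROWS:
    rec = parse_kpi_row(row)
    print(rec["rule"], field_count_matches(rec))
```

Note that the number of metrics stored in the TSDB is a separate column and does not have to equal the field count, as both sample rows show.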
Retrieve List of Sensors Streaming Telemetry Data
Use the show network-agent statistics gnmi command to list the sensors that are subscribed on the device and streaming data to Routing Director. The following is a sample output of the command.

    user@router> show network-agent statistics gnmi
    Subscription ID: 101
    Sensor Path: /interfaces/interface/state/counters
    Reporting Interval: 10 seconds
    Components: fpc0
    Average Latency: 4 ms
    Circular Buffer Used: 12%

    Subscription ID: 102
    Sensor Path: /components/component/state/temperature
    Reporting Interval: on_change
    Components: fpc0
    Average Latency: 6 ms
    Circular Buffer Used: 7%

    Subscription ID: 110
    Sensor Path: /interfaces/interface/subinterfaces/subinterface/state/oper-status
    Reporting Interval: 30 seconds
    Components: fpc1
    Average Latency: 3 ms
    Circular Buffer Used: 2%
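
If you capture this output in a script, for example to confirm that an expected sensor path is subscribed, it can be split into one record per subscription. The sketch below assumes the field labels appear exactly as in the sample output above and in the same order; output on a real device may differ, so treat the pattern as a starting point. The names `parse_subscriptions` and `SAMPLE` are hypothetical.

```python
import re

# Labels and their order are taken from the sample output in this section.
_SUBSCRIPTION = re.compile(
    r"Subscription ID:\s*(?P<id>\d+)\s+"
    r"Sensor Path:\s*(?P<path>\S+)\s+"
    r"Reporting Interval:\s*(?P<interval>.+?)\s+"
    r"Components:\s*(?P<components>\S+)\s+"
    r"Average Latency:\s*(?P<latency>.+?)\s+"
    r"Circular Buffer Used:\s*(?P<buffer>\S+)"
)


def parse_subscriptions(text: str) -> list:
    """Return one dict per subscription block found in the command output."""
    records = []
    for match in _SUBSCRIPTION.finditer(text):
        record = match.groupdict()
        record["id"] = int(record["id"])
        records.append(record)
    return records


# First two subscriptions from the sample output, one field per line.
SAMPLE = """\
Subscription ID: 101
Sensor Path: /interfaces/interface/state/counters
Reporting Interval: 10 seconds
Components: fpc0
Average Latency: 4 ms
Circular Buffer Used: 12%

Subscription ID: 102
Sensor Path: /components/component/state/temperature
Reporting Interval: on_change
Components: fpc0
Average Latency: 6 ms
Circular Buffer Used: 7%
"""

for sub in parse_subscriptions(SAMPLE):
    print(sub["id"], sub["path"], sub["interval"])
```

Because the pattern separates fields on any whitespace, it handles output with one field per line as well as output that has been joined onto a single line.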