Monitoring the Health of the Telemetry Service
When you create a custom telemetry collection, make sure that the service does not place excessive load on your devices. The load depends on the CLI show command and on the data you are collecting, so some telemetry services are heavier than others. A collector configured to execute at short intervals can overload your devices and potentially impact traffic forwarding.
By default, Apstra provides an IBA telemetry health probe that enables you to monitor the health of telemetry services, including any custom services and collectors you configured.
From the blueprint, navigate to Analytics > Probes.
Select the Device Telemetry Health probe from the table.
Click Query: All to filter the data in the table.
For example, to display data for your new custom telemetry service, select a service name from the Service name drop-down filter. In our example, the service name is BFD.
Click Apply. The table now shows the health metric for your custom telemetry service.
Check the following:
- Ensure that the Success Count value has increased. If this value has not increased, this could mean that your service is failing or that your custom collector is misconfigured.
- Check the Execution Time. Although the execution time can vary, if the time is close to or higher than the service interval, this might indicate a problem. If this is the case, tune your probe settings and set a higher service interval. For instructions, see Customize a Probe.
Similarly, a sustained nonzero Waiting Time can indicate that the device is taking too long to complete your service request.
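The checks above can be sketched as a small script. This is only an illustration of the reasoning, not an Apstra API: the HealthSample class, its field names, the use of seconds as the unit, and the 0.8 "close to the interval" threshold are all assumptions made for this example. It assumes you have copied two or more readings for one service from the probe table, oldest first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HealthSample:
    """One reading for a service (hypothetical structure, not an Apstra type)."""
    success_count: int
    execution_time_s: float  # assumed unit: seconds
    waiting_time_s: float    # assumed unit: seconds


def assess_service_health(
    samples: List[HealthSample],
    interval_s: float,
    near_ratio: float = 0.8,  # assumed meaning of "close to" the interval
) -> List[str]:
    """Return warnings for a list of samples ordered oldest to newest."""
    if len(samples) < 2:
        return ["need at least two samples to compare"]

    warnings = []
    first, last = samples[0], samples[-1]

    # Success Count should grow between readings.
    if last.success_count <= first.success_count:
        warnings.append("success count did not increase")

    # Execution Time close to or above the service interval is a warning sign.
    if last.execution_time_s >= near_ratio * interval_s:
        warnings.append("execution time is close to or above the service interval")

    # Waiting Time that is nonzero in every reading counts as sustained.
    if all(s.waiting_time_s > 0 for s in samples):
        warnings.append("waiting time is nonzero in every sample")

    return warnings


if __name__ == "__main__":
    readings = [HealthSample(10, 28.0, 1.0), HealthSample(10, 31.0, 2.0)]
    for warning in assess_service_health(readings, interval_s=30):
        print(warning)
```

If the function returns any warnings for a service, the remedy described above still applies: set a higher service interval for that collector and check the metrics again.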
To see how your metrics are trending, switch to Time Series view under the Data Source drop-down.
For more information about each of these columns and their definitions, see Telemetry Collection Statistics in the Juniper Apstra User Guide.