Monitor the Health of the Telemetry Service

It is important to consider the load on your devices when creating a custom telemetry collection. Depending on the CLI show commands they run and the amount of data they collect, telemetry services can overload your devices. Short collector execution intervals can also impact traffic forwarding. By default, Apstra provides an IBA telemetry health probe that monitors service health, including custom services and collectors.
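One rough way to reason about collector load is to estimate the fraction of time a device spends running collector commands: divide each collector's execution time by its service interval and add the results. The Python sketch below is a heuristic, not an Apstra formula. The `CollectorSchedule` class, the `duty_cycle` function, and the service names and values are hypothetical, and the sketch assumes collectors run one after another.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CollectorSchedule:
    # Hypothetical record for one custom collector; not an Apstra data type.
    name: str
    interval_s: float   # service interval configured on the probe, in seconds
    exec_time_s: float  # observed execution time per run, in seconds


def duty_cycle(schedules: Iterable[CollectorSchedule]) -> float:
    """Estimate the fraction of wall-clock time a device spends running collectors.

    Assumes collectors run one after another, so their busy fractions add up.
    A result near or above 1.0 means the device has little or no idle time
    between collector runs.
    """
    return sum(s.exec_time_s / s.interval_s for s in schedules)


# Illustrative values only; "optics" is a hypothetical second service.
schedules = [
    CollectorSchedule("power", interval_s=120.0, exec_time_s=3.0),
    CollectorSchedule("optics", interval_s=60.0, exec_time_s=6.0),
]
print(f"estimated duty cycle: {duty_cycle(schedules):.3f}")
```

With these made-up values the estimate is 0.125. Shortening the optics interval to 10 seconds would raise that collector's share alone to 0.6, which is the kind of change the paragraph above warns about.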

To monitor the health of your telemetry service:

  1. From your blueprint, navigate to Analytics > Probes.
  2. Select the Device Telemetry Health probe from the table.
    (Figure: Probes table with the Device Telemetry Health probe selected. Device System Health, Device Telemetry Health, and Device Traffic are all Operational with no anomalies.)
  3. To filter the telemetry health data, click the magnifying glass icon.
    To display data for your new custom telemetry service, select a service name from the Service name drop-down filter (in this example, Power).
    (Figure: Telemetry stats filter panel with Power selected as the service name.)
  4. Click Apply. The table now shows the health metric for your custom telemetry service.
    (Figure: Filtered table for the Power service. Systems DA719, DB757, and DT505 show the service as not started with zero metrics; systems GM215 and GM228 show high run and success counts with no failures.)

    Check the following:

    • Ensure that the Success Count value increases over time. If it stays the same, your service might be failing, or your custom collector might be misconfigured.

    • Check the Execution Time.

      If the execution time approaches or exceeds the service interval, there might be an issue. In that case, adjust your probe settings to increase the service interval. For instructions on setting the service interval, see Create a Probe.

      Similarly, a sustained nonzero Waiting Time can indicate that the device is taking too long to complete your service request.

  5. To see how your metrics trend over time, select Time Series view from the Data Source drop-down. The following graph shows the metrics for the Power service.
    (Figure: Time-series line graphs of execution time by system ID and service name, grouped into Spine and Superspine categories.)

    For more information about each of these columns and their definitions, see Telemetry Collection Statistics in the Juniper Apstra User Guide.
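If you export the probe data and want to apply the step 4 checks programmatically, they can be expressed as a small function that compares two snapshots of the same table row taken some time apart. This is a minimal Python sketch under stated assumptions: the `ServiceStats` field names, the `check_service` function, the 0.8 "near the interval" ratio, the seconds units, and the sample values are all hypothetical and do not reflect the probe's actual output schema. Treating "nonzero in both snapshots" as "sustained" is also a simplification.

```python
from typing import List, TypedDict


class ServiceStats(TypedDict):
    # Hypothetical snapshot of one row of the Device Telemetry Health table.
    # Field names are illustrative, not the probe's actual output schema.
    system_id: str
    service: str
    success_count: int
    execution_time: float  # seconds (assumed unit)
    waiting_time: float    # seconds (assumed unit)
    interval: float        # configured service interval, seconds (assumed unit)


def check_service(before: ServiceStats, after: ServiceStats,
                  ratio: float = 0.8) -> List[str]:
    """Apply the step 4 checks to two snapshots of the same row."""
    findings: List[str] = []
    # Success Count should grow between snapshots.
    if after["success_count"] <= before["success_count"]:
        findings.append("success count not increasing: service failing or collector misconfigured")
    # Execution time at or above ratio * interval is treated as "near the interval" (assumed threshold).
    if after["execution_time"] >= ratio * after["interval"]:
        findings.append("execution time near or above interval: consider a longer service interval")
    # Nonzero waiting time in both snapshots approximates "sustained".
    if before["waiting_time"] > 0 and after["waiting_time"] > 0:
        findings.append("sustained nonzero waiting time: device slow to complete requests")
    return findings


# Illustrative snapshots for a hypothetical system "leaf1".
before: ServiceStats = {"system_id": "leaf1", "service": "power", "success_count": 100,
                        "execution_time": 2.0, "waiting_time": 0.0, "interval": 60.0}
after: ServiceStats = {**before, "success_count": 100, "execution_time": 55.0}
for finding in check_service(before, after):
    print(finding)
```

In this example the success count has not moved and the execution time is close to the 60-second interval, so the function reports both conditions; the Time Series view in step 5 remains the primary way to confirm a trend.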