Monitor Device and Network Health
Paragon Insights (formerly HealthBot) offers several ways to detect and troubleshoot device-level and network-level health problems. Use the information provided by the following Paragon Insights GUI pages to investigate and discover the root cause of issues detected by Paragon Insights:
Dashboard
- Device Dashlets
- Device Group List Dashlet
- Network Group List Dashlet
- Health Alert Dashlets
- TSDB Dashlets
Paragon Insights Release 4.0.0 introduces GUI enhancements brought about by the migration to a new framework. This change does not affect existing functionality or resources in the GUI, but it introduces the following changes to the interface:
- Favorites option
- Launchpad icon
- TSDB (time series database) dashlets
The Favorites option, denoted by a star button in the top right corner of every page, lets you add pages to the Favorites section for easier access.
In the top right corner of the UI, click the Launchpad button (rocket icon) to open a drop-down menu with links to the Sizing Tool and to the GitHub repository of Paragon Insights rules and playbooks. Use the Sizing Tool app to estimate the compute (vCPU), memory (RAM), and storage requirements for deploying or scaling Paragon Insights in your network.
Starting with Paragon Insights Release 4.0.0, the Alarms option has been renamed Alerts. To access the Alerts page, select Monitor > Alerts.
Use the Dashboard to create a custom view of the information you are most interested in. Paragon Insights pre-populates the dashboard with the Device List, Device Group List, and Network Group List dashlets and calls this view My Dashboard. To create your own dashboard view, click the + to the right of My Dashboard. You can add, rename, and delete custom views as needed.
The Dashboard also has a graphical list of predefined dashlets across the top of the page that is hidden by default. Click the cluster of nine blue dots in the upper right part of the page to show or hide the available dashlets. Each dashlet presents graphical information from a specific point of view, and you can click many of the dashlets to drill down into the information presented.
- Devices: Consists of the Devices, Device Vendor, and Device Status dashlets (see Device Dashlets).
- Device Groups: Consists of the device group dashlets (see Device Group List Dashlet).
- Network Groups: Consists of the network group dashlets (see Network Group List Dashlet).
- Health Alert: Consists of the health alert dashlets (see Health Alert Dashlets).
- TSDB (Time Series Database): Consists of the TSDB dashlets, which include line charts for TSDB Buffer Bytes and TSDB Buffer Length, and bar charts for TSDB Read Errors Last 5 Minutes, TSDB Write Errors Last 5 Minutes, and Latest TSDB Buffer Length (see TSDB Dashlets).
The Dashboard uses two types of colored objects to provide health status: halos and bars. The following table describes the meaning of the severity level colors displayed by the status halos and bars on the Dashboard:
Color | Definition |
---|---|
Green | The overall health of the device, device group, or network group is normal. No problems have been detected. |
Yellow | There might be a problem with the health of the device, device group, or network group. A minor problem has been detected. Further investigation is required. |
Red | The health of the device, device group, or network group is severe. A major problem has been detected. |
Gray | No data is available. |
Device Dashlets
The following table describes the main features of the Devices dashlet, Device Vendor dashlet, and Device Status dashlet on the dashboard.
Dashlets | Description |
---|---|
Devices | The Devices dashlet lists the status and hostname of all devices configured in Paragon Insights. Click the number in the Device Groups column to see the device groups to which the device belongs; this opens the Device Group Configuration page. Click the circular arrow at the top of the dashlet to refresh the dashlet. Click the X at the top of the dashlet to remove the dashlet from the dashboard. |
Device Vendor | The half pie chart in this dashlet shows the total number of devices classified by vendor name. Each vendor is distinguished by a unique color. If you hover over the chart, the status halo displays the number of devices from a vendor and their percentage share of the total. The legend next to the chart displays the names of the vendors. If you click the name of a vendor, the devices from that vendor are filtered out of the pie chart. |
Device Status | The pie chart in this dashlet shows the total number of devices in the platform classified by health status. Each health status is distinguished by a unique color. If you hover over the chart, the status halo of each segment displays the number of devices with the health status denoted by the segment color and their percentage share of the total. The legend next to the chart displays the health statuses of devices. If you click the name of a status, the devices with that health status are filtered out of the pie chart. |
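The hover and legend behavior of the Device Vendor and Device Status pie charts reduces to counting devices per category and computing each category's share of the total. The following Python sketch illustrates that arithmetic. It is not Paragon Insights code: the device record fields (`hostname`, `vendor`, `status`) and the `pie_segments` function are hypothetical, and the sketch assumes that percentages are recomputed over the segments that remain after you filter a legend entry out.

```python
from collections import Counter


def pie_segments(devices, field, legend_hidden=()):
    """Return {category: (count, percent_share)} for a pie-chart dashlet.

    devices: iterable of dicts, for example
        {"hostname": "r1", "vendor": "juniper", "status": "healthy"}
        (hypothetical field names).
    field: the attribute that defines the segments, such as "vendor" or "status".
    legend_hidden: categories clicked in the legend; they are filtered out.
        Assumption: shares are recomputed over the remaining segments.
    """
    counts = Counter(d[field] for d in devices if d[field] not in legend_hidden)
    total = sum(counts.values())
    # Each segment reports its device count and its share of the visible total.
    return {
        category: (count, round(100.0 * count / total, 1))
        for category, count in counts.items()
    }
```

For example, with two healthy devices, one device with a minor problem, and one device with no data, the segments report shares of 50%, 25%, and 25%.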
Device Group List Dashlet
The following table describes the main features of the Device Groups dashlet and Device Group Health dashlet on the dashboard.
Dashlets | Description |
---|---|
Device Groups | To edit device group properties, click the device group name. For information on device group properties, see Manage Devices, Device Groups, and Network Groups. To display the list of devices that belong to a device group, click the number that represents the number of devices in the device group. To display the list of playbooks applied to a device group, click the number that represents the number of playbooks applied to the device group. To remove this dashlet from the dashboard, click the X button at the top corner of the dashlet. The color of each segment in the pie chart represents the health status of the devices in the device group. For example, if half of a chart is green and the other half is yellow, no problems were detected on the devices counted in the green segment, and minor problems were detected on the devices counted in the yellow segment. Clicking a segment takes you to the Monitor > Device Configuration page. The coloring of the status halo in the pie chart segments represents the percentage of devices in the device group that have the health status defined by the color. For example, if the halo is entirely green, the health of 100% of the devices in the device group is normal. The legend next to the pie chart displays the different health statuses of device groups. If you click the name of a status, such as healthy, the device groups with that status are filtered out of the pie chart. |
Device Group Health | The pie chart in this dashlet classifies all device groups in the platform by their health status. Each health status is distinguished by a unique color. If you hover over the chart, the status halo of each segment displays the number of devices in the device group with the health status denoted by the segment color and their percentage share of the total. The legend next to the chart displays the device groups. If you click the name of a device group, the devices in that device group are filtered out of the pie chart. |
Network Group List Dashlet
The following table describes the main features of the Network Groups dashlet and Network Health dashlet.
Dashlets | Description |
---|---|
Network Groups | To edit network group properties, click the network group name. For information on network group properties, see Manage Devices, Device Groups, and Network Groups. Click the X at the top of the dashlet to remove the dashlet from the dashboard. Click the name of a network group in the dashlet to open the Network Health page for that network group. The status icon displayed next to a network group name represents the overall health status of the network group. It can read no data or display an icon that indicates a warning, alert, or healthy status. |
Network Health | This dashlet shows a pie chart that classifies the network groups by network health. The size of each segment denotes the number of network groups with that segment's status, and the segment color represents the overall health status of the network groups in that segment. For example, if you hover over a red segment in the chart, it displays the names of the network groups with major alerts and the total count of network groups with major alerts. The legend next to the pie chart displays the names of all network groups. If you click the name of a network group, that group is filtered out of the pie chart. |
Health Alert Dashlets
The following table describes the properties of the Health Alert dashlets.
Dashlets | Description |
---|---|
Health Alert Severity | The pie chart in this dashlet shows the total number of health alerts generated in Paragon Insights classified by alert severity level. Each segment of the pie chart represents a severity level distinguished by a unique color. If you hover your cursor over the chart, the status halo of each segment displays the number of alerts with the severity level denoted by the segment color and their percentage share of the total. The legend next to the chart displays the severity levels of the alerts. If you click the name of a severity level, the alerts marked with that severity level are filtered out of the pie chart. |
Health Alert Status | The pie chart in this dashlet shows the alerts generated in Paragon Insights classified by their status in the platform: open, closed, and expired. Each segment of the pie chart represents a status distinguished by a unique color. If you hover your cursor over the chart, the status halo of each segment displays the number of alerts with the status denoted by the segment color and their percentage share of the total. The legend next to the chart displays the three main alert statuses. If you click the name of an alert status, the alerts marked with that status are filtered out of the pie chart. |
Health Alert Summary | The chart in this dashlet shows a timeline view of alerts generated in Paragon Insights classified by severity level. Each severity level in the chart is distinguished by a unique color. If you hover your cursor over any point in time on the chart, the number of alerts per severity level at that time is displayed. The legend next to the chart displays the severity levels of alerts. If you click the name of an alert severity level, the alerts marked with that severity are filtered out of the chart. |
TSDB Dashlets
Each row in a TSDB table, also known as a point, contains data for a particular key at a given time. TSDB executes write and read requests by grouping multiple points into a batch point.
Buffer Length is the number of batch points buffered per TSDB node at a given time. Buffer Bytes is the total size of the batch points buffered per TSDB node. A maximum of 1GB of batch points can be buffered per node.
Read or write errors occur when TSDB stops accepting requests because the buffer is full, or because of issues in the Kubernetes cluster. You can minimize these errors through database sharding. For more information, see Paragon Insights Time Series Database (TSDB).
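The relationship between Buffer Length and Buffer Bytes can be pictured as a per-node queue of batch points with a size cap. The Python sketch below is a conceptual model only, not how TSDB is implemented: the `BatchPoint` and `NodeBuffer` names are hypothetical, 1GB is assumed to mean 2**30 bytes, and a full buffer is assumed to reject the incoming batch point and count it as a write error.

```python
from dataclasses import dataclass, field

# Per-node cap described above; interpreting 1GB as 2**30 bytes is an assumption.
MAX_BUFFER_BYTES = 1 * 1024**3


@dataclass
class BatchPoint:
    points: list      # TSDB rows (points) grouped into one batch
    size_bytes: int   # serialized size of the batch


@dataclass
class NodeBuffer:
    """Hypothetical model of one TSDB node's write buffer."""
    batches: list = field(default_factory=list)
    buffered_bytes: int = 0   # the quantity plotted by the Buffer Bytes dashlet
    write_errors: int = 0     # rejected batches, as counted by a write-error chart

    @property
    def buffer_length(self) -> int:
        # The quantity plotted by the Buffer Length dashlet: buffered batch points.
        return len(self.batches)

    def enqueue(self, batch: BatchPoint) -> bool:
        """Buffer the batch if it fits under the cap; otherwise reject it."""
        if self.buffered_bytes + batch.size_bytes > MAX_BUFFER_BYTES:
            self.write_errors += 1   # buffer full: the request is not accepted
            return False
        self.batches.append(batch)
        self.buffered_bytes += batch.size_bytes
        return True
```

In this model, a 600 MiB batch fits in an empty buffer, but a following 500 MiB batch would exceed the cap, so it is rejected and the write-error count rises while Buffer Length stays at 1.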
Figure 5 shows a sample TSDB dashboard.
The following table describes the main features of the TSDB dashlets on the Dashboard. For information on how TSDB works, see Paragon Insights Time Series Database (TSDB).
Dashlets | Description |
---|---|
TSDB Buffer Bytes | The chart in this dashlet shows a timeline view of buffered bytes per TSDB node. Each node in the chart is distinguished by a unique color. If you hover your cursor over any point in time on the chart, the total buffered bytes per node at that time is displayed. The legend next to the chart displays the nodes. If you click the name of a node, the buffered bytes data of that node is filtered out of the chart. When the chart refreshes, the vertical axis is automatically rescaled based on the data. The vertical axis shows data in bytes. To delete the dashlet, click the X at the top of the dashlet. |
TSDB Buffer Length | The chart in this dashlet shows a timeline view of buffer length per TSDB node. Each node in the chart is distinguished by a unique color. If you hover your cursor over any point in time on the chart, the buffer length per node at that time is displayed. The legend next to the chart displays the nodes. If you click the name of a node, the buffer length data of that node is filtered out of the chart. The vertical axis shows buffer length as an absolute number of batch points. To delete the dashlet, click the X at the top of the dashlet. |
TSDB Read Errors Last 5 Minutes | The bar chart in this dashlet shows the number of TSDB read errors per node, collected every 5 minutes. The legend displays the nodes. If you click the name of a node, the read error data of that node is filtered out of the chart. The horizontal axis shows the absolute number of read errors. To delete the dashlet, click the X at the top of the dashlet. |
TSDB Write Errors Last 5 Minutes | The bar chart in this dashlet shows the number of TSDB write errors per node, collected every 5 minutes. The legend displays the nodes. If you click the name of a node, the write error data of that node is filtered out of the chart. The horizontal axis shows the absolute number of write errors. To delete the dashlet, click the X at the top of the dashlet. |
Latest TSDB Buffer Length | The bar chart in this dashlet shows the latest TSDB buffer length of each node. The legend displays the nodes. If you click the name of a node, the buffer length data of that node is filtered out of the chart. The horizontal axis shows buffer length as an absolute number. To delete the dashlet, click the X at the top of the dashlet. |
Health
Use the Health page (Monitor > Health) to monitor and track the health of a single device, a device group, or a network, and to troubleshoot problems. Select an entity type using the entity type selectors (DEVICE, DEVICE GROUP, or NETWORK) in the top left corner of the page. Then click the Select devices drop-down menu to select individual devices or all of the devices in the group. The page is divided into the following three main views that, when used together, can help you investigate the root cause of problems detected on your devices:
Timeline View
In timeline view, you can monitor real-time and past occurrences of KPI events flagged with a minor or major severity level health status. The general characteristics and behaviors of the timeline include (see Figure 6):
- Clicking the right caret next to the Timeline View heading expands or collapses the timeline.
- Each dot or line in the timeline represents the health status of a unique KPI event (also known as a Paragon Insights rule trigger) for a predefined KPI key for which Paragon Insights has detected a minor or major severity level issue. The name of each event is displayed (per device) directly to the left of its health status dot or line.
- The health status dot or line for each unique KPI event in the timeline can cover several different KPI keys. Use tile view and table view to see the health status information for the individual KPI keys.
- Only minor or major severity level KPI events are displayed in the timeline. Yellow represents a minor event, and red represents a major one.
- A KPI event that occurs once (at a single point in time) and does not recur continuously over time is represented as a dot.
- A KPI event that occurs continuously over time is represented as a horizontal line.
- The timeline displays data for a 2-hour time range, which you can customize.
- The red vertical line on the timeline represents the current time.
- The blue vertical line on the timeline represents the user-defined point in time for which data is displayed.
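The dot-versus-line rule above can be expressed as a small classification over the samples of one KPI event. In this hypothetical Python sketch, `timeline_marks` and the `max_gap` threshold are assumptions: the documentation does not say how close two samples must be to count as one continuous occurrence, so one minute is used purely for illustration. Severity values other than minor and major (such as "normal") are treated as not drawn.

```python
from datetime import datetime, timedelta

# Only minor and major events appear on the timeline.
SEVERITY_COLOR = {"minor": "yellow", "major": "red"}


def timeline_marks(samples, max_gap=timedelta(minutes=1)):
    """Turn (timestamp, severity) samples for one KPI event into timeline marks.

    Assumption: consecutive samples with the same severity that are no more than
    max_gap apart form one continuous occurrence.
    Returns a list of (shape, color, start, end), where shape is "dot" or "line".
    """
    marks = []
    run = []  # current run of consecutive samples with the same severity

    def flush():
        if run:
            start, end, severity = run[0][0], run[-1][0], run[0][1]
            shape = "dot" if start == end else "line"
            marks.append((shape, SEVERITY_COLOR[severity], start, end))
            run.clear()

    for ts, severity in sorted(samples):
        if severity not in SEVERITY_COLOR:
            flush()  # a sample that is not minor or major ends any ongoing occurrence
            continue
        if run and (severity != run[-1][1] or ts - run[-1][0] > max_gap):
            flush()
        run.append((ts, severity))
    flush()
    return marks
```

A single minor sample becomes a yellow dot, while three consecutive major samples become one red line spanning their first and last timestamps.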
The following table describes the main features of the timeline:
Feature | Description |
---|---|
Display information about a dot or horizontal line in the timeline. | Hover over the dot or horizontal line to display the associated KPI event name, device name, health status severity level, and event start and end times. Additional health status information about the KPI event can be found in tile view. For information about tile view, see the Tile View section. |
For the displayed data, change the range of time (x-axis) that is visible on the page. | Options: |
Choose a different 2-hour time range of data to display. | Use the blue vertical line to customize the time range of data to display. Options for enabling the blue vertical line: Data is generally displayed for 1 hour before and 1 hour after the blue line. Hover over the blue line to display the exact point in time that it represents. Drag the blue line left or right to adjust the time. Note: Auto-refresh is disabled whenever you enable the blue line. Re-enabling auto-refresh disables the blue line and resets the timeline to display the most recent 2-hour time range of data. |
Freeze the timeline (disable auto-refresh). | Toggle the auto-refresh switch to the left. |
Unfreeze the timeline (enable auto-refresh). | Toggle the auto-refresh switch to the right. |
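The interaction between the blue vertical line and auto-refresh described in the table behaves like a small state machine. The following Python sketch models it under stated assumptions: the `TimelineState` class is hypothetical, and the window around the blue line is taken as exactly 1 hour on each side, although the table says data is only generally displayed that way.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=2)  # default timeline range


class TimelineState:
    """Hypothetical model of the timeline's auto-refresh and blue-line controls."""

    def __init__(self):
        self.auto_refresh = True
        self.blue_line = None    # datetime while the blue vertical line is enabled
        self.frozen_end = None   # end of the visible range while frozen

    def enable_blue_line(self, at):
        self.blue_line = at
        self.auto_refresh = False  # enabling the blue line disables auto-refresh

    def set_auto_refresh(self, on, now):
        self.auto_refresh = on
        if on:
            # Re-enabling auto-refresh disables the blue line and resets the range.
            self.blue_line = None
            self.frozen_end = None
        else:
            self.frozen_end = now  # keep showing the range visible when frozen

    def visible_range(self, now):
        if self.blue_line is not None:
            half = WINDOW / 2  # assumed: exactly 1 hour on either side
            return self.blue_line - half, self.blue_line + half
        end = now if self.auto_refresh else self.frozen_end
        return end - WINDOW, end  # the most recent 2 hours up to `end`
```

Keeping the blue line and auto-refresh as mutually exclusive flags makes the documented rule explicit: whichever control was changed last decides which time range is displayed.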
Tile View
The tile view uses colored tiles to help you monitor and troubleshoot the health of a device. The tiles are organized first by device group, then by device component topic, and lastly by unique KPI key (see Figure 7). By default, tile view shows the most recently collected data. To display data for a different point in time, select that point in time from the date/time drop-down menu (located above the timeline) or enable the blue vertical line in timeline view. For information about how to enable the blue vertical line, see the Timeline View section. The Composite toggle switch (not shown) at the upper right of TILE VIEW allows you to select data from more than one device component topic to show in Table View and, therefore, in the Time Inspector view. This is useful when you must combine topics to find the root cause of an issue. For example, system memory usage could combine with output queue usage to create a performance issue in an overloaded system.
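The tile hierarchy (device group, then topic, then KPI key), together with the rule that tile view shows the most recent data by default or the data for a selected point in time, can be sketched as a nested grouping. The record fields and `build_tile_tree` below are hypothetical; the sketch keys each tile by device and KPI key so that identical key names on different devices do not collide, which is an assumption about how tiles are distinguished.

```python
from collections import defaultdict


def build_tile_tree(results, at=None):
    """Nest KPI results as device group -> topic -> (device, KPI key) -> color.

    results: iterable of dicts with hypothetical fields
        "time", "group", "topic", "device", "key", and "color".
    at: optional point in time; results newer than it are ignored.
        With at=None, the latest result for each tile is used.
    """
    latest = {}
    for r in results:
        if at is not None and r["time"] > at:
            continue  # ignore data collected after the selected point in time
        leaf = (r["group"], r["topic"], (r["device"], r["key"]))
        if leaf not in latest or r["time"] > latest[leaf]["time"]:
            latest[leaf] = r  # keep only the most recent result per tile

    tree = defaultdict(lambda: defaultdict(dict))
    for (group, topic, key), r in latest.items():
        tree[group][topic][key] = r["color"]
    return {group: dict(topics) for group, topics in tree.items()}
```

Passing an earlier `at` value plays the role of the date/time drop-down menu or the blue vertical line in this model.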
The following table describes the meaning of the severity level colors displayed by the status tiles:
Color | Definition |
---|---|
Green | The overall health of the KPI key is normal. No problems have been detected. |
Yellow | There might be a problem with the health of a KPI key. A minor problem has been detected. Further investigation is required. |
Red | The health of a KPI key is severe. A major problem has been detected. |
Gray | No data is available. |
The following table describes the main features of the tile view:
Feature | Description |
---|---|
Display information about a status tile. | Options: Note: If the number of KPI keys exceeds 220, the keys are automatically aggregated and grouped. |
Display information in table view about the status tiles associated with a single device component topic. | Click a device component topic name in tile view. For information about table view, see the Table View section. |
Composite Toggle | When active, you can click specific keys within the tile groups to pass multiple KPIs to the Time Inspector view. |
Table View
The table view allows you to monitor and troubleshoot the health of a single device based on Paragon Insights data provided in a customizable table. You can search, sort, and filter the table data to find specific KPI information, which can be especially useful for large network deployments. To select which attributes are displayed in the table, select the appropriate check boxes in the field selection bar above the table (see Figure 8). Use the checkbox on the left side of each row to select that row's data for the Time Inspector view; you can select multiple rows at a time.
The following table describes the Paragon Insights attributes supported in table view:
Attributes | Description |
---|---|
Time | Time and date the event occurred. |
Device | Device name. |
Group | Device group name. |
Topic | Rule topic name. |
Keys | Unique KPI key name. |
KPI | Key Performance Indicator (KPI) name associated with an event. |
Status | Health status color. Each color represents a different severity level. |
Message | Health status message. |
The following table describes the meaning of the severity level colors displayed by the Status column:
Color | Definition |
---|---|
Green | The overall health of the KPI key is normal. No problems have been detected. |
Yellow | There might be a problem with the health of a KPI key. A minor problem has been detected. Further investigation is required. |
Red | The health of a KPI key is severe. A major problem has been detected. |
Gray | No data is available. |
The following table describes the main features of the table view:
Feature | Description |
---|---|
Sort the data in ascending or descending order based on a specific data type. | Click the name of the data type at the top of the column by which you want to sort. |
Filter the data in the table based on a keyword. | Enter the keyword in the text box under the name of a data type at the top of the table (see Figure 8). |
Navigate to a different page of the table. | Options: |
If the data in a cell is truncated, view all of the data in the cell. | Options: |
Row selection checkbox | Make this row's data available for the Time Inspector view. |
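Sorting, keyword filtering, and row selection in table view are standard table operations. This Python sketch shows one way to model them; `filter_and_sort` and the row dictionaries are hypothetical, and case-insensitive substring matching is an assumption about how the keyword filter behaves.

```python
def filter_and_sort(rows, filters=None, sort_by=None, descending=False):
    """Apply table-view style keyword filters and a single-column sort.

    rows: list of dicts keyed by column name (for example "Device", "Topic").
    filters: {column: keyword}. Assumption: a row matches when every keyword
        appears, case-insensitively, in the text of its column.
    sort_by: column name to sort by, or None to keep the original order.
    """
    filters = filters or {}
    matched = [
        row
        for row in rows
        if all(kw.lower() in str(row.get(col, "")).lower() for col, kw in filters.items())
    ]
    if sort_by is not None:
        matched.sort(key=lambda row: row.get(sort_by, ""), reverse=descending)
    return matched
```

The rows whose checkboxes are selected would then be the input to the Time Inspector view, for example `[r for r in rows if r.get("selected")]` in this model.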
Time Inspector View
Time Inspector is a composite view that provides a timeline view of trigger conditions on the KPI data that you selected in Table View. You can also drag and drop trigger conditions to view them in one graph or as separate graphs. Time Inspector was initially available only when the DEVICE GROUP entity type was selected. Starting with Paragon Insights Release 4.3.0, Time Inspector View is also available when you select the DEVICE or NETWORK entity type. After you select an entity type, you can open the Time Inspector view by clicking the TIME INSPECTOR button.
This view allows you to drill down into field-level data for specific triggers over a time line.
When the Health page is first accessed, the TIME INSPECTOR button is disabled. To activate the TIME INSPECTOR button:
- Select the entity type on the Health page.
Note: In releases earlier than Paragon Insights Release 4.3.0, Time Inspector View is available only for the DEVICE GROUP entity type.
- Select one or more devices, device groups, or networks from the drop-down list next to the entity type that you selected.
- Make sure that at least one device, device group, or network component topic in TILE VIEW has valid data.
Note: Topics showing "no data" cannot be used to enable the Time Inspector view.
- Make sure that data appears in the TABLE VIEW section. To do this, click the device, device group, or network component topic header in TILE VIEW.
- Select the checkbox to the left of at least one row in TABLE VIEW.
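The preconditions above combine into a single check. The hypothetical `time_inspector_enabled` function below encodes them in Python, including the release-dependent entity types; the input shapes (a topic-to-has-data mapping and rows with a `selected` flag) are assumptions made for illustration.

```python
def time_inspector_enabled(entity_type, selected_entities, topic_has_data,
                           table_rows, release=(4, 3, 0)):
    """Return True when every listed precondition for the TIME INSPECTOR button holds.

    entity_type: "DEVICE", "DEVICE GROUP", or "NETWORK".
    selected_entities: devices, device groups, or networks chosen in the drop-down list.
    topic_has_data: {topic_name: bool}; topics showing "no data" map to False.
    table_rows: list of dicts with a boolean "selected" key (hypothetical shape).
    release: Paragon Insights version as a tuple, for example (4, 2, 0).
    """
    if release >= (4, 3, 0):
        allowed = {"DEVICE", "DEVICE GROUP", "NETWORK"}
    else:
        allowed = {"DEVICE GROUP"}  # earlier releases support device groups only
    return (
        entity_type in allowed
        and len(selected_entities) > 0
        and any(topic_has_data.values())       # at least one topic with valid data
        and len(table_rows) > 0                # data appears in TABLE VIEW
        and any(row.get("selected", False) for row in table_rows)
    )
```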
When clicked, the TIME INSPECTOR button opens a pop-up window above the Health page. Figure 9 below shows a time inspector window created from the system.storage usage topic for a specific device.
As you can see, the Time Inspector window has a mini timeline at the top, an incremented line chart below it, and a chart selector section at the bottom. This particular chart was created as a composite (indicated by the merging blue arrows) of file-system-utilization data from the check-storage rule of the system.storage topic.
Note that the check-storage rule has three fields: used-percentage, low-threshold, and high-threshold. Because the chart was created as a composite (fields charted together), it displays three lines. If you clicked the “chart fields separately” button (diverging arrows) instead, you would see three single-line charts showing the same data.
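The difference between charting fields together and charting them separately comes down to how fields are grouped into charts. In this hypothetical Python sketch, `build_charts` returns one chart containing every field in composite mode, or one chart per field otherwise; the field names come from the check-storage example, and the sample values in the test data are made up.

```python
def build_charts(fields, composite=True):
    """Group selected fields into charts.

    fields: {field_name: [(time, value), ...]}
    composite=True models "chart fields together" (merging arrows):
        one chart with one line per field.
    composite=False models "chart fields separately" (diverging arrows):
        one single-line chart per field.
    """
    if composite:
        return [dict(fields)]
    return [{name: series} for name, series in fields.items()]
```

With the three check-storage fields, composite mode yields one chart with three lines, and separate mode yields three charts with one line each.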
The more rules you select with the TABLE VIEW checkboxes, the more charts you can create in the Time Inspector view.
Network Health
Use the Network Health page (Monitor > Network Health) to monitor and track the health of a network group and troubleshoot problems. Select a network group from the drop-down list in the top left corner of the page. Like the Device Group Health page (see the Health section), the Network Health page is divided into three main views: timeline, tile, and table. The Network Health page provides for a network group the same kind of features and functionality that the Device Group Health page provides for a single device.
Graphs Page
You can use graphs to monitor the status and health of your network devices. Graphs visualize the data that Paragon Insights collects from a device and show the results of rule processing. To access the page, select Monitor > Graphs > Charts in the left navigation panel.
Graphs are refreshed every 60 seconds.
Graph types include time series graphs, histograms, and heatmaps.
Time series graphs are the kind you are used to, showing the data in a '2D' format where the x-axis indicates time and the y-axis indicates the value. Time series graphs are useful for real-time monitoring and for showing historical patterns or trends. This graph type does not indicate whether a given value is 'good' or 'bad'; it simply reports 'the latest value'.
Histograms work quite differently. Rather than showing a continuous stream of data based on when each value occurred, histograms aggregate the data to show the distribution of the values over time. The result is a graph that shows 'how many instances of each value' occurred. Histograms also show data in a '2D' format; however, in this case the x-axis indicates the value and the y-axis indicates the number of instances of that value.
Heatmaps bring together the elements above and provide a '3D' view to help identify deviations in the data. Like a time series graph, the x-axis indicates time and the y-axis indicates the value. Then the 'how many' aspect of a histogram is added in. Finally, a third dimension, color, is added. It is common to think of the colors as showing health, that is, red means 'bad', yellow means 'OK', and green means 'good'. However, this is not correct; the color adds context. For each column, the bars indicate the various values that occurred, and the color indicates how often each value occurred relative to the neighboring values. Within each vertical set of bars, the values that occurred more frequently show as 'hotter' in orange and red, while the values that occurred less frequently show as shades of green.
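The aggregation behind a histogram fits in a few lines of Python. In this sketch, the `histogram` function and the bin width of 10 are illustrative choices (the width matches the 10-point ranges used in the CPU example); it discards the time of each sample and counts how many values fall into each range.

```python
from collections import Counter


def histogram(values, bin_width=10):
    """Count how many values fall into each [start, start + bin_width) bin.

    Returns {bin_start: count} ordered by bin_start. The timestamp of each value
    is not used at all, which is what distinguishes a histogram from a time
    series graph.
    """
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))
```

For example, the CPU averages 35, 38, 31, 42, 47, and 5 produce three values in the 30-40 bin, two in the 40-50 bin, and one in the 0-10 bin.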
To help illustrate these graph types, consider the graphs shown below.
All three graphs show the same data: the running 1-minute average of CPU utilization on a device over the last 24 hours. However, they visualize the data differently:
The time series graph provides the typical view; each minute it adds the latest data point to the end of the line graph. Time moves forward along the x-axis from left to right, and the data values are indicated on the y-axis. What this graph doesn’t show is how often each data point has occurred.
The histogram groups together the values to show how many of each data point there are. Notice the tallest bar is the one between 30 and 40, which means the most common 1-minute CPU average value is in the 30-40% range. And how many times did this range of values occur? Based on the y-axis, there have been over 350 instances of values in this range. The next most frequently occurring values are in the 40-50% range (almost 300 occurrences), while the 0-10% range has almost no occurrences, suggesting this CPU is rarely idle. What this graph doesn’t show is how many of each data point occurred within a given time range.
The heatmap makes use of elements from the other two graph types. Each small bar indicates that some number of instances occurred within the value range shown on the y-axis, at the time shown on the x-axis. The color indicates which value ranges occurred more often than others at each given time. To illustrate this, notice the vertical set of bars toward the right of the graph, at 18:00. In this example (at this zoom level), each column of vertical bars represents 12 minutes, and each small bar represents a bucket of 15 values. So the first (lowest) bar indicates that within this time range there were some values in the 0-14 range. The bar above it indicates that within this time range there were some values in the 15-29 range, and so on. The color then indicates which bars have more values than others. In this example, the third bar is red, indicating that for those 12 minutes most of the values fell into the 30-44 range (in this example, the count is 21). By contrast, the first bar is the most green, indicating that for those 12 minutes the fewest values fell into the 0-14 range (in this example, the count is 1). This 'heat' information is also supported by the histogram: the most frequently occurring values were those in the 30-40 range, which is indeed the 'hotter' range in the heatmap.
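The heatmap example (12-minute columns, buckets of 15 values) can be reproduced by counting samples per time column and value bucket. The Python sketch below illustrates that counting and is not the Paragon Insights implementation; `heatmap` and `hottest` are hypothetical names, and columns are aligned to the earliest sample unless you pass an `origin`.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def heatmap(samples, column=timedelta(minutes=12), bucket=15, origin=None):
    """Count samples per (time column, value bucket) cell.

    samples: list of (datetime, value) pairs.
    Returns {column_start: {bucket_start: count}}, ordered by column_start.
    """
    if not samples:
        return {}
    if origin is None:
        origin = min(ts for ts, _ in samples)  # align columns to the first sample
    cells = defaultdict(Counter)
    for ts, value in samples:
        col_start = origin + ((ts - origin) // column) * column
        cells[col_start][int(value // bucket) * bucket] += 1
    return {col: dict(counts) for col, counts in sorted(cells.items())}


def hottest(column_counts):
    """Return the bucket with the most samples in one column (drawn 'hottest')."""
    return max(column_counts, key=column_counts.get)
```

As in the example above, a bucket start of 30 stands for the 30-44 range, and the bucket with the highest count in a column is the one drawn in red.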
The configuration model for graphs is to create graph panels and group them into one or more canvases.
To create a new graph panel on a canvas:
Graph Types
How to Create Graphs
Managing Graphs
Graph Tips and Tricks
Use Cases
- To edit a graph, click the pencil icon in the top right corner of the graph.
- To delete a graph, click the trash can icon in the top right corner of the graph.
- To delete a canvas, click the trash can icon in the top right corner of the canvas.
- To sort canvases on the Saved Canvas page, click the column headings.
- To reorganize graphs on the screen, hover your mouse near the upper-left corner of a graph panel, then click and drag it to the desired position.
- To resize a graph, hover your mouse over the lower-right corner of the graph panel, then click and drag it to the desired size.
- To change the color of graph elements, click the color bar for the desired line item under the graph.
- To zoom in on a graph, click and drag across the desired section of the graph; to zoom out, double-click the graph.
- To isolate an element on the graph, click its related line item under the graph; to view all elements again, click the same line item.
How do I monitor interface flaps for a single interface?
- Playbook used: interface-kpis-playbook
- Graph configuration
- Graph panel
How do I monitor interface flaps for all 'ge' interfaces on a device in a single graph?
- Playbook used: interface-kpis-playbook
- Graph configuration
- Graph panel
How do I monitor system memory usage for all devices in a device group in a single graph?
- Playbook used: system-kpis-playbook
- Graph configuration
- Graph panel
How do I monitor RE CPU usage for multiple devices in a single graph?
- Playbook used: system-kpis-playbook
- Graph configuration
- Graph panel
How do I monitor RE CPU usage for multiple devices side by side?
- Playbook used: system-kpis-playbook
- Graph configuration
- Graph panel
Change History Table
Feature support is determined by the platform and release you are using. Use Feature Explorer to determine if a feature is supported on your platform.