Querying the Metrics Database
Introduction
The metrics database uses the TimescaleDB extension to efficiently manage time series, providing additional functions for data aggregation and rollup.
In this chapter we will introduce some of the TimescaleDB functions commonly used to retrieve time series data, as well as the database objects and fields of the metrics database and their usage within an SQL query.
Basic TimescaleDB Functions
The time_bucket
function in TimescaleDB allows you to customize the
granularity of an underlying dataset. This function accepts arbitrary time intervals as
well as optional offsets and returns the bucket start time.
As an example, suppose we have a table temperature_measurement
:
CREATE TABLE temperature_measurement( id BIGINT PRIMARY KEY NOT NULL, "time" timestamp with time zone, -- time when the temperature has been read device text, -- identifier of the device that read the temperature temperature double precision -- metric value );
Now if measurements are taken every 10 seconds, using time_bucket
we can
run aggregation functions on a coarser granularity. The query below returns the median value
(50th percentile) of the temperature within 30-second buckets:
SELECT time_bucket('30 seconds',"time") as 30s_bucket, percentile_cont(0.5) WITHIN GROUP (ORDER BY temperature) as "Temperature (Celsius)" FROM temperature_measurement GROUP BY 30s_bucket ORDER BY 30s_bucket;
We recommend that you provide an alias for the field generated by
time_bucket
, different from the time column. In the above example, the
alias 30s_bucket
is used.
The same alias must also be used in the GROUP BY and ORDER BY clauses.
Further functions that can be used for advanced analytics queries are found under the heading Advanced analytic-queries in the TimescaleDB documentation.
Database Objects to Query
Each plugin in Paragon Active Assurance has:
- one view for tests
- five views for monitors with metrics aggregated at different granularities: 10 seconds (raw), 1 minute, 5 minutes, 30 minutes, and 60 minutes.
The naming convention for the view objects is the following:
vw_(monitor|test)_metrics_<plugin_name>
for raw datavw_monitor_metrics_minute<minutes>_<plugin_name>
for predefined time buckets on raw data for monitoring.
As an example, the HTTP plugin has the following database objects that can be queried:
vw_test_metrics_http vw_monitor_metrics_http vw_monitor_metrics_minute1_http vw_monitor_metrics_minute5_http vw_monitor_metrics_minute30_http vw_monitor_metrics_minute60_http
Each view contains two categories of information:
- measurement
- metric
The measurement category is needed to retrieve the information related to a task in a test or monitor, and it must be included in the WHERE clause of an SQL statement. Examples of fields in this category are:
monitor_name, task_name, measurement_name, test_name
The metric category consists of values of metrics retrieved at a given point in time.
The time
column is common to all views and must be included in each query.
The columns holding specific measurements differ from one task to another, and they are usually part of the metrics table in the view definition.
For example, the vw_monitor_metrics_http
view contains:
metrics.connect_time_avg, metrics.first_byte_time_avg, metrics.response_time_min, metrics.response_time_avg, metrics.response_time_max, metrics.size_avg, metrics.speed_avg, metrics.es_timeout, metrics.es_response, metrics.es
Example: Here is how to write a query for the median of the response time average for an HTTP task associated with "Measurement 1" (test or monitor name) using a 30-second time bucket between 2021-06-21 09:55:30+00 and 2021-06-21 09:57:00+00.
SELECT -- time field time_bucket('30s',"time") as "time", -- metric percentile_cont(0.5) WITHIN GROUP (ORDER BY response_time_avg) as "avg_response_time" FROM -- data source vw_monitor_metrics_http WHERE -- time interval to analyze "time" between '2021-06-21 09:55:30+00' AND '2021-06-21 09:57:00+00' AND -- measurement identifier measurement_name = 'Measurement 1' AND -- account identifier account_short_name = 'account_1' GROUP BY "time" ORDER BY "time"