NorthStar Analytics Raw and Aggregated Data Retention
Raw data logs are retained in Elasticsearch for a user-configurable number of days. Data is also rolled up (aggregated) every hour and retained for a user-configurable number of days. The purpose of aggregation is to make longer retention of data more feasible given limited disk space. When you modify these retention parameters, keep in mind that longer retention periods consume more of your storage resources.
Stored hourly aggregated data filenames use the following format: rollups-northstar-yyyy-mm-dd.
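As a minimal sketch of the naming format above, the following Python helper builds the name for a given day. It assumes that yyyy-mm-dd means a four-digit year with zero-padded month and day; the function name rollup_name is hypothetical and is not part of NorthStar.

```python
from datetime import date


def rollup_name(day: date) -> str:
    """Build a rollup name for one day.

    Assumption: yyyy-mm-dd is a four-digit year followed by a
    zero-padded month and day.
    """
    return f"rollups-northstar-{day:%Y-%m-%d}"


if __name__ == "__main__":
    print(rollup_name(date(2024, 3, 5)))  # rollups-northstar-2024-03-05
```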
The parameters described in Table 1 work together to control data retention and aggregation behaviors. The parameters are located in /opt/northstar/data/northstar.cfg, and you can modify their values there.
Table 1: Data Retention and Aggregation Parameters
collection_cleanup_task_interval: Controls how often the CollectionCleanup system task is run. This task executes the collector-utils.py script to clean up old logs. The default is one day (1d). The collector-utils.py script runs at approximately 1:00 AM, NorthStar server time.
Units can be hours (h), days (d), or weeks (w).
The collector-utils.py script uses the Elasticsearch APIs to clean up “old” data as follows:
The CollectionCleanup task is called from the NorthStar server. You can view (but not modify) the cleanup task by navigating to Administration > Task Scheduler.
es_log_retention_days: Defines what is considered an “old” log of raw data. The default is 90 days, meaning that raw data logs are retained in Elasticsearch for 90 days. This can be expressed only in days, so no unit designation is required. To disable the retention of raw data logs, set the value to 0.
es_log_rollups_retention_days: Defines what is considered “old” aggregated data. The default is 1000 days, meaning that hourly aggregated data is retained in Elasticsearch for 1000 days. This can be expressed only in days, so no unit designation is required. To disable retention of aggregated data, set the value to 0.
Controls how often the ESRollup system task is run. This task executes the esrollup.py script to aggregate the previous interval’s data. The default is 1 hour (1h).
Note: We recommend that you do not change this default value except to disable aggregation. If you want to disable data aggregation, set the value to -1.
The esrollup.py script uses the Elasticsearch APIs to perform the data aggregation.
The ESRollup task is called from the NorthStar server. You can view (but not modify) the rollup task by navigating to Administration > Task Scheduler.
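To illustrate how the interval and retention values in Table 1 can be read, the following Python sketch converts an interval string (h, d, or w units) into a duration and computes the point in time before which data would count as “old.” It is an illustration only and does not reproduce the logic of collector-utils.py or esrollup.py. The function names are hypothetical, and returning None for -1 follows the note that -1 disables aggregation.

```python
from datetime import datetime, timedelta
from typing import Optional

# Unit suffixes named in Table 1, mapped to timedelta keyword arguments.
_UNITS = {"h": "hours", "d": "days", "w": "weeks"}


def parse_interval(value: str) -> Optional[timedelta]:
    """Convert a value such as '1h', '7d', or '2w' into a timedelta.

    Returns None for '-1', which the documentation describes as the
    way to disable the rollup task.
    """
    value = value.strip()
    if value == "-1":
        return None
    if len(value) < 2 or value[-1] not in _UNITS or not value[:-1].isdigit():
        raise ValueError(f"unsupported interval: {value!r}")
    return timedelta(**{_UNITS[value[-1]]: int(value[:-1])})


def retention_cutoff(now: datetime, retention_days: int) -> datetime:
    """Return the timestamp before which data is older than the retention period."""
    return now - timedelta(days=retention_days)


if __name__ == "__main__":
    print(parse_interval("1d"))                            # default cleanup interval
    print(retention_cutoff(datetime(2024, 4, 1), 90))      # default raw-log retention
```

With the default raw-log retention of 90 days and a reference time of April 1, 2024, the cutoff falls on January 2, 2024.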
There is an additional parameter, dbCapacity, that controls how long event data is stored. This parameter is not related to analytics. See Event View for information about changing the value of this parameter from the default of 35 days.
The NorthStar REST API supports telemetry data aggregation with the additional parameters described in Table 2. See the NorthStar REST API documentation for more information.
Table 2: Additional Aggregation Parameters Used for API Queries
rollup_query_enabled: A value of 1 indicates that rollup query functionality is enabled. A value of 0 indicates it is disabled.
es_rollup_cutoff_days: If rollup_query_enabled is set to 1 (enabled) and the time range requested in a stats REST API query extends further back from now than es_rollup_cutoff_days days, the query uses the rollup index to search data.
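The decision described in Table 2 can be sketched as follows. This is one reading of the rule, not the NorthStar implementation: it assumes the comparison is between the age of the query start time and the cutoff, and the function name and the cutoff value of 7 days in the example are hypothetical.

```python
from datetime import datetime, timedelta


def uses_rollup_index(
    start: datetime,
    now: datetime,
    rollup_query_enabled: int,
    es_rollup_cutoff_days: int,
) -> bool:
    """Return True when a query would be served from the rollup index.

    Assumption: the rollup index is used only when rollup queries are
    enabled (1) and the query start time lies further back from now
    than es_rollup_cutoff_days days.
    """
    if rollup_query_enabled != 1:
        return False
    return now - start > timedelta(days=es_rollup_cutoff_days)


if __name__ == "__main__":
    now = datetime(2024, 4, 1)
    # Hypothetical cutoff of 7 days, chosen only for illustration.
    print(uses_rollup_index(datetime(2024, 3, 1), now, 1, 7))   # 31 days back
    print(uses_rollup_index(datetime(2024, 3, 30), now, 1, 7))  # 2 days back
```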
To modify retention or aggregation parameters, use a text editing tool such as vi and modify the value of the parameters in the northstar.cfg file. For example:
vi /opt/northstar/data/northstar.cfg
. . .
collection_cleanup_task_interval=7d
es_log_retention_days=30
es_log_rollups_retention_days=800
In this example, raw data logs older than 30 days and hourly aggregated data logs older than 800 days are set to be purged every seven days.
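To check values like these after editing, a short Python sketch such as the following can read key=value settings from the file contents. It assumes the file consists of simple key=value lines; skipping lines that start with # is also an assumption, and the actual northstar.cfg may contain other syntax. The function name parse_cfg is hypothetical.

```python
def parse_cfg(text: str) -> dict:
    """Collect key=value pairs from configuration text.

    Assumptions: one setting per line in key=value form; blank lines,
    lines starting with '#', and lines without '=' are skipped.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    sample = """
collection_cleanup_task_interval=7d
es_log_retention_days=30
es_log_rollups_retention_days=800
"""
    for name, value in parse_cfg(sample).items():
        print(f"{name} = {value}")
```

To run it against the real file, pass the contents of /opt/northstar/data/northstar.cfg to parse_cfg instead of the sample string.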
The data included in the rollup tasks (aggregation types, fields,
and counters) is defined in the view-only esrollup_config.json file
located in the
To view the system tasks that launch the esrollup.py and collector-utils.py scripts, navigate to Administration > Task Scheduler in the NorthStar web UI. In the Task list, the Name column indicates CollectionCleanup or ESRollup Task. In the Type column, they are designated as ExecuteScript. An example is shown in Figure 1.
There is an optional column in the task list that indicates whether each task is a system task. Hover over any column heading, click the down arrow that appears, and highlight Columns to display a list of available columns. Click the check box for System Task to select the System Task column (true/false) for inclusion in the display.
When you select a system task, Summary, Status, and History tabs are available at the bottom of the window.