Use this guide to investigate the root cause when an SSR reports CPU spike alarms.
A CPU alarm is triggered when the average CPU utilization across all cores on a system exceeds 85% for 30 seconds. Cores that are pinned for packet forwarding are excluded from this average. The alarm is cleared when the average CPU utilization is below 85% for 5 seconds.
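The raise and clear conditions above can be sketched as a small state machine. This is only an illustration of the rule as stated in this guide, not the SSR's actual implementation: the names `average_utilization` and `CpuAlarm` are hypothetical, and the sketch assumes periodic samples and that each hold period is measured from the first sample of a continuous streak.

```python
from typing import Dict, Optional, Set

THRESHOLD_PCT = 85.0  # alarm threshold stated in this guide
RAISE_HOLD_S = 30.0   # seconds above the threshold before the alarm raises
CLEAR_HOLD_S = 5.0    # seconds below the threshold before the alarm clears


def average_utilization(per_core: Dict[int, float], forwarding_cores: Set[int]) -> float:
    """Average utilization over cores that are not pinned for forwarding."""
    counted = [pct for core, pct in per_core.items() if core not in forwarding_cores]
    if not counted:
        raise ValueError("no non-forwarding cores to average")
    return sum(counted) / len(counted)


class CpuAlarm:
    """Hypothetical model of the raise/clear rule (assumption, not SSR code)."""

    def __init__(self) -> None:
        self.active = False
        self._streak_start: Optional[float] = None

    def update(self, now: float, avg_pct: float) -> bool:
        """Feed one sample taken at time `now` (seconds); return the alarm state."""
        if self.active:
            # While active, look for a continuous run below the threshold.
            in_streak = avg_pct < THRESHOLD_PCT
            hold = CLEAR_HOLD_S
        else:
            # While inactive, look for a continuous run above the threshold.
            in_streak = avg_pct > THRESHOLD_PCT
            hold = RAISE_HOLD_S

        if not in_streak:
            self._streak_start = None  # the run was interrupted; start over
            return self.active

        if self._streak_start is None:
            self._streak_start = now
        if now - self._streak_start >= hold:
            self.active = not self.active
            self._streak_start = None
        return self.active
```

Because the clear period is much shorter than the raise period, a short dip below 85% is enough to clear the alarm, while a new alarm needs a full 30 seconds of sustained load. This can help explain alarms that appear to raise and clear repeatedly during a long period of high utilization.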
You can determine whether a system has pinned CPU cores by checking the configuration. The CPU allocation is defined within the node configuration of a router.
firstname.lastname@example.org# show config running authority router novigrad node gouda
In particular, the forwarding-core-count indicates how many cores are dedicated to fast packet forwarding.
The default configuration has forwarding-core-mode set to automatic with no forwarding-core-count defined. The SSR platform will attempt to right-size the configuration based on the system's available resources. For some deployments, it may be desirable to override the defaults to optimize the platform for your environment.
When forwarding-core-mode is set to automatic, you can see how the system has allocated resources by issuing the command show platform cpu.
email@example.com# show platform cpu
Thu 2020-03-19 15:23:24 UTC
Type: Intel(R) Atom(TM) CPU C2558 @ 2.40GHz
Speed: 2.400096893310547 GHz
Forwarding Cores: 1
Isolated Cores: 1
Completed in 2.75 seconds
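If you collect this output from several routers, the core counts can be pulled out with a short script. The following is a minimal sketch that assumes the output keeps the `Label: value` layout of the sample above; the function names are hypothetical, and capturing the output from the router is outside the scope of this sketch.

```python
import re
from typing import Dict

# Sample text copied from the command output shown in this guide.
SAMPLE = """\
Thu 2020-03-19 15:23:24 UTC
Type: Intel(R) Atom(TM) CPU C2558 @ 2.40GHz
Speed: 2.400096893310547 GHz
Forwarding Cores: 1
Isolated Cores: 1
Completed in 2.75 seconds
"""

# A label made of letters and spaces, a colon, then the value.
FIELD_RE = re.compile(r"^\s*([A-Za-z ]+?):\s*(.+?)\s*$")


def parse_platform_cpu(text: str) -> Dict[str, str]:
    """Return the 'Label: value' pairs found in the output as a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:  # the timestamp and "Completed in" lines do not match
            fields[match.group(1)] = match.group(2)
    return fields


def forwarding_core_count(text: str) -> int:
    """Number of cores reported as dedicated to packet forwarding."""
    return int(parse_platform_cpu(text)["Forwarding Cores"])


if __name__ == "__main__":
    print(parse_platform_cpu(SAMPLE))
    print("forwarding cores:", forwarding_core_count(SAMPLE))
```

Knowing the forwarding core count tells you how many cores are left out of the average that drives the CPU alarm.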
With an understanding of how the system is configured, you can now examine the history of CPU usage over time. The SSR stores time series data for a number of KPIs that are relevant to system and service health and operation. Viewing time series data is best accomplished within the GUI.
Navigate to the Custom Reports located on the dashboard. From there create two reports: one for total utilization per CPU and another for utilization per SSR process, as indicated by the images below.
Alarms are optionally overlaid on top of all charts generated by the SSR. This is very helpful in correlating system events with system behavior. The time window of custom reports can be extended from 5 minutes to 6 months. If you see anomalies in CPU utilization and, correspondingly, in a particular application, this may indicate that the system is not performing properly.