View the System Faults

Use this page to view the faults in the system, along with their severity, time of occurrence, and status. You can filter the faults by severity or status using the filter option.

To access the page, select Administration > System Management > System Faults.

Table 1 describes the fields on the System Faults page.

Table 1: System Faults Page Fields

Field | Description
Severity | Specifies whether the fault severity is info, minor, major, or critical.
Description | Describes the cause of the fault.
First Occurred | Specifies the date and time when the fault first occurred. The fault details are saved in the database.
Last Occurred | Specifies the date and time when a recurring fault is resolved.
Status | Displays whether the fault is active or clear.
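
The fields in Table 1 and the severity and status filter can be pictured with a small sketch. This is only an illustration, not the product's data model or API: the `Fault` class, its attribute names, the `filter_faults` helper, and the sample rows (including their timestamps) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Fault:
    """One row of the System Faults page (attribute names are hypothetical)."""
    severity: str             # info, minor, major, or critical
    description: str
    first_occurred: datetime
    last_occurred: datetime
    status: str               # active or clear


def filter_faults(faults: List[Fault],
                  severity: Optional[str] = None,
                  status: Optional[str] = None) -> List[Fault]:
    """Return the faults that match the requested severity and/or status."""
    return [
        f for f in faults
        if (severity is None or f.severity == severity)
        and (status is None or f.status == status)
    ]


# Made-up sample rows, for illustration only.
sample_faults = [
    Fault("minor", "High CPU usage on the node",
          datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 30), "clear"),
    Fault("major", "clusterdb service is not running",
          datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 14, 0), "active"),
    Fault("critical", "kubernetes service is not running",
          datetime(2024, 1, 3, 8, 15), datetime(2024, 1, 3, 8, 15), "active"),
]

# Combine both filters, as the filter option on the page allows.
for fault in filter_faults(sample_faults, severity="major", status="active"):
    print(fault.severity, fault.status, fault.description)
```

Leaving either argument as `None` disables that filter, so `filter_faults(sample_faults, status="active")` returns every active fault regardless of severity.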

Table 2 describes the faults that can appear on the System Faults page.

Table 2: Details of the Faults on the System Faults Page
Fault | Severity | Description | Threshold | Fault Recovery
cpu/usage | Minor | High CPU usage on the node | CPU usage is more than 80% over a period of 15 minutes | Monitor the CPU resources. If the usage is consistently high, increase the VM resources. Consider deploying another Security Director instance when the VM reaches maximum capacity.
memory/usage | Minor | High memory usage on the node | Memory usage is more than 80% | Monitor the memory resources. If the usage is consistently high, increase the VM resources. Consider deploying another Security Director instance when the VM reaches maximum capacity.
disk/usage/application | Major | Low disk space for application storage | Disk space usage is more than 80% |
  • Use list commands to verify usage under various subdirectories.
  • Clean up reports to free up space.
disk/usage/configuration-bus | Major | Low disk space for configuration message bus | Disk space usage is more than 80% |
  • Use diagnostic commands to assess the usage.
  • Increase the VM resources if required. Consider deploying another Security Director instance when the VM reaches maximum capacity.
disk/usage/configuration-db | Major | Low disk space for configuration database | Disk space usage is more than 80% |
  • List the backups and use diagnostic commands to assess the usage.
  • Increase the VM resources if required. Consider deploying another Security Director instance when the VM reaches maximum capacity.
disk/usage/logs-+-system | Major | Low disk space for Junos and security logs | Disk space usage is more than 80% |
  • Use the dashboard to verify log space usage. Delete unused logs to free space.
  • Use diagnostic commands for further analysis.
  • If the usage is consistently high, increase the VM resources. Consider deploying another Security Director instance when the VM reaches maximum capacity.
disk/usage/root | Major | Low disk space on the OS disk | Disk space usage is more than 80% |
  • Use list commands to verify the space usage in /var/log, the home directory, or any other directory.
  • Clear space using rm commands.
bootservice/incomplete | Critical | Bootservice is running and hasn't completed | systemd reports service as active |
  • Verify the system resources and the system disk configuration.
  • Reboot the VM.
bootservice/failed | Critical | Bootservice exited with a nonzero exit code | systemd reported a nonzero exit code |
  • Verify the system resources and the system disk configuration.
  • Reboot the VM.
health-monitor/inactive | Major | healthmonitor service is not running | systemd reports service as inactive |
  • Inspect the service logs.
  • Verify the OS disk usage and restart the service.
repository-service/inactive | Major | registry service is not running | systemd reports service as inactive |
  • Inspect the service logs.
  • Verify the OS disk usage and restart services.
clusterdb/inactive | Major | clusterdb service is not running | systemd reports service as inactive | Inspect the service logs and run clusterdb repair.
clusterdb/connectivity | Minor | Could not connect to clusterdb | Could not create a new connection to the clusterdb service, or received an error | Inspect the service logs and run clusterdb repair.
cluster-manager/inactive | Major | cluster-manager service is not running | systemd reports service as inactive | Inspect the service logs and restart services.
kubernetes/inactive | Critical | kubernetes service is not running | systemd reports service as inactive | Run kubernetes repair and inspect the service logs.
kubernetes/connectivity | Major | Could not connect to kubernetes cluster | Could not connect to the kubernetes API server, or received an error | Run kubernetes repair and inspect the service logs.
kubernetes/degraded | Critical | Components in the kube-system namespace are not running | Pods in the kube-system namespace are reporting a non-Running state |
  • Run kubernetes repair.
  • Check the cluster-manager logs for orchestration issues.
  • Inspect the kube-system logs.
kubernetes/connectivity/infra | Major | Could not connect to kubernetes cluster to verify infrastructure services | Could not connect to the kubernetes API server, or received an error |
  • Run kubernetes repair.
  • Check the cluster-manager logs for orchestration issues.
  • Inspect the kube-system logs.
kubernetes/connectivity/apps | Major | Could not connect to kubernetes cluster to verify application services | Could not connect to the kubernetes API server, or received an error |
  • Run kubernetes repair.
  • Check the cluster-manager logs for orchestration issues.
  • Inspect the kube-system logs.
cluster-manager/install/failure/{application name} | Critical | Application failed to install or upgrade on kubernetes cluster | Application installation reports failed |
  • Repair the application if a repair option exists, and restart the application.
  • Restart the cluster manager to complete the failed installation.
  • Inspect the cluster manager logs.
application/degraded/{application name} | Major | Application is down on kubernetes cluster | Application failed to complete its setup or is failing to start |
  • Check whether the application restarts continuously.
  • Inspect the service log to find the reason for the failure.
  • Use repair tools to fix the underlying issues and restart the services if needed.
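
The usage thresholds in Table 2 can be illustrated with a short sketch. It is not how the product computes its faults: it only shows what "more than 80%" and "over a period of 15 minutes" might mean in practice. The `CpuWindow` class and both helper functions are hypothetical, and the sketch assumes that the CPU threshold applies to the average of the samples collected in the last 15 minutes, which the table does not state. `shutil.disk_usage` is part of the Python standard library.

```python
import shutil
from collections import deque

USAGE_THRESHOLD_PERCENT = 80.0    # threshold listed in Table 2
CPU_WINDOW_SECONDS = 15 * 60      # 15-minute period listed for cpu/usage


def disk_usage_percent(path: str) -> float:
    """Return the used space of the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def exceeds_threshold(percent: float,
                      threshold: float = USAGE_THRESHOLD_PERCENT) -> bool:
    """True when usage is strictly more than the threshold."""
    return percent > threshold


class CpuWindow:
    """Keeps CPU samples from the most recent window (hypothetical helper)."""

    def __init__(self, window_seconds: int = CPU_WINDOW_SECONDS) -> None:
        self.window_seconds = window_seconds
        self.samples = deque()    # (timestamp_seconds, cpu_percent) pairs

    def add(self, timestamp: float, percent: float) -> None:
        """Record a sample and drop samples that fell out of the window."""
        self.samples.append((timestamp, percent))
        cutoff = timestamp - self.window_seconds
        while self.samples and self.samples[0][0] <= cutoff:
            self.samples.popleft()

    def average(self) -> float:
        """Average CPU usage over the retained samples (assumed semantics)."""
        if not self.samples:
            return 0.0
        return sum(p for _, p in self.samples) / len(self.samples)


# One sample per minute at 90% CPU for 15 minutes.
window = CpuWindow()
for second in range(0, CPU_WINDOW_SECONDS, 60):
    window.add(second, 90.0)

print("cpu/usage would be raised:", exceeds_threshold(window.average()))
print("disk usage of current filesystem: %.1f%%" % disk_usage_percent("."))
```

Because the comparison is strict, a reading of exactly 80% does not count as exceeding the threshold, which matches the "more than 80%" wording in the table.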