View the System Faults

Use this page to view the faults in the system, along with their severity, time of occurrence, and status. You can filter the faults by severity or status.

To access the page, select Administration > System Management > System Faults.

Table 1 describes the fields on the System Faults page.

Table 1: System Faults Page Fields
Field	Description
Severity	Specifies the severity of the fault: info, minor, major, or critical.
Description	Describes the cause of the fault.
First Occurred	Specifies the date and time when the fault first occurred. The fault details are saved in the database.
Last Occurred	Specifies the date and time when a recurring fault is resolved.
Status	Displays whether the fault is active or clear.
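
The filter behavior described above can be pictured with a small sketch. It is only an illustration of filtering by severity or status: the Fault record, the filter_faults function, and the sample timestamps are hypothetical and are not part of any Security Director API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Fault:
    """Hypothetical record mirroring the fields in Table 1."""
    name: str
    severity: str          # info, minor, major, or critical
    description: str
    first_occurred: datetime
    last_occurred: datetime
    status: str            # active or clear


def filter_faults(faults, severity=None, status=None):
    """Return the faults that match the given severity and/or status.

    A criterion left as None is not applied. Matching is case-insensitive.
    """
    matches = []
    for fault in faults:
        if severity is not None and fault.severity.lower() != severity.lower():
            continue
        if status is not None and fault.status.lower() != status.lower():
            continue
        matches.append(fault)
    return matches


# Illustrative data only; the timestamps are made up for this example.
faults = [
    Fault("cpu/usage", "minor", "High CPU usage on the node",
          datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 30), "active"),
    Fault("disk/usage/root", "major", "Low disk space on the OS disk",
          datetime(2024, 1, 2, 8, 0), datetime(2024, 1, 2, 9, 0), "clear"),
    Fault("kubernetes/inactive", "critical", "kubernetes service is not running",
          datetime(2024, 1, 3, 12, 0), datetime(2024, 1, 3, 12, 5), "active"),
]

active_critical = filter_faults(faults, severity="critical", status="active")
print([fault.name for fault in active_critical])  # prints ['kubernetes/inactive']
```

Passing only one criterion, for example filter_faults(faults, status="active"), returns every active fault regardless of severity.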

Table 2 describes the faults that can appear on the System Faults page.

Table 2: Details of the Faults on the System Faults Page
Fault	Severity	Description	Threshold	Fault Recovery
cpu/usage	Minor	High CPU usage on the node	CPU usage is more than 80% over a period of 15 minutes	Monitor the CPU resources. If the usage is consistently high, increase the VM resources. Consider deploying another Security Director instance when the VM reaches maximum capacity.
memory/usage	Minor	High memory usage on the node	Memory usage is more than 80%	Monitor the memory resources. If the usage is consistently high, increase the VM resources. Consider deploying another Security Director instance when the VM reaches maximum capacity.
disk/usage/application	Major	Low disk space for application storage	Disk space usage is more than 80%	Use list commands to verify usage under various subdirectories. Clean up reports to free up space.
disk/usage/configuration-bus	Major	Low disk space for configuration messagebus	Disk space usage is more than 80%	Use diagnostic commands to assess the usage. Increase the VM resources if required. Consider deploying another Security Director instance when the VM reaches maximum capacity.
disk/usage/configuration-db	Major	Low disk space for configuration database	Disk space usage is more than 80%	List the backups and use diagnostic commands to assess the usage. Increase the VM resources if required. Consider deploying another Security Director instance when the VM reaches maximum capacity.
disk/usage/logs-+-system	Major	Low disk space for Junos and security logs	Disk space usage is more than 80%	Use the dashboard to verify log space usage. Delete unused logs to free space. Use diagnostic commands for further analysis. If the usage is consistently high, increase the VM resources. Consider deploying another Security Director instance when the VM reaches maximum capacity.
disk/usage/root	Major	Low disk space on the OS disk	Disk space usage is more than 80%	Use list commands to verify the space usage in /var/log, the home directory, or any other directory. Clear space using rm commands.
bootservice/incomplete	Critical	Bootservice is running and hasn’t completed	systemd reports service as active	Verify the system resources and the system disk configuration. Reboot the VM.
bootservice/failed	Critical	Bootservice exited with a nonzero exit code	systemd reported a nonzero exit code	Verify the system resources and the system disk configuration. Reboot the VM.
health-monitor/inactive	Major	healthmonitor service is not running	systemd reports service as inactive	Inspect the service logs. Verify the OS disk usage and restart the service.
repository-service/inactive	Major	registry service is not running	systemd reports service as inactive	Inspect the service logs. Verify the OS disk usage and restart services.
clusterdb/inactive	Major	clusterdb service is not running	systemd reports service as inactive	Inspect the service logs and run clusterdb repair.
clusterdb/connectivity	Minor	Could not connect to clusterdb	Could not create a new connection to the clusterdb service, or received an error	Inspect the service logs and run clusterdb repair.
cluster-manager/inactive	Major	cluster-manager service is not running	systemd reports service as inactive	Inspect the service logs and restart services.
kubernetes/inactive	Critical	kubernetes service is not running	systemd reports service as inactive	Run kubernetes repair and inspect the service logs.
kubernetes/connectivity	Major	Could not connect to the Kubernetes cluster	Could not connect to the Kubernetes API server, or received an error	Run kubernetes repair and inspect the service logs.
kubernetes/degraded	Critical	Components in the kube-system namespace are not running	Pods in the kube-system namespace are reporting a non-Running state	Run kubernetes repair. Check the cluster-manager logs for orchestration issues. Inspect the kube-system logs.
kubernetes/connectivity/infra	Major	Could not connect to the Kubernetes cluster to verify infrastructure services	Could not connect to the Kubernetes API server, or received an error	Run kubernetes repair. Check the cluster-manager logs for orchestration issues. Inspect the kube-system logs.
kubernetes/connectivity/apps	Major	Could not connect to the Kubernetes cluster to verify application services	Could not connect to the Kubernetes API server, or received an error	Run kubernetes repair. Check the cluster-manager logs for orchestration issues. Inspect the kube-system logs.
cluster-manager/install/failure/{application name}	Critical	Application failed to install or upgrade on the Kubernetes cluster	Application installation reports a failure	Repair the application if a repair option exists, and restart the application. Restart the cluster manager to complete the failed installation. Inspect the cluster manager logs.
application/degraded/{application name}	Major	Application is down on the Kubernetes cluster	Application failed to complete its setup or is failing to start	Check whether the application restarts continuously. Inspect the service log to find the reason for the failure. Use repair tools to fix the underlying issues and restart the services if needed.
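
When you investigate one of the disk/usage faults in Table 2, it can help to check a mount point against the 80% threshold yourself. The following is a minimal sketch that assumes usage is computed as used space divided by total space; the product may calculate the value differently. The function names and the example path are hypothetical.

```python
import shutil

# Threshold taken from Table 2: disk space usage of more than 80%.
DISK_USAGE_THRESHOLD_PERCENT = 80.0


def percent_used(used_bytes, total_bytes):
    """Return used space as a percentage of total space."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def exceeds_threshold(used_bytes, total_bytes,
                      threshold=DISK_USAGE_THRESHOLD_PERCENT):
    """Return True only when usage is strictly greater than the threshold."""
    return percent_used(used_bytes, total_bytes) > threshold


def check_path(path):
    """Measure the file system that holds path and compare it to the threshold.

    Assumption: used/total approximates how the fault is evaluated. Tools such
    as df can report a slightly different percentage for the same file system.
    """
    usage = shutil.disk_usage(path)
    return (percent_used(usage.used, usage.total),
            exceeds_threshold(usage.used, usage.total))


if __name__ == "__main__":
    # "/" is only an example path; substitute the mount point you want to check.
    percent, over = check_path("/")
    print(f"/ is {percent:.1f}% used; over threshold: {over}")
```

The comparison is strict, so a file system at exactly 80% usage does not count as over the threshold, which matches the "more than 80%" wording in the table.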