Perform a Cluster Health Check
This topic describes the health check commands available in Routing Director.
Purpose
Perform a health check on the cluster, view the overall and detailed status of the cluster, and troubleshoot specific issues.
You can use either the Deployment Shell or the Linux root shell to execute a cluster health check.
The health-check command performs multiple health tests on the cluster and returns a detailed list of all the tests conducted and the result of each. It checks parameters such as:
- Kubernetes status
- Health of each node (CPU, disk space, memory, I/O latency, and so on)
- Database health (Postgres, ArangoDB, OpenSearch, Kafka, and so on)
- Ceph storage health
The overall health status is categorized as green, amber, or red. A green status indicates that the cluster is healthy and all health checks have passed. An amber status indicates that there may be some noncritical issues in the cluster. A red status indicates critical issues in the cluster. The status is amber in the following cases:
- Any node has taints
- Disk usage or memory usage on any node exceeds 80%
- Disk I/O latency on any node exceeds 100000 ms
- Rook Ceph status shows HEALTH_WARN
- The number of Kafka in-sync replicas is not equal to the number of Kafka replicas
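The amber conditions above amount to a set of threshold checks. The following Python sketch expresses them as code for illustration only. The NodeHealth class and the amber_reasons() and status_from_reasons() functions are hypothetical names, not part of Routing Director, and the sketch does not model red conditions because this topic does not list them.

```python
from dataclasses import dataclass

# Thresholds as stated in this topic (illustrative; not read from the product).
USAGE_THRESHOLD_PERCENT = 80       # disk or memory usage
IO_LATENCY_THRESHOLD_MS = 100_000  # disk I/O latency


@dataclass
class NodeHealth:
    """Hypothetical per-node metrics record."""
    name: str
    has_taints: bool
    disk_usage_pct: float
    memory_usage_pct: float
    io_latency_ms: float


def amber_reasons(
    nodes: list[NodeHealth],
    ceph_status: str,
    kafka_in_sync_replicas: int,
    kafka_replicas: int,
) -> list[str]:
    """Return one message per amber condition that is met (empty if none)."""
    reasons: list[str] = []
    for node in nodes:
        if node.has_taints:
            reasons.append(f"{node.name}: node has taints")
        if node.disk_usage_pct > USAGE_THRESHOLD_PERCENT:
            reasons.append(f"{node.name}: disk usage exceeds {USAGE_THRESHOLD_PERCENT}%")
        if node.memory_usage_pct > USAGE_THRESHOLD_PERCENT:
            reasons.append(f"{node.name}: memory usage exceeds {USAGE_THRESHOLD_PERCENT}%")
        if node.io_latency_ms > IO_LATENCY_THRESHOLD_MS:
            reasons.append(f"{node.name}: disk I/O latency exceeds {IO_LATENCY_THRESHOLD_MS} ms")
    if ceph_status == "HEALTH_WARN":
        reasons.append("Rook Ceph status is HEALTH_WARN")
    if kafka_in_sync_replicas != kafka_replicas:
        reasons.append("Kafka in-sync replicas do not match Kafka replicas")
    return reasons


def status_from_reasons(reasons: list[str]) -> str:
    """GREEN when no amber condition is met; red conditions are out of scope here."""
    return "AMBER" if reasons else "GREEN"
```

Returning a list of reasons, rather than a single flag, mirrors the way the health-check output reports each test separately.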
The results of the health-check command are stored in a Postgres database. You can view the result of the latest health check in the Routing Director GUI (Settings > Health Checks).
Additionally, a cron job automatically checks the health of the cluster every hour, on the hour, and stores the output in the Postgres database.
Action
You can run a cluster health check from either the Deployment Shell or the Linux root shell.
Perform a health check using the Deployment Shell
Log in to a cluster node and run the request deployment health-check command in the Deployment Shell.
Sample Output
root@primary1> request deployment health-check
Health status checking...
=======================================================
Get node count of Kubernetes cluster.
=======================================================
OK
There are 4 nodes in the cluster.
...
<output snipped>
...
======================================================
Verifying Elasticsearch
======================================================
OK
Opensearch test...
Checking health status at opensearch-cluster-master.common:9200...
Opensearch is healthy (green).
OPENSEARCH VERIFICATION PASS
=======================================================
Overall cluster status
=======================================================
GREEN
Perform a health check using the Linux root shell
From the Deployment Shell, type exit to exit to the Linux root shell. Use the following commands to retrieve the Routing Director cluster health status:
- Default health-check
- Check the health of specific functions
- Enable verbose mode logging
- Check full cluster health
Default health-check
Use the health-check command to check, retrieve, and display the health status of the Routing Director cluster. The cluster is checked for the default parameters. The output of this command is the same as the output of the request deployment health-check command in the Deployment Shell.
Sample Output
root@primary1:~# health-check
2025-10-07 19:10:09 Health status checking in manual Mode
======================================================
Get node count of Kubernetes cluster
======================================================
OK
======================================================
Get node status of Kubernetes cluster
======================================================
OK
======================================================
Get node readiness status of Kubernetes cluster
======================================================
OK
...
<output snipped>
...
======================================================
Verifying Elasticsearch
======================================================
OK
Opensearch test...
Checking health status at opensearch-cluster-master.common:9200...
Opensearch is healthy (green).
OPENSEARCH VERIFICATION PASS
=======================================================
Overall cluster status
=======================================================
GREEN
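If you capture the health-check output in a script, for example to forward it to an external monitoring tool, you can extract the overall status from the final banner. The following Python sketch assumes that the output ends with the Overall cluster status banner, a separator line of "=" characters, and a status word (GREEN, AMBER, or RED), as in the sample output. The function name overall_cluster_status() is hypothetical.

```python
import re
from typing import Optional

# Assumed output shape: an "Overall cluster status" banner, a separator line of
# "=" characters, then the status word (GREEN, AMBER, or RED).
_OVERALL_STATUS = re.compile(
    r"Overall cluster status\s*=+\s*(GREEN|AMBER|RED)\b", re.IGNORECASE
)


def overall_cluster_status(output: str) -> Optional[str]:
    """Return the overall status word in upper case, or None if it is not found."""
    match = _OVERALL_STATUS.search(output)
    return match.group(1).upper() if match else None
```

On a cluster node, you could pass it the stdout attribute of subprocess.run(["health-check"], capture_output=True, text=True). Because \s also matches newlines, the pattern works on both the multi-line output and a copy that has been collapsed onto one line.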
Check the health of specific functions
Use the -t function-name option to check the health of a specific function, such as check_node_cpu_memory_status, check_etcd_logs, check_registry_status, or check_replicas_status.
Sample Output
Use check_node_cpu_memory_status to check the CPU and memory usage on all nodes.
root@primary1:~# health-check -t check_node_cpu_memory_status
======================================================
Check node cpu/memory usage
======================================================
OK
primary1 OK
primary2 OK
primary3 OK
worker1 OK
Use check_etcd_logs to check the etcd pod logs for disk latency issues.
root@primary1:~# health-check -t check_etcd_logs
======================================================
Check etcd logs for disk latency
======================================================
OK
kube-system/etcd-primary1 OK
kube-system/etcd-primary2 OK
kube-system/etcd-primary3 OK
Use check_replicas_status to check the replica health of Deployments and StatefulSets.
root@primary1:~# health-check -t check_replicas_status
======================================================
Check deploy/statefulset replicas status
======================================================
OK
Enable verbose mode logging
Use the -v option to enable verbose mode logging, either when checking the complete health of the cluster (health-check -v) or when checking the health of a specific function (health-check -v -t function-name). In verbose mode, the output includes additional information.
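To script these checks, for example to run each function-specific check in turn, you can assemble the command from the options described in this topic. The following Python sketch builds only the documented forms (health-check, health-check -t, health-check -v, health-check -v -t, and health-check -a) and rejects combinations this topic does not describe. The helper names and the DOCUMENTED_FUNCTIONS tuple are hypothetical, and the command may support functions beyond those listed.

```python
import subprocess
from typing import Optional

# Function names mentioned in this topic; the command may support others.
DOCUMENTED_FUNCTIONS = (
    "check_node_cpu_memory_status",
    "check_etcd_logs",
    "check_registry_status",
    "check_replicas_status",
)


def build_health_check_cmd(
    function: Optional[str] = None, verbose: bool = False, full: bool = False
) -> list[str]:
    """Build argv for one of the invocation forms shown in this topic."""
    if full and (function or verbose):
        # Combining -a with -t or -v is not described here, so refuse it.
        raise ValueError("-a is only shown on its own in this topic")
    cmd = ["health-check"]
    if full:
        cmd.append("-a")
    if verbose:
        cmd.append("-v")
    if function:
        cmd += ["-t", function]
    return cmd


def run_health_check(**options) -> subprocess.CompletedProcess:
    """Run the command in the Linux root shell of a cluster node and capture its output."""
    return subprocess.run(
        build_health_check_cmd(**options), capture_output=True, text=True, check=False
    )


def run_documented_functions(verbose: bool = False) -> dict[str, subprocess.CompletedProcess]:
    """Run health-check -t for each function named in this topic."""
    return {fn: run_health_check(function=fn, verbose=verbose) for fn in DOCUMENTED_FUNCTIONS}
```

check=False leaves interpretation of the exit status to the caller, because this topic does not document the command's exit codes; reading the Overall cluster status line from stdout is the safer signal.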
Sample Output
root@primary1:~# health-check -v
2025-10-07 20:06:18 Health status checking in manual Mode
======================================================
Get node count of Kubernetes cluster
======================================================
OK
There are 4 nodes in the cluster.
======================================================
Get node status of Kubernetes cluster
======================================================
OK
4 nodes are in the Ready state.
NAME STATUS ROLES AGE VERSION
primary1 Ready control-plane,etcd,master 335d v1.33.2+rke2r1
primary2 Ready control-plane,etcd,master 335d v1.33.2+rke2r1
primary3 Ready control-plane,etcd,master 335d v1.33.2+rke2r1
worker1 Ready worker 335d v1.33.2+rke2r1
======================================================
Get node readiness status of Kubernetes cluster
======================================================
OK
All 4 nodes are in a Ready state.
...
<output snipped>
...
======================================================
Verifying Elasticsearch
======================================================
OK
Opensearch test...
Checking health status at opensearch-cluster-master.common:9200...
Opensearch is healthy (green).
OPENSEARCH VERIFICATION PASS
=======================================================
Overall cluster status
=======================================================
GREEN
Use check_node_cpu_memory_status with the -v option to also display the CPU and memory usage of each node.
root@primary1:~# health-check -v -t check_node_cpu_memory_status
======================================================
Check node cpu/memory usage
======================================================
OK
primary1 OK
primary2 OK
primary3 OK
worker1 OK
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
primary1 6656m 42% 19565Mi 60%
primary2 4224m 26% 18034Mi 56%
primary3 5296m 33% 20480Mi 63%
worker1 4192m 26% 10917Mi 34%
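The verbose table reports memory usage per node, and the status becomes amber when memory usage on any node exceeds 80%. The following Python sketch parses that table and lists the nodes above the threshold. It assumes the five-column layout shown in the sample output (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%), and parse_usage_table() and nodes_over_memory_threshold() are hypothetical helpers.

```python
AMBER_MEMORY_PERCENT = 80  # memory threshold for amber status, as stated in this topic


def parse_usage_table(text: str) -> dict[str, dict[str, int]]:
    """Map node name -> {"cpu_pct": ..., "mem_pct": ...} from the verbose table."""
    usage: dict[str, dict[str, int]] = {}
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if fields[:1] == ["NAME"]:
            in_table = True  # header row: NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
            continue
        # Node rows have five columns, with percentages in the third and fifth.
        if in_table and len(fields) == 5 and fields[2].endswith("%") and fields[4].endswith("%"):
            name, _cpu_cores, cpu_pct, _mem_bytes, mem_pct = fields
            usage[name] = {"cpu_pct": int(cpu_pct[:-1]), "mem_pct": int(mem_pct[:-1])}
    return usage


def nodes_over_memory_threshold(
    usage: dict[str, dict[str, int]], threshold: int = AMBER_MEMORY_PERCENT
) -> list[str]:
    """Return the sorted names of nodes whose memory usage exceeds the threshold."""
    return sorted(name for name, row in usage.items() if row["mem_pct"] > threshold)
```

Requiring a percent sign in the third and fifth columns keeps the parser from picking up other five-column tables in the verbose output, such as the node status table, whose third column lists roles.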
Check full cluster health
Use the -a option to check the complete health of the cluster. The cluster is checked for the default parameters and also for the following additional parameters:
- OpenSearch shard status
- Registry status
- etcd logs
Sample Output
root@primary1:~# health-check -a
2025-10-07 19:10:09 Health status checking in manual Mode
======================================================
Get node count of Kubernetes cluster
======================================================
OK
There are 4 nodes in the cluster.
...
<output snipped>
...
======================================================
Check etcd logs for disk latency
======================================================
OK
kube-system/etcd-ix-d-pg-pr1 OK
kube-system/etcd-ix-d-pg-pr2 OK
kube-system/etcd-ix-d-pg-pr3 OK
...
<output snipped>
...
======================================================
Verifying Elasticsearch
======================================================
OK
Opensearch test...
Checking health status at opensearch-cluster-master.common:9200...
Opensearch is healthy (green).
OPENSEARCH VERIFICATION PASS
=======================================================
Overall cluster status
=======================================================
GREEN