Troubleshoot RabbitMQ Cluster Failure
Problem
The RabbitMQ cluster fails as a result of a power outage.
Solution
If
the
RabbitMQ
cluster
fails as a result of a power outage, the RabbitMQ message bus may not restart properly. To
identify
the status of the RabbitMQ message bus, run the kubectl get po -n
northstar -l app=rabbitmq command. This command should show three pods with their
status as Running. For
example:
$ kubectl get po -n northstar -l app=rabbitmq NAME READY STATUS RESTARTS AGE rabbitmq-0 1/1 Running 0 10m rabbitmq-1 1/1 Running 0 10m rabbitmq-2 1/1 Running 0 9m37s
If the status of one or more pods is Error, use the following recovery
procedure:
-
Delete RabbitMQ.
kubectl delete po -n northstar -l app=rabbitmq -
Check the status of the pods.
kubectl get po -n northstar -l app=rabbitmq.Repeat
kubectl delete po -n northstar -l app=rabbitmquntil the status of all pods isRunning. -
Restart the Paragon Pathfinder application.
kubectl rollout restart deploy -n northstar