Recovering CSO Services
Recovering a CSO Service
If the components_health.sh script reports a service as unhealthy, you can recover that service by running the recovery.sh script.
The recovery script starts the recovery process for the specified component (saltstack in this example). The following is a sample output of the messages that are displayed. A recovery completion message is displayed after the component is recovered.
INFO Started recovering saltstack component at 2021-07-22 03:00:08.989767 ...
INFO Saltstack failure recovery is initiated...
INFO Saltstack check()
INFO Salt Master is running
INFO Deleting unreachable minion key csp-central-proxy_sblb2.N6RGW8.central
INFO Deleting unreachable minion key csp-central-k8-microservices3.N6RGW8.central
INFO Deleting unreachable minion key csp-central-k8-microservices2.N6RGW8.central
INFO Deleting unreachable minion key csp-central-k8-infra3.N6RGW8.central
INFO Completed recovering saltstack component at 2021-07-22 03:00:27.816847 .
INFO Time taken to recover 0:00:18.827080
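If you capture the recovery output to a file, the start and completion messages can be checked programmatically. The following is a minimal sketch, not part of CSO: it assumes the message wording and timestamp layout shown in the sample output above, and the function name recovery_duration is hypothetical.

```python
import re
from datetime import datetime, timedelta
from typing import Tuple

# Timestamp layout as it appears in the sample output,
# for example: 2021-07-22 03:00:08.989767
_TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{1,6})"
_START = re.compile(r"Started recovering (\w+) component at " + _TS)
_DONE = re.compile(r"Completed recovering (\w+) component at " + _TS)
_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def recovery_duration(output: str) -> Tuple[str, timedelta]:
    """Return (component, elapsed time) from captured recovery.sh output.

    Hypothetical helper: raises ValueError if either the start or the
    completion message is missing, or if they name different components.
    """
    start = _START.search(output)
    done = _DONE.search(output)
    if start is None or done is None:
        raise ValueError("start or completion message not found")
    if start.group(1) != done.group(1):
        raise ValueError("start and completion name different components")
    began = datetime.strptime(start.group(2), _FORMAT)
    ended = datetime.strptime(done.group(2), _FORMAT)
    return start.group(1), ended - began
```

A missing completion message raises an error rather than returning a partial result, because the absence of that message is what indicates that the recovery did not finish.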
If the recovery.sh script detects an issue with the k8 virtual machine when recovering the kubernetes component, then it displays an error message as shown in the following sample output:
Kubernetes recovery failed, Please refer logs/recovery.log for more details
Failed to recover Kubernetes
Please run replace_vm for k8-master1
You can run the deploy.sh script to replace the k8 virtual machine.
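The error message names the virtual machine that needs to be replaced. The following is a minimal sketch of extracting that name from captured output. It assumes the exact wording "Please run replace_vm for" shown in the sample above; the function name vm_to_replace is hypothetical and is not part of CSO.

```python
import re
from typing import Optional

# Matches the hint line from the sample output:
#   Please run replace_vm for k8-master1
_REPLACE_HINT = re.compile(r"Please run replace_vm for (\S+)")


def vm_to_replace(output: str) -> Optional[str]:
    """Return the VM name from the replace_vm hint, or None if absent."""
    match = _REPLACE_HINT.search(output)
    return match.group(1) if match else None
```

Returning None when the hint is absent lets a caller distinguish a failure that calls for a virtual machine replacement from other recovery failures, which still need to be investigated in logs/recovery.log.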
Replacing Virtual Machines for KVM Hypervisor
You can replace only a k8 virtual machine. To replace a k8 virtual machine:
If a power cycle occurs on the physical servers, the vrr becomes unhealthy.
The following is a sample output that shows the vrr status as unhealthy:
INFO Health Check for Infrastructure Component Vrr Started
INFO Attempt: 1 - Retrying Health Check for Component Vrr
INFO Attempt: 2 - Retrying Health Check for Component Vrr
ERROR The Infra Component : Vrr is Unhealthy
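To decide which component to recover, you can scan captured health check output for the ERROR lines. The following is a minimal sketch that assumes the "The Infra Component : <name> is Unhealthy" wording shown in the sample above; other message formats that the health check might produce are not covered. The function name unhealthy_components is hypothetical.

```python
import re
from typing import List

# Matches the error line from the sample output:
#   ERROR The Infra Component : Vrr is Unhealthy
_UNHEALTHY = re.compile(r"ERROR The Infra Component : (\S+) is Unhealthy")


def unhealthy_components(output: str) -> List[str]:
    """Return the names of unhealthy infra components, in order of appearance.

    Names are lowercased (an assumption of this sketch) so that they are
    easier to compare with the lowercase component names used by recovery.sh.
    """
    return [name.lower() for name in _UNHEALTHY.findall(output)]
```

An empty list means that no line in the supplied text matched the pattern; it does not by itself prove that every component is healthy.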
Run the recovery.sh script again to bring the vrr back online. The following is a sample output of the recovery process:
root@startupserver1:~/Contrail_Service_Orchestration_6.1.0# ./recovery.sh
*************** This tool assists you recover your CSO setup. ***************
Following components can be recovered
1: contrailanalytics
2: cassandra
3: mariadb
4: etcd
5: kubernetes
6: vrr
7: saltstack
8: arangodb
9: microservices
10: icinga
11: rabbitmq
Specify one of the component to recover (In Number) : 6
INFO Started recovering vrr component at 2021-07-22 07:38:23.504490 ...
INFO VRR recovery is initiated...
INFO Vrr - 192.168.10.29 is healthy
ERROR Vrr - 192.168.10.30 is unhealthy
INFO Recovery takes time, please be patient
INFO VRR recovery started. Please wait...
INFO Vrr console recovered for vrr2
INFO VRR config sync completed successfully
INFO Completed recovering vrr component at 2021-07-22 07:48:31.324598 .
INFO Time taken to recover 0:10:07.820108