CSO Disaster Recovery
If CSO Release 6.2.0 fails, you can recover it, provided that you have previously taken a backup and saved the backup file.
To recover CSO Release 6.2.0:
- Based on the hypervisor you are using, do one of the following:
If you are using KVM as the hypervisor:
- Copy the CSO 6.2.0 backup folder to the bare metal server.
- From the backup folder, copy the _topology.conf file to the Contrail_Service_Orchestration_6.2.0/topology/ folder.
For example:
cp /root/backups/backupfordr/2020-06-19T17:27:05/config_backups/_topology.conf /root/Contrail_Service_Orchestration_6.2.0/topology/
- Provision the VMs. For information about provisioning VMs on the KVM hypervisor, see Provision VMs on Contrail Service Orchestration Servers in the CSO Installation and Upgrade Guide.
- Copy the backup folder from the bare metal server to the startupserver1 VM.
user@server> scp -r /root/backups/backupfordr/ startupserver1:
- Log in to the startupserver1 VM as the root user.
- Expand the installer package.
root@startupserver1:~/# tar -xvzf Contrail_Service_Orchestration_6.2.0.tar.gz
The expanded package is a directory that has the same name as the installer package and contains the installation files.
- From the backup folder, copy the _topology.conf file to the Contrail_Service_Orchestration_6.2.0/topology/ folder.
cp /root/backups/backupfordr/2020-06-19T17:27:05/config_backups/_topology.conf /root/Contrail_Service_Orchestration_6.2.0/topology/
If you are using ESXi as the hypervisor:
- Copy the backup folder to the startupserver1 VM.
- Expand the installer package.
root@startupserver1:~/# tar -xvzf Contrail_Service_Orchestration_6.2.0.tar.gz
The expanded package is a directory that has the same name as the installer package and contains the installation files.
- From the backup folder, copy the _topology.conf file to the Contrail_Service_Orchestration_6.2.0/topology/ folder in the startupserver1 VM.
For example:
cp /root/backups/backupfordr/2020-06-19T17:27:05/config_backups/_topology.conf /root/Contrail_Service_Orchestration_6.2.0/topology/
- Run the deploy.sh command.
root@host:~/Contrail_Service_Orchestration_6.2.0# ./deploy.sh
- Run the following command to create a new backup:
cso_backupnrestore -b backup -s backup62new
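To confirm that the new backup was created before you continue, you can optionally list the backup directory. The /backups/backup62new location shown here follows the new backup path reported later by the pre-disaster recovery script (the timestamped subdirectory is the path you supply in later restore steps); adjust it if you used a different backup name.
ls -l /backups/backup62new/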
- Run the pre_disaster_recovery script.
python /usr/local/bin/pre_disaster_recovery.py
Enter the old backup path: /root/backups/backupfordr/2020-10-29T06:45:11
Enter the new backup path: /backups/backup62new/2020-10-30T03:47:51
COMPONENTS: ('cassandra', 'elasticsearch', 'etcd', 'arangodb', 'icinga', 'swift', 'config_backups')
Start cassandra pre restore task...
Get old and new backup path for component cassandra
cassandra pre restore task successfully done
*Do you want to redeploy cassandra container to apply tokens.
*This process will delete all the existing data from cassnadra
Please enter yes to process [yes/no]:
Enter yes at the prompt.
Start elasticsearch pre restore task...
Get old and new backup path for component elasticsearch
Get Elasticsearch user id for permission
Set permission for elasticsearch dir.
elasticsearch pre restore task successfully done
Start etcd pre restore task...
Get old and new backup path for component etcd
etcd pre restore task successfully done
Start arangodb pre restore task...
Get old and new backup path for component arangodb
arangodb pre restore task successfully done
Start mariadb pre restore task...
Get old and new backup path for component mariadb
mariadb pre restore task successfully done
Start icinga pre restore task...
Get old and new backup path for component icinga
icinga pre restore task successfully done
Start swift pre restore task...
Get old and new backup path for component swift
swift pre restore task successfully done
Start config_backups pre restore task...
config_backups pre restore task successfully done
Pre restore task completed for all components.
- Restore the data from the new backup created in step 3 by using the cso_backupnrestore script.
#cso_backupnrestore -b restore -s backuppath -t '*' -c 'cassandra' -r 'yes'
#cso_backupnrestore -b restore -s backuppath -t '*' -c 'elasticsearch' -r 'yes'
#cso_backupnrestore -b restore -s backuppath -t '*' -c 'arangodb' -r 'yes'
#cso_backupnrestore -b restore -s backuppath -t '*' -c 'icinga' -r 'yes'
#cso_backupnrestore -b restore -s backuppath -t '*' -c 'swift' -r 'yes'
#cso_backupnrestore -b restore -s backuppath -t '*' -c 'mariadb' -r 'yes'
where backuppath is the new backup path. If the restore procedure fails for any of these components, retry the restore for only the components that failed. Occasionally, the mariadb restore fails on the first attempt but succeeds on the second attempt.
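For example, with the new backup path reported by the pre-disaster recovery script (the timestamp in your environment will differ), the cassandra restore command would look like the following. This shows only the path substitution:
cso_backupnrestore -b restore -s /backups/backup62new/2020-10-30T03:47:51 -t '*' -c 'cassandra' -r 'yes'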
- Synchronize the data between nodes.
cso_backupnrestore -b nodetool_repair
IF Cluster nodetool status is UP/Normal(UN) please proceed for nodetool repair (Y/n):
Enter y at the prompt.
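If you want to verify the Cassandra cluster state before you answer the prompt, one possible check is to run nodetool status on the Cassandra nodes through Salt. This is an optional sketch; the container name cassandra is an assumption, so substitute the Cassandra container name used in your deployment.
salt -C "G@roles:cassandra" cmd.run "docker exec cassandra nodetool status"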
- Copy the certificates from the backup folder to the SDN-based load balancing (SBLB) HA Proxy.
salt-cp -G "roles:haproxy_confd_sblb" /root/backups/backupfordr/2020-06-19T17:27:05/config_backups/haproxycerts/minions/minions/csp-central-proxy_sblb1.NH5XCS.central/files/etc/pki/tls/certs/ssl_cert.pem /etc/pki/tls/certs
salt-cp -G "roles:haproxy_confd_sblb" /root/backups/backupfordr/2020-06-19T17:27:05/config_backups/haproxycerts/minions/minions/csp-central-proxy_sblb1.NH5XCS.central/files/etc/pki/tls/certs/ssl_cert.crt /etc/pki/tls/certs
- Restart the SBLB HA Proxy.
salt -C "G@roles:haproxy_confd_sblb" cmd.run "service haproxy restart"
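To confirm that the SBLB HA proxy service restarted cleanly, you can run an optional status check with the same Salt targeting:
salt -C "G@roles:haproxy_confd_sblb" cmd.run "service haproxy status"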
- Copy the certificates from the backup folder to the Central HA Proxy.
salt-cp -G "roles:haproxy_confd" /root/backups/backupfordr/2020-06-19T17:27:05/config_backups/haproxycerts/minions/minions/csp-central-proxy1.NH5XCS.central/files/etc/pki/tls/certs/ssl_cert.pem /etc/pki/tls/certs
salt-cp -G "roles:haproxy_confd" /root/backups/backupfordr/2020-10-29T06:45:11/config_backups/haproxycerts/minions/minions/csp-central-proxy1.NH5XCS.central/files/etc/pki/tls/certs/ssl_cert.crt /etc/pki/tls/certs
- Restart the Central HA Proxy.
salt -C "G@roles:haproxy_confd" cmd.run "service haproxy restart"
- Run the following commands on the installer VM to update the Nginx certificates.
kubectl get secret -n central | grep cso-ingress-tls
cso-ingress-tls   kubernetes.io/tls   2   17d
kubectl delete secret cso-ingress-tls -n central
kubectl create secret tls cso-ingress-tls --key /root/backups/backupfordr/2020-10-29T06:45:11/config_backups/haproxycerts/minions/minions/csp-central-proxy1.NH5XCS.central/files/etc/pki/tls/certs/ssl_cert.key --cert /root/backups/backupfordr/2020-10-29T06:45:11/config_backups/haproxycerts/minions/minions/csp-central-proxy1.NH5XCS.central/files/etc/pki/tls/certs/ssl_cert.crt -n central
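You can optionally confirm that the secret was recreated with the restored certificate before you deploy the microservices; the second command decodes the certificate and prints its subject and validity dates.
kubectl get secret cso-ingress-tls -n central
kubectl get secret cso-ingress-tls -n central -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -subject -dates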
- Deploy microservices.
./python.sh micro_services/deploy_micro_services.py
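After the microservices are deployed, you can optionally check for pods that are not yet in the Running state before you continue; this check is not required by the procedure.
kubectl get pods -n central | grep -v Running
kubectl get pods -n regional | grep -v Running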
- Reindex Elasticsearch.
Open the csp.csp-ems-regional deployment file.
kubectl edit deployment -n regional csp.csp-ems-regional
Change the replicas to 2 and increase the memory from 500Mi to 2048Mi (2Gi).
Save the file.
Start the reindex process.
cso_backupnrestore -b reindex
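If you prefer a non-interactive alternative to the kubectl edit step above, the same settings can usually be applied with kubectl scale and kubectl set resources. This is a sketch that assumes the 2048Mi value is applied as a memory limit on the deployment's containers; verify the result with kubectl describe before you start the reindex.
kubectl -n regional scale deployment csp.csp-ems-regional --replicas=2
kubectl -n regional set resources deployment csp.csp-ems-regional --limits=memory=2048Mi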
- Using the admin token, run the following API to build the policy indices:
curl --location --request POST 'https://AdminPortalIP/policy-mgmt/_index' \
--header 'x-auth-token: XXXXXXX' \
--data-raw ''
- Create the RabbitMQ FMPM queue.
./python.sh upgrade/migration_scripts/common/rabbitmq_fmpm_queue_creation.py
- Load the data.
./python.sh micro_services/load_services_data.py
- Synchronize the Virtual Route Reflector (VRR). Use the admin token. Do not use the cspadmin token.
- Obtain the topo-uuid for the VRR.
GET: https://<IP Address>/topology-service/device
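For example, the request can be issued with curl in the same style as the policy index call above, where the token value is a placeholder for your admin token:
curl --location --request GET 'https://<IP Address>/topology-service/device' \
--header 'x-auth-token: XXXXXXX'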
- Synchronize the VRR using the POST https://<IP Address>/routing-manager/synchronize-vrr API.
{
  "input": {
    "recover_vrr": true,
    "uuid": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
  }
}
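For example, the synchronize-vrr request can be issued with curl as follows, where the token and UUID values are placeholders:
curl --location --request POST 'https://<IP Address>/routing-manager/synchronize-vrr' \
--header 'x-auth-token: XXXXXXX' \
--header 'Content-Type: application/json' \
--data-raw '{ "input": { "recover_vrr": true, "uuid": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" } }'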
- Restore the SD-WAN and security reports.
cso_backupnrestore -b restore -s backuppath -t '*' -c 'swift_report' -r 'yes'
where backuppath is the new backup path.
- Restart all fmpm-provider-api and fmpm-provider-core pods by deleting the existing pods.
root@startupserver1:~# kubectl get pods -n central | grep fmpm-provider
csp.csp-fmpm-provider-6644bc8b94-7pvfn        1/1   Running   0   9d
csp.csp-fmpm-provider-6644bc8b94-c2psl        1/1   Running   0   9d
csp.csp-fmpm-provider-6644bc8b94-gzkht        1/1   Running   1   9d
csp.csp-fmpm-provider-6644bc8b94-hz8f5        1/1   Running   0   9d
csp.csp-fmpm-provider-6644bc8b94-nsqfs        1/1   Running   0   9d
csp.csp-fmpm-provider-6644bc8b94-rq9xq        1/1   Running   0   9d
csp.csp-fmpm-provider-core-797f7c48c9-7nm8q   1/1   Running   0   9d
csp.csp-fmpm-provider-core-797f7c48c9-7zj67   1/1   Running   0   9d
csp.csp-fmpm-provider-core-797f7c48c9-8njsq   1/1   Running   0   9d
csp.csp-fmpm-provider-core-797f7c48c9-rh2jr   1/1   Running   0   9d
csp.csp-fmpm-provider-core-797f7c48c9-sswbg   1/1   Running   0   9d
csp.csp-fmpm-provider-core-797f7c48c9-zvhps   1/1   Running   0   9d
- Delete all the pods displayed in the previous step.
kubectl delete pods -n central csp.csp-fmpm-provider-6644bc8b94-7pvfn csp.csp-fmpm-provider-6644bc8b94-c2psl csp.csp-fmpm-provider-6644bc8b94-gzkht csp.csp-fmpm-provider-6644bc8b94-hz8f5 csp.csp-fmpm-provider-6644bc8b94-nsqfs csp.csp-fmpm-provider-6644bc8b94-rq9xq csp.csp-fmpm-provider-core-797f7c48c9-7nm8q csp.csp-fmpm-provider-core-797f7c48c9-7zj67 csp.csp-fmpm-provider-core-797f7c48c9-8njsq csp.csp-fmpm-provider-core-797f7c48c9-rh2jr csp.csp-fmpm-provider-core-797f7c48c9-sswbg csp.csp-fmpm-provider-core-797f7c48c9-zvhps
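The pod names in your deployment will differ from the example above. If you prefer not to list them manually, an equivalent approach is to select and delete them in a single pipeline; this is a convenience sketch rather than part of the procedure:
kubectl get pods -n central --no-headers | grep fmpm-provider | awk '{print $1}' | xargs kubectl delete pods -n central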
- Restore the Contrail Analytics Node (CAN) database.
Note: You can restore the database only if a backup is available. CAN backup is disabled by default. To include CAN data in the backup, uncomment the contrail_analytics entry in the following configuration:
root@startupserver1:~# cat /etc/salt/master.d/backup.conf
backups:
  keep: 10
  timeout: 1200
  path: /backups
  enabled_roles:
    - cassandra
    - mariadb
    - kubemaster
    - elasticsearch
    # - redis
    - icinga
    - helm_manager
    # - contrail_analytics
To restore the CAN configuration database, run the following script:
./python.sh upgrade/migration_scripts/common/can_migration.py
To restore the CAN analytics database, perform the following steps:
The analyticsdb backup files are located at /backups/daily/2021-06-07T06:46:37/central/can/contrail_analytics<x>, where x indicates the Contrail Analytics node number. The value of x ranges from 1 through 3.
On all three Contrail Analytics nodes:
- Copy the CAN backup files from the startupserver to each CAN VM:
rsync -a <can-backup-files> root@<can-ip>:<created-backup-folder>
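For example, using the backup location shown above for the first CAN node. The timestamp and the /root/can_restore destination folder are illustrative; the destination should be the folder you created on the CAN VM, from which you run the docker cp command in the next step.
rsync -a /backups/daily/2021-06-07T06:46:37/central/can/contrail_analytics1/ root@<can-ip>:/root/can_restore/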
- Run the following commands on the CAN VMs:
docker cp 0000/ analytics_database_cassandra_1:/root
docker exec -it analytics_database_cassandra_1 bash
mv /root/mc-* /var/lib/cassandra/data/ContrailAnalyticsCql/statstablev4-d5b63590a7f011eba080c3eb6817d254
# The path might be different based on the UUID.
cd /var/lib/cassandra/data/ContrailAnalyticsCql/statstablev4-d5b63590a7f011eba080c3eb6817d254
chown -R cassandra:cassandra *
nodetool -p 7200 refresh -- ContrailAnalyticsCql statstablev4
After a successful recovery, CSO Release 6.2.0 is functional and you can log in to the Administrator Portal and the Customer Portal.