Backup and Restore Contrail Configuration Database

This document describes how to back up and restore the Contrail configuration databases, Cassandra and Zookeeper, for Contrail Networking deployed with Canonical OpenStack through Juju charms.

The backup and restore procedure must be performed on nodes running the same Contrail Networking release. The procedure backs up the Contrail Networking databases only; it does not include instructions for backing up orchestration system databases.

CAUTION:

Database backups must be consistent across all systems because the state of the Contrail database is associated with other system databases, such as OpenStack databases. Stop all database changes that come through northbound APIs on all the systems before performing any backup operation. For example, you might block the external VIP for the northbound APIs at the load balancer level, such as in HAProxy.

The following procedure was tested with Juju version 2.7 and version 2.3.7 running on Ubuntu 16.04 LTS (Xenial Xerus).

The examples in this procedure use Juju machine numbers 1, 2, and 3. Replace them with your own Juju machine numbers, which you can identify by running the juju status command on the host where the Juju client is installed.
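One way to read the machine numbers without scanning the status table by eye is to parse the JSON form of juju status. This is a sketch that assumes python3 is installed on the Juju client host and that each unit entry in the status output carries a machine field, as in Juju 2.x; machines_from_status and controller_machines are hypothetical helper names.

```shell
# Print the Juju machine numbers that host units of the given application.
# Expects `juju status --format=json` output on stdin (Juju 2.x layout assumed).
machines_from_status() {
  python3 -c '
import json
import sys

app = sys.argv[1]
status = json.load(sys.stdin)
units = status.get("applications", {}).get(app, {}).get("units", {})
for name in sorted(units):
    print(units[name]["machine"])
' "$1"
}

# Query the live model; requires a configured Juju client.
controller_machines() {
  juju status --format=json contrail-controller | machines_from_status contrail-controller
}
```

Running controller_machines prints one machine number per line, in unit order.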

Backup config database

Follow this procedure to back up the config database:

All the commands are run on the host where the Juju client is installed, unless stated otherwise.

Note:

The db_manage.py script is a disaster recovery script. If any errors occur after you run it, contact Juniper Networks support.

Update the db_manage.py script.

Update the db_json_exim.py script.

The latest versions of the db_json_exim.py script require the Python future library.
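A minimal sketch of these update steps, assuming curl and pip are available inside the contrail-controller container and that the nodes can reach the download location. The script URLs are not given in this document, so the example takes them as arguments; fetch_script, install_future, and the /tmp destination are hypothetical choices. Setting DRY_RUN=1 prints each command instead of running it.

```shell
CONTAINER="${CONTAINER:-contrail-controller}"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Download one script into /tmp inside the controller container.
# $1 = Juju machine number, $2 = script URL (supply the URL for your release).
fetch_script() {
  name="$(basename "$2")"
  run juju ssh "$1" sudo docker exec "$CONTAINER" curl -fsSL -o "/tmp/$name" "$2"
}

# Install the Python "future" library needed by newer db_json_exim.py versions.
install_future() {
  run juju ssh "$1" sudo docker exec "$CONTAINER" pip install future
}
```

For example, run fetch_script 1 with your db_manage.py URL and again with your db_json_exim.py URL, then install_future 1.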

Stop the Juju agents for the contrail-controller application.

Run the juju status command to confirm that the agents on each controller node are in the lost state.

Stop the Contrail config services on all the controller nodes.

Verify the status of each contrail-controller node. The config services must be in the inactive state.
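The agent and service stop steps above can be sketched as follows. The sketch assumes that Juju 2.x unit agents run as systemd services named jujud-unit-<application>-<unit number> (so contrail-controller/0 maps to jujud-unit-contrail-controller-0) and that the config services inside the container are the ones in CONFIG_SERVICES; both are assumptions to check with systemctl list-units 'jujud-*' and contrail-status on your nodes. DRY_RUN=1 prints the commands instead of running them.

```shell
CONTAINER="${CONTAINER:-contrail-controller}"
# Assumed names of the config services in the container; adjust to your release.
CONFIG_SERVICES="${CONFIG_SERVICES:-contrail-api contrail-schema contrail-svc-monitor contrail-device-manager}"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Stop the unit agent of one Juju unit, e.g. stop_unit_agent 1 contrail-controller/0.
stop_unit_agent() {
  svc="jujud-unit-$(printf '%s' "$2" | tr / -)"
  run juju ssh "$1" sudo systemctl stop "$svc"
}

# Stop the config services in the controller container, then show their status.
stop_config_services() {
  # CONFIG_SERVICES is left unquoted on purpose so each name is a separate argument.
  run juju ssh "$1" sudo docker exec "$CONTAINER" systemctl stop $CONFIG_SERVICES
  run juju ssh "$1" sudo docker exec "$CONTAINER" contrail-status
}
```

Stop the agent for every contrail-controller unit, check juju status, and then call stop_config_services for machines 1, 2, and 3.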

Check the Contrail config DB for consistency on one of the controller nodes.

Synchronize the data by running the repair command on the Contrail config DB.

Save the database status. You may need it later to compare with the database status after the procedure.
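A sketch of the check, repair, and save steps for one controller node. The path of db_manage.py inside the container and its check argument are assumptions (run the script with --help to confirm its usage for your release), and nodetool is assumed to reach the config database with its default JMX settings. check_config_db, repair_config_db, and save_db_status are hypothetical helper names; DRY_RUN=1 prints the commands instead of running them.

```shell
CONTAINER="${CONTAINER:-contrail-controller}"
# Hypothetical location of db_manage.py inside the container.
DB_MANAGE="${DB_MANAGE:-/tmp/db_manage.py}"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Run the db_manage.py consistency check on one controller node.
check_config_db() {
  run juju ssh "$1" sudo docker exec "$CONTAINER" python "$DB_MANAGE" check
}

# Repair Cassandra on one controller node so that replicas are synchronized.
repair_config_db() {
  run juju ssh "$1" sudo docker exec "$CONTAINER" nodetool repair
}

# Save the check output and the ring status on the Juju client host.
# $2 is the output file (default: db-status-before.txt).
save_db_status() {
  outfile="${2:-db-status-before.txt}"
  {
    check_config_db "$1"
    run juju ssh "$1" sudo docker exec "$CONTAINER" nodetool status
  } > "$outfile" 2>&1
}
```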

Back up the config database. You can use either of the following methods:

Take the backup with the default db_json_exim.py script.

Take the backup with the updated db_json_exim.py script that you downloaded earlier.

Copy the database backup file from the container to the host.
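The export and copy steps above might look like the following on node 1. DB_JSON_EXIM stands for whichever copy of db_json_exim.py you use, the default one in the image or the updated one, and its path here is a placeholder. The --export-to option is the one we believe the upstream script provides, so confirm it with the script's --help output. The /tmp/db-dump/db-dump.json location matches the file used later in the restore procedure. DRY_RUN=1 prints the commands instead of running them.

```shell
CONTAINER="${CONTAINER:-contrail-controller}"
# Placeholder path of db_json_exim.py inside the container.
DB_JSON_EXIM="${DB_JSON_EXIM:-/tmp/db_json_exim.py}"
DUMP_DIR=/tmp/db-dump

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Export the config database to $DUMP_DIR/db-dump.json inside the container.
export_config_db() {
  run juju ssh "$1" sudo docker exec "$CONTAINER" mkdir -p "$DUMP_DIR"
  run juju ssh "$1" sudo docker exec "$CONTAINER" python "$DB_JSON_EXIM" --export-to "$DUMP_DIR/db-dump.json"
}

# Copy the dump from the container to the node, then from the node to the
# current directory (or $2) on the Juju client host.
fetch_dump() {
  run juju ssh "$1" mkdir -p "$DUMP_DIR"
  run juju ssh "$1" sudo docker cp "$CONTAINER:$DUMP_DIR/db-dump.json" "$DUMP_DIR/db-dump.json"
  run juju scp "$1:$DUMP_DIR/db-dump.json" "${2:-.}"
}
```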

Restart the Contrail config services on all the controller nodes.

On each controller node, run the contrail-status command to confirm that services are in the active or backup state.

Restart the Juju agents for the contrail-controller application.

Run the juju status command from a machine where the Juju client is configured. Confirm that the Juju agents are in the active state.
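A sketch of the restart steps, reversing the earlier stop steps. It makes the same assumptions about the config service names and the jujud-unit-<application>-<unit number> agent service names; start_config_services and start_unit_agent are hypothetical names, and DRY_RUN=1 prints the commands instead of running them.

```shell
CONTAINER="${CONTAINER:-contrail-controller}"
# Assumed names of the config services in the container; adjust to your release.
CONFIG_SERVICES="${CONFIG_SERVICES:-contrail-api contrail-schema contrail-svc-monitor contrail-device-manager}"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Start the config services in the container and print their status.
start_config_services() {
  # Unquoted on purpose: one argument per service name.
  run juju ssh "$1" sudo docker exec "$CONTAINER" systemctl start $CONFIG_SERVICES
  run juju ssh "$1" sudo docker exec "$CONTAINER" contrail-status
}

# Start the unit agent of one Juju unit, e.g. start_unit_agent 1 contrail-controller/0.
start_unit_agent() {
  run juju ssh "$1" sudo systemctl start "jujud-unit-$(printf '%s' "$2" | tr / -)"
}
```

After the agents are started, juju status contrail-controller should show them active again.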

Verify the logical structure of the DB dump JSON file and make sure that it is not empty.
The DB dump is on node 1.
Verify that the DB dump file contains the correct UUIDs and VM IP addresses for your environment.
Note:
If no VMs are running in the environment, searching the dump for VM IP addresses returns no output.
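A few local sanity checks on the copied dump file, as a sketch: it confirms that the file is non-empty and valid JSON (using python3 -m json.tool, assumed to be available on the host), and lists the UUIDs and IPv4-looking strings it contains so that you can compare them with your environment. verify_dump, list_uuids, and list_ips are hypothetical helper names, and the patterns are generic, so they also match values that are not VM addresses.

```shell
# Fail unless the file exists, is non-empty, and parses as JSON.
verify_dump() {
  [ -s "$1" ] || { echo "dump missing or empty: $1" >&2; return 1; }
  python3 -m json.tool "$1" > /dev/null || { echo "dump is not valid JSON: $1" >&2; return 1; }
  echo "dump OK: $1"
}

# List the distinct UUIDs found anywhere in the dump.
list_uuids() {
  grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' "$1" | sort -u
}

# List the distinct IPv4-looking strings found in the dump.
list_ips() {
  grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' "$1" | sort -u
}
```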

Restore config database

Follow this procedure to restore the config database:

Stop the Juju agents for the contrail-controller, contrail-analytics, and contrail-analyticsdb applications.

Stop the Contrail services on all the controller nodes.

Run the contrail-status command on each controller node to confirm that the services are in the inactive state.
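A sketch of these three steps for one machine. It assumes that unit agent services follow the jujud-unit-<application>-<unit number> naming and that CONTRAIL_SERVICES, a hypothetical list that includes contrail-database (the Cassandra service named elsewhere in this procedure), covers the Contrail services in the container. Pass the Juju unit names hosted on the machine, as shown by juju status. DRY_RUN=1 prints the commands instead of running them.

```shell
CONTAINER="${CONTAINER:-contrail-controller}"
# Hypothetical service list; zookeeper stays running because the rmr step needs it.
CONTRAIL_SERVICES="${CONTRAIL_SERVICES:-contrail-api contrail-schema contrail-svc-monitor contrail-device-manager contrail-database}"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Stop the unit agents for the given units on one machine, e.g.
# stop_agents 1 contrail-controller/0 contrail-analytics/0 contrail-analyticsdb/0
stop_agents() {
  machine="$1"
  shift
  for unit in "$@"; do
    run juju ssh "$machine" sudo systemctl stop "jujud-unit-$(printf '%s' "$unit" | tr / -)"
  done
}

# Stop the Contrail services in the container and print contrail-status.
stop_contrail_services() {
  # Unquoted on purpose: one argument per service name.
  run juju ssh "$1" sudo docker exec "$CONTAINER" systemctl stop $CONTRAIL_SERVICES
  run juju ssh "$1" sudo docker exec "$CONTAINER" contrail-status
}
```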

Back up the Zookeeper data directory on all the controllers.

Clean the current data from one of the Zookeeper instances using the rmr command.

Stop Zookeeper services on all the controllers.

Clean the Zookeeper data directory contents on all the controllers.

Back up the Cassandra data directory on all the controllers.

Clean the Cassandra data directory contents on all the controllers.
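The Zookeeper and Cassandra steps above can be sketched with a small wrapper that runs a command in the controller container. The data directory locations, the zkCli.sh path, the zookeeper service name, and the default Zookeeper client port are assumptions for a typical Ubuntu-based container: check dataDir in the Zookeeper configuration and the directory settings in /etc/cassandra/cassandra.yaml first. The znodes to delete are not named in this document, so remove_znodes takes them as arguments (list them first with zkCli.sh ls /). Backups are written next to the originals inside the container. Cleaning Zookeeper removes only the version-2 subdirectory so that the myid file, which identifies each server in the ensemble, survives. DRY_RUN=1 prints the commands instead of running them.

```shell
CONTAINER="${CONTAINER:-contrail-controller}"
# Assumed locations inside the container; verify before use.
ZK_DIR="${ZK_DIR:-/var/lib/zookeeper}"
CASSANDRA_DIR="${CASSANDRA_DIR:-/var/lib/cassandra}"
ZKCLI="${ZKCLI:-/usr/share/zookeeper/bin/zkCli.sh}"
STAMP="$(date +%Y%m%d%H%M%S)"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Run a command inside the controller container on one Juju machine.
in_ctr() {
  m="$1"
  shift
  run juju ssh "$m" sudo docker exec "$CONTAINER" "$@"
}

backup_zk() { in_ctr "$1" cp -a "$ZK_DIR" "$ZK_DIR.bak-$STAMP"; }

# Delete the given znodes recursively; run this on ONE Zookeeper instance only.
remove_znodes() {
  m="$1"
  shift
  for znode in "$@"; do
    in_ctr "$m" "$ZKCLI" rmr "$znode"
  done
}

# Assumed service name of Zookeeper inside the container.
stop_zk() { in_ctr "$1" systemctl stop zookeeper; }

# Remove snapshots and transaction logs but keep the myid file.
clean_zk() { in_ctr "$1" rm -rf "$ZK_DIR/version-2"; }

backup_cassandra() { in_ctr "$1" cp -a "$CASSANDRA_DIR" "$CASSANDRA_DIR.bak-$STAMP"; }

# The inner single quotes stop the node's shell from expanding the glob,
# so sh inside the container expands it instead.
clean_cassandra() { in_ctr "$1" sh -c "'rm -rf $CASSANDRA_DIR/*'"; }
```

The order follows the steps: backup_zk on every controller, remove_znodes on one, then stop_zk, clean_zk, backup_cassandra, and clean_cassandra on every controller.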

Cleaning the Cassandra data directory also erases the old password.

Modify the Cassandra configuration on each controller, one at a time, to disable password authentication so that the password can be reset.
Edit the authenticator variable in the /etc/cassandra/cassandra.yaml file.
Replace authenticator: PasswordAuthenticator with authenticator: AllowAllAuthenticator.
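The edit can be scripted with sed, as a sketch. set_authenticator rewrites the top-level authenticator line of a local copy of the file, which makes it easy to try out first; set_remote_authenticator is a hypothetical wrapper that applies the same substitution to /etc/cassandra/cassandra.yaml inside the contrail-controller container on one machine. Both assume GNU sed. Apply it to one controller at a time; DRY_RUN=1 prints the remote command instead of running it.

```shell
CONTAINER="${CONTAINER:-contrail-controller}"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Replace the value of the top-level authenticator setting in a cassandra.yaml file.
set_authenticator() {
  sed -i "s/^authenticator:.*/authenticator: $2/" "$1"
}

# Same edit on /etc/cassandra/cassandra.yaml inside the container on one machine.
# The extra single quotes keep the sed expression as one word for the node's shell.
set_remote_authenticator() {
  run juju ssh "$1" sudo docker exec "$CONTAINER" sed -i "'s/^authenticator:.*/authenticator: $2/'" /etc/cassandra/cassandra.yaml
}
```

For example, set_remote_authenticator 1 AllowAllAuthenticator, then the same for machines 2 and 3.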

Verify that no old Contrail processes, such as the db_* scripts, are running. If you find any, kill them.

Run this check on the Contrail nodes, outside the Docker containers.
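A sketch for finding leftover db_* script processes on a node. It runs pgrep on the node itself, where processes inside the containers are also visible; the bracket in each pattern is the usual trick that keeps the pattern from matching the command line of the shell that runs it. Killing is left as an explicit step by PID rather than pkill -f, because pkill -f run through sudo can match the sudo process itself. find_db_scripts and kill_pids are hypothetical helper names; DRY_RUN=1 prints the commands instead of running them.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# List PIDs and command lines of processes running db_manage.py or db_json_exim.py.
find_db_scripts() {
  run juju ssh "$1" pgrep -af "'db_[m]anage|db_[j]son_exim'"
}

# Kill the given PIDs on one machine after you have reviewed them.
kill_pids() {
  m="$1"
  shift
  run juju ssh "$m" sudo kill "$@"
}
```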

Restart the contrail-database and Zookeeper services on all the controllers.

Verify the status of Zookeeper service.

Verify the status of Cassandra service.

For details on nodetool status command, see https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/tools/toolsStatus.html.
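A sketch of the restart and status checks for one controller. The zookeeper service name and the zkServer.sh path are assumptions for an Ubuntu-based container. In a healthy three-node cluster, zkServer.sh status reports one leader and two followers, and nodetool status lists every node as UN (Up/Normal). DRY_RUN=1 prints the commands instead of running them.

```shell
CONTAINER="${CONTAINER:-contrail-controller}"
# Assumed path of zkServer.sh inside the container.
ZKSERVER="${ZKSERVER:-/usr/share/zookeeper/bin/zkServer.sh}"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Restart Cassandra (contrail-database) and Zookeeper in the container.
restart_db_and_zk() {
  run juju ssh "$1" sudo docker exec "$CONTAINER" systemctl restart contrail-database zookeeper
}

# Show the Zookeeper mode (leader or follower) and the Cassandra ring state.
show_db_and_zk_status() {
  run juju ssh "$1" sudo docker exec "$CONTAINER" "$ZKSERVER" status
  run juju ssh "$1" sudo docker exec "$CONTAINER" nodetool status
}
```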

Copy the config DB backup file from the host to the controller container.

Restore the config DB.
1. Prepare a temporary contrail-api.conf file for the DB restoration.
2. Modify cassandra_password and cassandra_user in the temporary contrail-api.conf file.
  juju ssh 1 sudo docker exec -it contrail-controller vim /tmp/contrail-api-dbrestore.conf
3. Import the database from the /tmp/db-dump/db-dump.json file.
  You can use either of the following methods:
  - Import the database with the default db_json_exim.py script.
  - Import the database with the downloaded db_json_exim.py script.
If any error occurs, repeat the restore procedure starting from step 5.
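The copy, prepare, and import steps can be sketched as follows for node 1. This assumes that the dump is on the Juju client host, that /etc/contrail/contrail-api.conf is the API configuration file inside the container, and that db_json_exim.py accepts --api-conf and --import-from options (check its --help output); DB_JSON_EXIM is a placeholder path. You still edit cassandra_user and cassandra_password in the temporary file by hand, as in step 2. DRY_RUN=1 prints the commands instead of running them.

```shell
CONTAINER="${CONTAINER:-contrail-controller}"
# Placeholder path of db_json_exim.py inside the container.
DB_JSON_EXIM="${DB_JSON_EXIM:-/tmp/db_json_exim.py}"
DUMP=/tmp/db-dump/db-dump.json
RESTORE_CONF=/tmp/contrail-api-dbrestore.conf

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Copy a local dump file ($2) to machine $1 and into the controller container.
copy_dump_to_controller() {
  run juju scp "$2" "$1:/tmp/db-dump.json"
  run juju ssh "$1" sudo docker exec "$CONTAINER" mkdir -p /tmp/db-dump
  run juju ssh "$1" sudo docker cp /tmp/db-dump.json "$CONTAINER:$DUMP"
}

# Create the temporary API configuration used only for the restore.
prepare_restore_conf() {
  run juju ssh "$1" sudo docker exec "$CONTAINER" cp /etc/contrail/contrail-api.conf "$RESTORE_CONF"
}

# Import the dump into the empty config database.
import_config_db() {
  run juju ssh "$1" sudo docker exec "$CONTAINER" python "$DB_JSON_EXIM" --api-conf "$RESTORE_CONF" --import-from "$DUMP"
}
```

A typical sequence is copy_dump_to_controller 1 ./db-dump.json, prepare_restore_conf 1, the manual edit, and then import_config_db 1.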

Synchronize the Cassandra data between nodes.

Modify the Cassandra configuration on each controller, one at a time, to re-enable password authentication.
Edit the authenticator variable in the /etc/cassandra/cassandra.yaml file.
Replace authenticator: AllowAllAuthenticator with authenticator: PasswordAuthenticator.
Then restart the database service:
juju ssh <node> sudo docker exec contrail-controller systemctl restart contrail-database
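As a sketch, re-enabling authentication on one controller is a sed edit followed by the contrail-database restart and a poll of nodetool status; wait until the node shows UN before moving to the next controller. The sed expression, the roughly five-minute retry limit, and the enable_password_auth and wait_for_cassandra names are illustrative; GNU sed is assumed, and the polling relies on juju ssh passing through the remote exit status. DRY_RUN=1 prints the commands instead of running them.

```shell
CONTAINER="${CONTAINER:-contrail-controller}"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Switch back to PasswordAuthenticator and restart Cassandra on one machine.
enable_password_auth() {
  run juju ssh "$1" sudo docker exec "$CONTAINER" sed -i "'s/^authenticator:.*/authenticator: PasswordAuthenticator/'" /etc/cassandra/cassandra.yaml
  run juju ssh "$1" sudo docker exec "$CONTAINER" systemctl restart contrail-database
}

# Poll nodetool status every 10 seconds until it answers (30 tries at most).
wait_for_cassandra() {
  tries=0
  until run juju ssh "$1" sudo docker exec "$CONTAINER" nodetool status; do
    tries=$((tries + 1))
    [ "$tries" -ge 30 ] && return 1
    sleep 10
  done
}
```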

Create the Contrail user on any one of the controller nodes.

Verify that the Contrail user is available on the other controller nodes.
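A sketch of creating and listing the user with cqlsh. It assumes that, after the data was cleaned, Cassandra has recreated its default cassandra superuser (password cassandra), that cqlsh inside the container can reach the database at CQL_HOST (a placeholder, as is the default port), and that the user name and password are the cassandra_user and cassandra_password values from contrail-api.conf. Granting SUPERUSER is also an assumption; grant what your deployment needs. The password appears on the command line, so it is visible in the process list while the command runs, and in this simple form it must not contain quote or dollar characters. DRY_RUN=1 prints the commands instead of running them.

```shell
CONTAINER="${CONTAINER:-contrail-controller}"
# Placeholder address for cqlsh inside the container.
CQL_HOST="${CQL_HOST:-127.0.0.1}"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Run one CQL statement as the default superuser on one machine.
# The statement is wrapped in double quotes for the node's shell.
cql_as_admin() {
  run juju ssh "$1" sudo docker exec "$CONTAINER" cqlsh "$CQL_HOST" -u cassandra -p cassandra -e "\"$2\""
}

# $1 = machine, $2 = user name, $3 = password (values from contrail-api.conf).
create_contrail_user() {
  cql_as_admin "$1" "CREATE USER IF NOT EXISTS $2 WITH PASSWORD '$3' SUPERUSER;"
}

list_users() {
  cql_as_admin "$1" "LIST USERS;"
}
```

Create the user through machine 1, then run list_users 2 and list_users 3 to confirm that it is visible on the other controllers.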

If the Contrail user does not appear on these nodes, check the replication factor of the system_auth keyspace on all the controller nodes.

Check the replication factor using either of the following methods:

Using the nodetool command.
The output must show that each node owns 100% of the tokens and partitions.

Querying the Cassandra database.

The system_auth keyspace must have a replication_factor of 3.

If the replication_factor is not set to 3, set it to 3 and then repair the system_auth keyspace on each controller node.
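A sketch of both checks and of the fix. It assumes a single data center, so SimpleStrategy is used; if your system_auth keyspace uses NetworkTopologyStrategy, keep that class and set the factor per data center instead. The system_schema.keyspaces table is the Cassandra 3.x location of keyspace settings. After the change, nodetool repair system_auth runs on every controller so that existing users are copied to all replicas. CQL_HOST, CQL_USER, and CQL_PASS are placeholders; DRY_RUN=1 prints the commands instead of running them.

```shell
CONTAINER="${CONTAINER:-contrail-controller}"
# Placeholder connection settings for cqlsh inside the container.
CQL_HOST="${CQL_HOST:-127.0.0.1}"
CQL_USER="${CQL_USER:-cassandra}"
CQL_PASS="${CQL_PASS:-cassandra}"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Run one CQL statement on machine $1 (double-quoted for the node's shell).
cql() {
  run juju ssh "$1" sudo docker exec "$CONTAINER" cqlsh "$CQL_HOST" -u "$CQL_USER" -p "$CQL_PASS" -e "\"$2\""
}

# Method 1: ownership of system_auth per node (expect 100% on each of 3 nodes).
show_auth_ownership() {
  run juju ssh "$1" sudo docker exec "$CONTAINER" nodetool status system_auth
}

# Method 2: the replication settings stored in the schema.
show_auth_replication() {
  cql "$1" "SELECT keyspace_name, replication FROM system_schema.keyspaces WHERE keyspace_name = 'system_auth';"
}

# Set replication_factor 3 through the first machine, then repair system_auth on all of them.
fix_auth_replication() {
  cql "$1" "ALTER KEYSPACE system_auth WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};"
  for m in "$@"; do
    run juju ssh "$m" sudo docker exec "$CONTAINER" nodetool repair system_auth
  done
}
```

For example, fix_auth_replication 1 2 3 alters the keyspace through machine 1 and repairs it on machines 1, 2, and 3.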

Restart the Contrail services on all the controller nodes.

On each controller node, run the contrail-status command to confirm that the services are in the active or backup state.

Restart the Juju agents for the contrail-controller, contrail-analytics, and contrail-analyticsdb applications.
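A sketch of the restart steps, mirroring the stop steps at the start of the restore. It assumes the same hypothetical CONTRAIL_SERVICES list and the jujud-unit-<application>-<unit number> agent service names; start_contrail_services and start_agents are hypothetical names, and DRY_RUN=1 prints the commands instead of running them.

```shell
CONTAINER="${CONTAINER:-contrail-controller}"
# Hypothetical service list; compare with contrail-status on your nodes.
CONTRAIL_SERVICES="${CONTRAIL_SERVICES:-contrail-api contrail-schema contrail-svc-monitor contrail-device-manager contrail-database}"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Start the Contrail services in the container and print contrail-status.
start_contrail_services() {
  # Unquoted on purpose: one argument per service name.
  run juju ssh "$1" sudo docker exec "$CONTAINER" systemctl start $CONTRAIL_SERVICES
  run juju ssh "$1" sudo docker exec "$CONTAINER" contrail-status
}

# Start the unit agents for the given units on one machine, e.g.
# start_agents 1 contrail-controller/0 contrail-analytics/0 contrail-analyticsdb/0
start_agents() {
  machine="$1"
  shift
  for unit in "$@"; do
    run juju ssh "$machine" sudo systemctl start "jujud-unit-$(printf '%s' "$unit" | tr / -)"
  done
}
```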

Check the Zookeeper status.

Check the log files on all the controller nodes for any errors.

Check the database using the db_manage.py script.
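A sketch of the final checks on one controller. The log location /var/log/contrail, the zkServer.sh path, and the db_manage.py path and check argument are assumptions; the grep only counts the lines that mention "error" in each log file, so read the matching lines before drawing conclusions. post_restore_checks is a hypothetical name, and DRY_RUN=1 prints the commands instead of running them.

```shell
CONTAINER="${CONTAINER:-contrail-controller}"
# Assumed paths inside the container.
ZKSERVER="${ZKSERVER:-/usr/share/zookeeper/bin/zkServer.sh}"
DB_MANAGE="${DB_MANAGE:-/tmp/db_manage.py}"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Zookeeper mode, per-file error counts in the Contrail logs, and a DB consistency check.
post_restore_checks() {
  run juju ssh "$1" sudo docker exec "$CONTAINER" "$ZKSERVER" status
  # Quoted for the node's shell so that the glob is expanded inside the container.
  run juju ssh "$1" sudo docker exec "$CONTAINER" sh -c "'grep -ci error /var/log/contrail/*.log'"
  run juju ssh "$1" sudo docker exec "$CONTAINER" python "$DB_MANAGE" check
}
```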
