Restoring Contrail Nodes in a RHOSP-based Environment

Contrail nodes are virtual machines hosted on a KVM hypervisor. In the Contrail RHOSP environment, there are three types of Contrail nodes: controller nodes, analytics nodes, and analytics database nodes. From time to time, a node might crash or otherwise fail. This topic describes how to restore one or more failed Contrail controller, analytics, or analytics database nodes.

Use the following procedures to rebuild one or more Contrail nodes.

Prerequisites

Before attempting to rebuild one or more failed Contrail nodes, ensure that:

  • The system is stable and the node has the correct status to be deployed again.

  • The MAC address of the network interface card (NIC) used for PXE boot has not changed.

  • If you are restoring more than one node, make a backup of the Contrail databases and make sure you have access to the backup file (*.tar.bz2). For more information about backups, see Backing Up Contrail Databases Using JSON Format.

Verify the Controller Node Status and Rebuild the Node

Use this initial procedure to verify that the node is ready to be rebuilt, and then rebuild it.

  1. Check the node status. The failed node should be listed with a Power State of power off and a Maintenance value of False.

    For example, if contrail-controller02 is the node to be rebuilt and its status is Power State power off and Maintenance False, the node is ready for rebuilding.

  2. In some cases, the power state of a failed node could be None. If the power state is None, you must set the power state to off.
  3. In some cases, the Maintenance mode could be set to True. If this is the case, you must set the Maintenance mode to False.
  4. Restore the node, and wait until its status turns to ACTIVE.
  5. Repeat steps 1-4 for each node that you want to restore.
  6. After restoration, finish rebuilding the node or nodes by using one of the following procedures.
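
The readiness checks in steps 1 through 3 can be sketched as a small decision helper. This is a minimal sketch, not part of the original procedure: the function names `is_ready` and `remediation_commands` are hypothetical, and the `openstack baremetal` commands it returns are an assumption about the client available on the undercloud (older RHOSP releases may use the equivalent `ironic` commands instead). The helper only builds the commands; it does not run them.

```python
from typing import List, Optional


def is_ready(power_state: Optional[str], maintenance: bool) -> bool:
    """A node is ready to rebuild when it is powered off and not in maintenance."""
    return power_state == "power off" and not maintenance


def remediation_commands(
    node: str, power_state: Optional[str], maintenance: bool
) -> List[List[str]]:
    """Return the commands to run before `node` can be rebuilt.

    An empty list means no correction is needed for the two cases the
    procedure describes (power state None, Maintenance True).
    The command lines are assumptions, not taken from the original document.
    """
    commands: List[List[str]] = []
    # Step 2: a power state of None must be set to off.
    if power_state is None or power_state.strip().lower() == "none":
        commands.append(["openstack", "baremetal", "node", "power", "off", node])
    # Step 3: Maintenance True must be set back to False.
    if maintenance:
        commands.append(
            ["openstack", "baremetal", "node", "maintenance", "unset", node]
        )
    return commands
```

For a node that reports a power state of None and Maintenance True, the helper returns both commands in the order of the steps above; for a node that already reports power off and Maintenance False, it returns an empty list.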

Finish Rebuilding One or Two Contrail Controller Nodes

Use this procedure to finish rebuilding the nodes when you are restoring one or two Contrail controller nodes.

  1. Verify the status of the rebuild process. Wait until the status turns ACTIVE.
  2. Establish an SSH connection to the node you have rebuilt and observe the journal of the os-collect-config process until you see multiple occurrences of the message No local metadata found (['/var/lib/os-collect-config/local-data']).

    If you want to rebuild two controller nodes, repeat this step for the other node before moving to the next step.

  3. Perform a full stack update to reconverge the stack and bring the system back to operational state.
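
The journal check in step 2 can be sketched as a simple text scan. This is a minimal sketch under stated assumptions: the function names are hypothetical, the threshold of two occurrences is one interpretation of "multiple occurrences", and the journal text is assumed to have been captured separately, for example from `journalctl -u os-collect-config --no-pager` on the rebuilt node.

```python
# The message the procedure tells you to watch for in the os-collect-config journal.
MESSAGE = "No local metadata found (['/var/lib/os-collect-config/local-data'])"


def count_metadata_messages(journal_text: str) -> int:
    """Count the journal lines that contain the expected message."""
    return sum(1 for line in journal_text.splitlines() if MESSAGE in line)


def seen_multiple_times(journal_text: str, minimum: int = 2) -> bool:
    """Return True once the message has appeared at least `minimum` times.

    The default of 2 is an assumption; the procedure only says "multiple".
    """
    return count_metadata_messages(journal_text) >= minimum
```

When two controller nodes are being rebuilt, the same check would be applied to the journal of each node before moving on to the stack update.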

Finish Rebuilding All Contrail Controller Nodes

Use this procedure to finish rebuilding the nodes when you are restoring all of the Contrail controller nodes.

  1. Observe the status of the rebuild process. The status of nodes will display REBUILD while the rebuilding process is occurring.
  2. Wait until the status of all nodes changes to ACTIVE.
  3. Establish an SSH connection to each of the nodes that have been rebuilt and observe the journal of the os-collect-config process until you see multiple occurrences of the string No local metadata found (['/var/lib/os-collect-config/local-data']).
  4. Retrieve the backup file of the Contrail controller databases (*.tar.bz2), put it into your Director Control Zone, and perform a database restore. For more information about backups, see Backing Up Contrail Databases Using JSON Format.
  5. Verify that the Contrail services are running.
  6. Perform a full stack update to reconverge the stack and bring the system back to operational state.
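
Steps 1 and 2 above amount to polling until no node reports a status other than ACTIVE. The following is a minimal sketch of that wait loop, not the documented method: the function names are hypothetical, and `fetch_statuses` is a caller-supplied function (an assumption) that returns a mapping of node name to status, however you choose to obtain it.

```python
import time
from typing import Callable, Dict, List


def nodes_not_active(statuses: Dict[str, str]) -> List[str]:
    """Return the names of nodes whose status is anything other than ACTIVE."""
    return sorted(name for name, status in statuses.items() if status != "ACTIVE")


def wait_until_active(
    fetch_statuses: Callable[[], Dict[str, str]],
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every node is ACTIVE; return False if polls run out.

    Nodes still showing REBUILD (or any other status) keep the loop waiting.
    The interval and poll limit are illustrative defaults.
    """
    for attempt in range(max_polls):
        if not nodes_not_active(fetch_statuses()):
            return True
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    return False
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays. Only after the function returns True would you continue with the journal check, the database restore, and the stack update.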

Rebuilding Contrail Analytics and Analytics Database Nodes

This topic describes how to rebuild failed Contrail analytics and analytics database nodes. The same procedure is used for analytics nodes and analytics database nodes.

To rebuild Contrail analytics and analytics database nodes:

  1. Verify that the failed node is ready to be redeployed. To be ready, the failed node must have a Power State of power off and a Maintenance value of False.
  2. In some cases, the power state might be None. If the power state is None, set it to off.
  3. In some cases, the Maintenance mode might be set to True. If maintenance mode is True, set it to False.
  4. Rebuild the node, and wait for the node status to turn ACTIVE. Repeat the procedure for each node you need to replace.

Finish Rebuilding the Analytics Nodes

Use this procedure to finish rebuilding Contrail analytics or analytics database nodes.

  1. Observe the status of the rebuild process. Nodes that are being rebuilt have a status of REBUILD. Wait until the status of all nodes being rebuilt changes to ACTIVE.
  2. Activate os-collect-config on the node.
  3. Establish an SSH connection to the node you have rebuilt and observe the journal of the os-collect-config process until you see multiple occurrences of the message No local metadata found (['/var/lib/os-collect-config/local-data']). Repeat this step for each node being rebuilt.
  4. Perform a full stack update to reconverge the stack and bring the system back to operational state.