Upgrade Instructions

 

Instructions for upgrading your Contrail Cloud to the specified release.

Upgrade Your Contrail Cloud to the Release

Note

Juniper supports an n+1 upgrade path for releases. This procedure is unchanged and supports upgrades from 13.2.x to 13.3 and from 13.3 to 13.4.
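The n+1 rule in this note can be expressed as a small check. The following Python sketch is illustrative only and is not part of the Contrail Cloud tooling. The function names are hypothetical, and the table lists only the two upgrade steps named in the note.

```python
# Supported single-step upgrades as stated in the note.
# 13.2.x is matched on its major.minor series, so patch levels are ignored.
SUPPORTED_STEPS = {("13.2", "13.3"), ("13.3", "13.4")}


def minor_series(version: str) -> str:
    """Reduce a version such as '13.2.1' to its series, '13.2'."""
    return ".".join(version.split(".")[:2])


def is_supported_upgrade(current: str, target: str) -> bool:
    """Return True if current -> target is one of the listed n+1 steps."""
    return (minor_series(current), minor_series(target)) in SUPPORTED_STEPS


def upgrade_path(current: str, target: str) -> list[str]:
    """Chain n+1 steps from current to target; raise if no chain exists."""
    next_step = dict(SUPPORTED_STEPS)  # maps each series to its successor
    path = [minor_series(current)]
    goal = minor_series(target)
    while path[-1] != goal:
        if path[-1] not in next_step:
            raise ValueError(f"no supported path from {current} to {target}")
        path.append(next_step[path[-1]])
    return path
```

For example, going from a 13.2.x release to 13.4 takes two separate upgrades, because 13.2 to 13.4 is not a single supported step.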

This is an in-place upgrade as defined by the RHOSP TripleO model. You now have the option to update roles in parallel to complete this upgrade. If the nodes were not rebooted automatically, you must reboot them after the upgrade.

No deployment configuration changes are required when updating. If deployment configuration changes must be made for any reason, apply them to your existing Contrail Cloud deployment before upgrading to the current version. As a best practice, review your configuration files to make sure they adhere to a proper schema and meet the needs of your deployment environment.

The Contrail Cloud upgrade procedure allows for fine-grained control of the upgrade process. You exercise this control through the update plan configured in config/site.yml.

Before You Upgrade

Take these initial steps before starting your Contrail Cloud upgrade. They help eliminate errors that might occur during the upgrade process and help ensure the expected results. The sections below are prerequisites to the upgrade of your Contrail Cloud.

Review Your Configuration Files

Review your current setup to ensure that all configuration settings are accurate and reflect the desired deployment for your Contrail Cloud environment.

  • Review all the YAML files in the /var/lib/contrail_cloud/config directory and ensure all values match your expected results.

    Compare the old configs against the new Contrail Cloud config schema to check for gaps. To check that the configs are compatible, run:

Verify Undercloud/Overcloud Health and Service Operations

It is vital that you always check the health of your cloud and the services running in your cloud before attempting any deployment or upgrade activities. You must ensure that the undercloud/overcloud is fully functional, healthy, and that all services are active. Any problems in your cloud health may cause errors during upgrading. Incorrect settings and configurations will carry over to the upgraded Contrail Cloud deployment.

  1. Check the health of the undercloud, the overcloud, and the nodes running on them. To verify the health of your cloud and its services, see the “Verify Quorum and Node Health” section in Node Reboot and Health Check.

Back Up Your Undercloud and Overcloud

Make sure to back up your undercloud and overcloud before running the update script. For complete instructions to back up your cloud, see BACK UP AND RESTORE THE DIRECTOR UNDERCLOUD, Backing up the overcloud control plane services, and Backing up Contrail Databases in JSON Format.

Pause and Shut Down Business Services

You must pause or shut down external business services at this time to ensure a smooth upgrade and to prevent possible data loss or workload errors. These business services include anything outside the Contrail Cloud deployment that interacts with Contrail Cloud as a whole. The steps for the tasks below depend on the specific business service or VM that is running. Consult the documentation for the specific service you need to pause or shut down.

  • Quiesce all external API requests, for example, Horizon.

  • Gracefully shut down any vulnerable workloads.

  • Consider migrating your services and VMs to a different cloud that is outside of the upgrade environment.

Start the Contrail Cloud Upgrade

You are now ready to upgrade your Contrail Cloud release. The process delivers the updated containers, Red Hat RHEL/RHOSP/Storage content, and kernel version associated with the chosen release.

The procedure below guides you through the update. There is a small disruption in service during the update. However, the update preserves existing overcloud configurations such as images, projects, networks, volumes, and virtual machines.

Retrieve Adjusted Keys and Install

Follow these steps to start your upgrade:

  1. Send an e-mail message to contrail_cloud_subscriptions@juniper.net and request a Contrail Cloud upgrade. Provide the following information:
    • Include your current activation key in the email request. Your Contrail Cloud activation key will be adjusted to the requested version.

    • Specify the time and date you would like to upgrade your Contrail Cloud. The Contrail Cloud team will prepare the activation for your maintenance window.

  2. Refresh your Contrail Cloud subscription on the jump host server by running the contrail_cloud_installer.sh from the jump host with the arguments:

Upgrade Contrail Cloud

The following procedure and scripts will upgrade your Contrail Cloud.

As the “contrail” user (su - contrail from root), execute the following scripts on the jump host to perform the update:

  1. Upgrade the jump host and the undercloud VM.

    This will:

    • Update the packages and containers on the jump host and the undercloud VM.

    • Update Red Hat OpenStack Platform Director on the undercloud VM.

    • Update the image on the undercloud VM that is used to provision all new overcloud role instances.

      • overcloud-image-full is updated and used to provision any new overcloud role instance.

  2. Prepare the overcloud for upgrade:

    This will:

    • Publish new containers to the registry on the undercloud VM.

    • Update the overcloud plan on the undercloud VM.

    • Prepare the overcloud nodes for update: openstack overcloud update prepare.

  3. Perform the overcloud upgrade.

    The overcloud upgrade (contrail-cloud-update-overcloud-step2.sh) will:

    • Upgrade all nodes as defined in config/site.yml using nodes_list.

    • Upgrade packages and containers for each node.

    • Upgrade one node batch per script run.

    • Automatically create a lockfile when the batch has been processed.

    • Reboot the compute nodes, unless manually disabled.

    Choose one of the following methods to complete the overcloud upgrade step:

    • Default method. All nodes will upgrade in one run.

      • All roles are upgraded one by one.

      • Within each role the nodes are upgraded one by one.

    • Targeted method. You have the ability to target roles and even nodes to control the upgrade sequence.

      • Set the desired upgrade targets in the config/site.yml file.

      • All control plane roles are typically upgraded together with this method.

      • Computes can be upgraded in small targeted groups.

    If you encounter failures while running the contrail-cloud-update-overcloud-step2.sh script, see If an Upgrade Fails in the sections below.

    Note

    The overcloud upgrade script contrail-cloud-update-overcloud-step2.sh has a hard timeout of 4 hours, which may not be sufficient for complex deployments. Consider using targeted updates to allow for incremental role upgrades which can complete within that timeframe.

    Default Method

    To upgrade all the nodes using the default method, run the script below. This will upgrade all nodes in one run and require no additional steps. The update will apply to all roles one at a time and one node at a time within each role.

    Targeted Method

    The procedure below allows you to target specific roles and nodes during the update. This approach allows for control and predictability of the update and subsequent compute node reboots. This method is desirable if you want to target specific resources to be updated as workloads are migrated. The roles can now be updated in parallel and the nodes within each role can be updated sequentially.

    To complete a targeted update, copy the sample plan samples/features/update-contrail-cloud/site.yml into your config/site.yml. Edit the sample plan to match your deployment for each targeted group, then run the step2 update script once for each batch defined in the update plan. Per-node control allows for planning around node reboots. For how to reboot your compute nodes, see the node reboot section in Node Reboot and Health Check.

    Note

    Compute nodes will automatically reboot as part of the upgrade process, unless manually disabled. Select “disabled” for reboot_computes: to stop the automated reboots. You will have to follow the manual reboot procedure after the upgrade is complete for updated packages to take effect (e.g., kernel updates).

    • Configure the update plan in your config/site.yml. Define how many nodes from each role will be updated at the same time:
    • Configure the update plan for the desired reboot behavior:
    • Now define your batches.

      This is where you define your series of batches. You define the roles and nodes that belong to each unique batch. Other batch update characteristics are set here as well. When the update script is run, the script will identify the first unique batch which has not already been executed and updated. A lockfile is created after each successful batch update to identify it as being completed.

      Name the unique batch and configure the update type with a value of either parallel or sequence. The update will be performed in the batch order you configure in your site.yml file. You can also target specific nodes you want to update (e.g. computes) by including the node name. To start, you might configure it to look like this:

    • Set the node types in nodes_list. This list belongs to the unique batch name defined above. In this example, this would be all the node types associated with the batch named controller_nodes:
    • Define the specific nodes that are unique within the named node role. Below is an example of defining both storage and compute nodes:
    • Run the update script after you have set your variables for each defined batch in the update plan. Rerun the update script until all batches have successfully updated:
  4. Converge the overcloud upgrade. The script below updates Ceph and converges the overcloud heat stack. Note that the overcloud['deployment_timeout'] value in config/site.yml can be increased to avoid timeouts in the Ceph upgrade.

    This will:

    • Ensure that the stack resource structure aligns with the new packages and configurations.

    • Update the Ceph cluster configuration: openstack overcloud ceph-upgrade run.

    • Run update converge: openstack overcloud update converge.

    • Finalize the overcloud update.
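The batch handling described in step 3 (each run of the step2 script picks the first batch that has not yet completed, and a lockfile marks a finished batch) can be sketched in Python as follows. This is a simulation under stated assumptions, not the actual script logic. The plan layout, the batch and node names, and the lockfile naming are all hypothetical; only nodes_list and the parallel/sequence update types come from the text above.

```python
import tempfile
from pathlib import Path

# Hypothetical in-memory update plan. The real key names and layout in
# config/site.yml may differ; batch and node names are placeholders.
PLAN = [
    {"name": "control_plane", "type": "parallel", "nodes_list": ["role-a", "role-b"]},
    {"name": "compute_group_1", "type": "sequence", "nodes_list": ["compute-0", "compute-1"]},
    {"name": "compute_group_2", "type": "sequence", "nodes_list": ["compute-2", "compute-3"]},
]


def lockfile_for(lock_dir: Path, batch_name: str) -> Path:
    """Path of the marker file for a batch (the naming is an assumption)."""
    return lock_dir / f"{batch_name}.lock"


def next_batch(plan: list[dict], lock_dir: Path):
    """Return the first batch in plan order without a lockfile, else None."""
    for batch in plan:
        if not lockfile_for(lock_dir, batch["name"]).exists():
            return batch
    return None


def mark_done(lock_dir: Path, batch_name: str) -> None:
    """Record a successful batch by creating its lockfile."""
    lockfile_for(lock_dir, batch_name).touch()


def run_all(plan: list[dict], lock_dir: Path, update) -> list[str]:
    """Simulate rerunning the update script until every batch is done."""
    order = []
    while (batch := next_batch(plan, lock_dir)) is not None:
        update(batch)  # stand-in for one run of the update script
        mark_done(lock_dir, batch["name"])
        order.append(batch["name"])
    return order


def demo() -> list[str]:
    """Resume a plan whose first batch finished in an earlier run."""
    with tempfile.TemporaryDirectory() as tmp:
        lock_dir = Path(tmp)
        mark_done(lock_dir, "control_plane")
        return run_all(PLAN, lock_dir, update=lambda batch: None)
```

Because a lockfile is written only after a batch succeeds, a failed batch is selected again on the next run, which matches the instruction to rerun the script until all batches have updated.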

Move on to the next sections to upgrade AppFormix and Contrail Command.

Upgrade AppFormix

Upgrade AppFormix for use with Contrail Cloud.

  1. As the “contrail” user (su - contrail from root), execute the following script on the jump host to perform the update:

    This will:

    • Upgrade all packages and containers on the AppFormix nodes.

  2. Verify the status of AppFormix.

    Run the following command to view the status of AppFormix:

    This will return a 200 on success. Any other code returned should be considered a failure. The API output also contains the AppFormix version. This is helpful to verify the correct version has been installed. See the sample below:
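Separately from the vendor command and sample, the "200 or failure" rule can be sketched in Python for use in automation. This is a hedged illustration: the status URL is not given here and must be supplied by the caller, and the "Version" key passed to find_version is a hypothetical field name, not a documented part of the AppFormix API output.

```python
import json
import urllib.error
import urllib.request


def status_ok(http_status: int) -> bool:
    """Only 200 counts as success; any other code is treated as a failure."""
    return http_status == 200


def fetch_status(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Return (HTTP status code, response body) for the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for error responses; report the code instead.
        return err.code, err.read().decode("utf-8", errors="replace")


def find_version(body: str, key: str = "Version"):
    """Look up a version field in a JSON body. The key name is hypothetical."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return None
    return data.get(key) if isinstance(data, dict) else None
```

Connection failures and certificate errors surface as urllib.error.URLError and are deliberately left unhandled in this sketch, since they also indicate that the service is not healthy.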

Upgrade Contrail Command

Upgrade Contrail Command for use with Contrail Cloud.

  1. As the “contrail” user (su - contrail from root), execute the following script on the jump host to perform the update:

    This will:

    • Upgrade all packages and containers in the Contrail Command VM.

  2. Log in to the Contrail Command web UI to verify that it was successfully installed. You access Contrail Command by entering https://<jumphost>:9091 in your browser.

    Review the /var/lib/contrail_cloud/config/vault-data.yml for Contrail Command authentication details.

If an Upgrade Fails

If at any point your upgrade fails you will need to troubleshoot. Follow these basic steps for failure analysis:

  • Review the failure output and take screenshots. The screenshots will help others review your failure.

  • Review your configuration files. There could be mistakes in your YAML configuration files. Common configuration errors include, but are not limited to, NIC setup, role assignment, network assignment, and other networking-related errors.

  • Gather information to help troubleshoot the problem. One common troubleshooting step is to retrieve the log from a failed node. To do this, SSH to the node and check /var/log/messages. Use the following sequence of CLI commands:

    1. Log in to the jump host as the root user.
    2. su - contrail
    3. ssh undercloud
    4. source stackrc
    5. Run openstack stack failures list overcloud to identify any stack failures to help identify which roles are having issues.

      Nothing will return in the CLI if there are no failures to report.

    6. nova list
    7. ssh <address>. Use the list generated in step 6 to identify the node you need to ssh to.
    8. sudo vi /var/log/messages from within the selected node.
  • You must bring all services back to health for the failure to be considered corrected.

    Restore the Pacemaker cluster that was stopped as a result of the failed step in the upgrade procedure (pcs cluster start on the controller nodes that have it stopped) to bring the cluster back to a healthy state.

    Re-run the failed script only when the failure has been corrected and Pacemaker has been started with the cluster healthy again. Move forward with the upgrade procedure only after the failed playbook runs successfully.
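When reviewing /var/log/messages on a failed node, filtering for likely failure lines can be faster than paging through the file in an editor. The following Python sketch is one way to do that. The keyword list is an assumption and should be adjusted to the failure you are investigating. Reading the log normally requires root privileges.

```python
import re
from typing import Iterable, Iterator

# Keywords are an assumption; tune them for the failure being investigated.
PATTERN = re.compile(r"\b(error|fail(ed|ure)?|traceback)\b", re.IGNORECASE)


def suspicious_lines(
    lines: Iterable[str], pattern: re.Pattern = PATTERN
) -> Iterator[tuple[int, str]]:
    """Yield (line number, text) for each line that matches the pattern."""
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            yield number, line.rstrip("\n")


def scan_file(path: str = "/var/log/messages", limit: int = 50) -> list[tuple[int, str]]:
    """Return the last `limit` matching lines from a log file."""
    with open(path, "r", errors="replace") as handle:
        return list(suspicious_lines(handle))[-limit:]


if __name__ == "__main__":
    for number, text in scan_file():
        print(f"{number}: {text}")
```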

You can safely move on to reboot your nodes if you received no failures during the upgrade process.

Remove Duplicate vRouters

It is possible that duplicate instances of the vRouter might occur during the upgrade process, and it is necessary to remove these duplicates. Access the GUI at this point to identify and remove any duplicate vRouters before continuing with the upgrade process.

Reboot Your Nodes

A Contrail Cloud update introduces a new RHEL image and kernel. If you chose to disable automatic reboots, you now need to reboot your nodes. You also need to reboot the control plane, control hosts, and storage at this time. Reboot your nodes as described in Node Reboot and Health Check.

Upgrade from Contrail Cloud Release 13.1 to 13.2

This is an in-place upgrade as defined by Red Hat. You must upgrade role by role and host by host to complete this upgrade, and you must follow a reboot process after the upgrade.

The configuration YAML files do not change between Contrail Cloud 13.1 and 13.2, so no configuration changes are needed for this upgrade. If configuration changes must be made for any reason, apply them to your existing Contrail Cloud 13.1 deployment before upgrading to Version 13.2. As a best practice, review your configuration files to make sure they adhere to a proper schema and meet the needs of your deployment environment.

Before You Upgrade

Take these initial steps before starting your Contrail Cloud upgrade. They help eliminate errors that might occur during the upgrade process and help ensure the expected results. The sections below are prerequisites to the upgrade of your Contrail Cloud.

Review Your Configuration Files

Review your current setup to ensure that all configuration settings are accurate and reflect the desired deployment for your Contrail Cloud environment.

  • Review all the YAML files in the /var/lib/contrail_cloud/config directory and ensure all values match your expected results.
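As a first pass before the manual review, a short script can flag files that are obviously malformed. The Python sketch below uses only the standard library and checks two things: that each YAML file is non-empty and that no line is indented with tabs, which YAML does not allow. It does not validate values or the Contrail Cloud schema, and the function names are hypothetical.

```python
from pathlib import Path


def precheck_yaml_text(text: str) -> list[str]:
    """Return a list of basic problems found in one YAML document."""
    problems = []
    if not text.strip():
        problems.append("file is empty")
    for number, line in enumerate(text.splitlines(), start=1):
        # Leading whitespace only; tabs elsewhere in a line are not flagged.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append(f"line {number}: tab used for indentation")
    return problems


def precheck_config_dir(
    config_dir: str = "/var/lib/contrail_cloud/config",
) -> dict[str, list[str]]:
    """Run the basic checks on every .yml/.yaml file in the directory."""
    report = {}
    for path in sorted(Path(config_dir).iterdir()):
        if path.is_file() and path.suffix in {".yml", ".yaml"}:
            problems = precheck_yaml_text(path.read_text(errors="replace"))
            if problems:
                report[path.name] = problems
    return report


if __name__ == "__main__":
    for name, problems in precheck_config_dir().items():
        for problem in problems:
            print(f"{name}: {problem}")
```

An empty report means only that these two checks passed. You still need to confirm that every value matches your intended deployment.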

Verify Undercloud/Overcloud Health and Service Operations

It is vital that you always check the health of your cloud and the services running in your cloud before attempting any deployment or upgrade activities. You must ensure that the undercloud/overcloud is fully functional, healthy, and that all services are active. Any problems in your cloud health may cause errors during upgrading. Incorrect settings and configurations will carry over to the Contrail Cloud 13.2 deployment.

  1. Check the health of the undercloud, the overcloud, and the nodes running on them. To verify the health of your cloud and its services, see the “Verify Quorum and Node Health” section in Node Reboot and Health Check.

Back Up Your Undercloud and Overcloud

Make sure to back up your undercloud and overcloud before running the update script. For complete instructions to back up your cloud, see BACK UP AND RESTORE THE DIRECTOR UNDERCLOUD, Backing up the overcloud control plane services, and Backing up Contrail Databases in JSON Format.

Pause and Shut Down Business Services

You must pause or shut down external business services at this time to ensure a smooth upgrade and to prevent possible data loss or workload errors. These business services include anything outside the Contrail Cloud deployment that interacts with Contrail Cloud as a whole. The steps for the tasks below depend on the specific business service or VM that is running. Consult the documentation for the specific service you need to pause or shut down.

  • Quiesce all external API requests, for example, Horizon.

  • Gracefully shut down any vulnerable workloads.

  • Consider migrating your services and VMs to a different cloud that is outside of the upgrade environment.

Start the Upgrade from Contrail Cloud Release 13.1 to 13.2

You are now ready to upgrade to Contrail Cloud Release 13.2. Contrail Cloud 13.2 delivers the updated containers, RHEL image, and kernel version associated with Release 13.2.

The procedure below guides you through the update. There is a small disruption in service during the update. However, the update preserves existing overcloud configurations such as images, projects, networks, volumes, and virtual machines.

Retrieve Adjusted Keys and Install

Follow these steps to start your upgrade:

  1. Send an e-mail message to contrail_cloud_subscriptions@juniper.net to request Contrail Cloud 13.2. Provide the following information:
    • Include your current activation key in the email request. Your Contrail Cloud activation key will be adjusted to Version 13.2.

    • Specify the time and date you would like to upgrade your Contrail Cloud version. The Contrail Cloud team will prepare the activation for your maintenance window.

  2. Refresh your Contrail Cloud subscription on the jump host server by running the contrail_cloud_installer.sh from the jump host with the arguments:
  3. Ensure that all overcloud nodes have valid subscription-manager registrations.

Upgrade to Contrail Cloud 13.2

The following procedure and scripts will upgrade your Contrail Cloud to Version 13.2.

As the “contrail” user (su - contrail from root), execute the following scripts on the jump host to perform the update:

  1. Upgrade the jump host and the undercloud VM.
  2. Prepare the overcloud for upgrade:
  3. Perform the overcloud upgrade.

    Choose one of the following two methods to complete this step:

    • Default method. All nodes will upgrade in one run.

      • All roles are upgraded one by one.

      • Within each role the nodes are upgraded one by one.

    • Targeted method. You have the ability to target roles and even nodes to control the upgrade sequence.

      • Set the desired upgrade targets in the config/site.yml file.

      • All control plane roles are typically upgraded together with this method.

      • Computes can be upgraded in small targeted groups.

    If you encounter failures while running the contrail-cloud-upgrade-overcloud-step2.sh script, see If an Upgrade Fails in the sections below.

    Note

    The overcloud upgrade script contrail-cloud-upgrade-overcloud-step2.sh has a hard timeout of 4 hours, which may not be sufficient for complex deployments. Consider using targeted updates to allow for incremental role upgrades which can complete within that timeframe.

    Default Method

    To upgrade all the nodes using the default method, run the script below. This will upgrade all nodes in one run and require no additional steps. The update will apply to all roles one at a time and one node at a time within each role.

    Targeted Method

    The procedure below allows you to target specific roles and nodes during the upgrade. This approach allows for control and predictability of the upgrade and subsequent compute node reboots. This method is desirable if you want to update the control plane roles at one time, and then target specific compute resources to be updated as workloads are migrated.

    To complete a targeted update, edit your config/site.yml for each targeted group and rerun the update script each time you make a change. You can repeat this process as many times as necessary. You can upgrade by the name of a specific node or by the name of a specific role. Remember to change your config/site.yml with each update. Per-node control allows for planning around node reboots. It may be desirable to reboot compute nodes as they are updated to avoid disruption later. For how to reboot your compute nodes, see the node reboot section in Node Reboot and Health Check.

    Note

    Compute nodes may automatically reboot as part of the upgrade process.

    • Configure your config/site.yml to reflect the nodes you want upgraded. The upgrade is performed in the order you configure in the site.yml file. To start, you might configure it to look like this:
    • Run the update script after you have set your variables:
    • You can now edit your config/site.yml to target the specific nodes you want to update (e.g. computes). Replace the role names with the node names you want to update. Below is an example targeting specific compute nodes to be upgraded:

      Run the update script after all variables have been set:

    Once all overcloud nodes have been upgraded, you need to create a flag file to mark contrail-cloud-upgrade-overcloud-step2.sh as complete. The flag file is required before running the next upgrade script. Run the following command:

  4. Converge the overcloud upgrade. The script below updates Ceph and converges the overcloud heat stack. Note that the overcloud['deployment_timeout'] value in config/site.yml can be increased to avoid timeouts in the Ceph upgrade.

Move on to the next sections to upgrade your AppFormix and Contrail Command for Contrail Cloud 13.2.

Upgrade AppFormix

Upgrade to the latest version of AppFormix for use with Contrail Cloud 13.2.

  1. As the “contrail” user (su - contrail from root), execute the following script on the jump host to perform the update:
  2. Verify the status of AppFormix.

    Run the following command to view the status of AppFormix:

    This will return a 200 on success. Any other code returned should be considered a failure. The API output also contains the AppFormix version. This is helpful to verify the correct version has been installed. See the sample below:

Upgrade Contrail Command

Upgrade to the latest version of Contrail Command for use with Contrail Cloud 13.2.

  1. As the “contrail” user (su - contrail from root), execute the following script on the jump host to perform the update:
  2. Log in to the Contrail Command web UI to verify that it was successfully installed. You access Contrail Command by entering https://<jumphost>:9091 in your browser.

    Review the /var/lib/contrail_cloud/config/vault-data.yml for Contrail Command authentication details.

If an Upgrade Fails

If at any point your upgrade fails you will need to troubleshoot. Follow these basic steps for failure analysis:

  • Review the failure output and take screenshots. The screenshots will help others review your failure.

  • Review your configuration files. There could be mistakes in your YAML configuration files. Common configuration errors include, but are not limited to, NIC setup, role assignment, and networking-related errors.

  • Gather information to help troubleshoot the problem. One common troubleshooting step is to retrieve the log from a failed node. To do this, SSH to the node and check /var/log/messages. Use the following sequence of CLI commands:

    1. Log in to the jump host as the root user.
    2. su - contrail
    3. ssh undercloud
    4. source stackrc
    5. Run openstack stack failures list overcloud to identify any stack failures to help identify which roles are having issues.

      Nothing will return in the CLI if there are no failures to report.

    6. nova list
    7. ssh <address>. Use the list generated in step 6 to identify the node you need to ssh to.
    8. sudo vi /var/log/messages from within the selected node.
  • You must bring all services back to health for the failure to be considered corrected.

    Restore the Pacemaker cluster that was stopped as a result of the failed step in the upgrade procedure (pcs cluster start on the controller nodes that have it stopped) to bring the cluster back to a healthy state.

    Re-run the failed script only when the failure has been corrected and Pacemaker has been started with the cluster healthy again. Move forward with the upgrade procedure only after the failed playbook runs successfully.

You can safely move on to reboot your nodes if you received no failures during the upgrade process.

Remove Duplicate vRouters

It is possible that duplicate instances of the vRouter might occur during the upgrade process, and it is necessary to remove these duplicates. Access the GUI at this point to identify and remove any duplicate vRouters before continuing with the upgrade process.

Reboot Your Nodes

Contrail Cloud 13.2 introduces a new RHEL image and kernel. You need to reboot the nodes as described in Node Reboot and Health Check.

Upgrade from Contrail Cloud Release 13.0.2 to 13.1

Contrail Cloud 13.1 does not support upgrade from earlier releases. You must redeploy using adjusted activation keys and retrieve new software packages from the Contrail Cloud Satellite.

  1. Send a request to contrail_cloud_subscriptions@juniper.net regarding the adjustment of your Contrail Cloud keys to Version 13.1.
  2. Redeploy Contrail Cloud using the adjusted activation keys.

    For more information, see Deploying Contrail Cloud.

Upgrade from Contrail Cloud Release 13.0.1 to 13.0.2

Upgrade to Contrail Cloud Release 13.0.2 to apply the updated containers that are delivered with Contrail Networking 5.0.2. This update restarts each instance of the overcloud roles one by one, so there is a small disruption in service during the update. However, the update preserves existing overcloud configurations such as images, projects, networks, volumes, and virtual machines.

To update Contrail Cloud to 13.0.2:

  1. Ensure that the overcloud is fully functional and that all services are active.
  2. Review the config/site.yml.
    1. Remove any overcloud.registry configuration.

    2. Validate that the control host storage allocations use defined storage pools. If the defaults were not used then it might be necessary to adjust the control-host configuration.

  3. Review the config/overcloud-nics.yml, config/control-host-nodes.yml, and config/appformix-nodes.yml to rename all instances of ControlInterfaceDefaultRoute to ControlPlaneDefaultRoute.
  4. Send an e-mail message to contrail_cloud_subscriptions@juniper.net to coordinate the deployment activation key from Contrail Cloud 13.0.1 to Contrail Cloud 13.0.2. An update script cc-update.sh is then provided.
  5. Download the cc-update.sh script to /var/lib/contrail_cloud/scripts/cc-upgrade.sh on the jump host. Make this file executable:
  6. As the “contrail” user, execute the following script on the jump host to perform the update: /var/lib/contrail_cloud/scripts/cc-upgrade.sh.
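The rename in step 3 (ControlInterfaceDefaultRoute to ControlPlaneDefaultRoute) can be scripted rather than done by hand. The Python sketch below is a minimal illustration and is not part of the Contrail Cloud tooling. It assumes the three files sit under a base directory such as /var/lib/contrail_cloud, reports the number of matches by default, and writes changes (with a .bak copy of each modified file) only when apply=True.

```python
from pathlib import Path

OLD = "ControlInterfaceDefaultRoute"
NEW = "ControlPlaneDefaultRoute"

# The three files named in step 3, relative to the base directory.
FILES = [
    "config/overcloud-nics.yml",
    "config/control-host-nodes.yml",
    "config/appformix-nodes.yml",
]


def rename_in_text(text: str) -> tuple[str, int]:
    """Return the updated text and the number of replacements made."""
    return text.replace(OLD, NEW), text.count(OLD)


def rename_in_files(base_dir: str, apply: bool = False) -> dict[str, int]:
    """Count (and optionally apply) the rename in each file that exists."""
    results = {}
    for rel in FILES:
        path = Path(base_dir) / rel
        if not path.is_file():
            continue  # skip files that are not present in this deployment
        original = path.read_text()
        updated, count = rename_in_text(original)
        results[rel] = count
        if apply and count:
            # Keep a backup beside the original before overwriting it.
            path.with_name(path.name + ".bak").write_text(original)
            path.write_text(updated)
    return results
```

Running rename_in_files with the default apply=False first gives a dry-run count per file, which you can compare with a manual search before committing the change.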

Workaround for DPDK Compute Nodes

The update script does not update the contrail-vrouter-agent-dpdk container on the DPDK compute nodes.

Use the instructions below to update the Contrail Cloud 13.0.2 DPDK compute nodes:

  1. For each DPDK compute node, update /etc/sysconfig/network-scripts/network-functions-vrouter-dpdk-env to the following:
  2. Restart the vhost0 interface for the changes to take effect.

Workaround for Kernel vRouter Compute Nodes

The update script does not update the contrail-vrouter-kernel-init container on the kernel compute nodes.

Use the instructions below to update the Contrail Cloud 13.0.2 kernel vRouter compute nodes:

  1. For each kernel vRouter compute node, pull the latest Docker image:
  2. Find the docker image ID:
  3. Run the init container:
  4. Restart the vRouter agent and vhost0 interface:
  5. Reboot to apply the updates: