
Troubleshooting Upgrade-Related Errors

 

This topic describes errors that you might encounter while upgrading Contrail Service Orchestration (CSO).

Salt Synchronization Error

Problem

Description: While you are upgrading CSO to Release 3.3 or reverting to the previously installed release, the upgrade or revert status is displayed as Going to sync salt... for an extended period of time.

This can occur when the Salt Master on the installer VM is unable to reach all Salt Minions on the other VMs, causing a Salt timeout exception.

Solution

Based on the output of the salt '*' test.ping command, restart either the Salt Master or the affected Salt Minion.

To resolve the error:

  1. Open another session on the installer VM.
  2. Run the salt '*' test.ping command to check whether the Salt Master on the installer VM can reach the other VMs.
    root@host:~/# salt '*' test.ping
    • If the following error occurs, you must restart the Salt Master.

      Salt request timed out. The master is not responding. If this error persists after verifying the master is up, worker_threads may need to be increased

      root@host:~/# service salt-master restart
    • If there are no errors, view the output.

      root@host:~/# salt '*' test.ping

      If the status of a VM is False, log in to that VM and restart the Salt Minion.

      root@host:~/csp-central-infravm.8V1O2D.central# service salt-minion restart
  3. Rerun the salt '*' test.ping command to verify that the status for all VMs is True, as shown in the example that follows this procedure.
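For reference, a successful run of the command returns True for every Salt Minion. The output looks similar to the following; the VM names shown here are illustrative:

    root@host:~/# salt '*' test.ping
    csp-central-infravm.8V1O2D.central:
        True
    csp-central-msvm.8V1O2D.central:
        True
    csp-regional-infravm.8V1O2D.regional:
        True

If any VM still returns False, repeat Step 2 for that VM before you continue with the upgrade.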

Cache Clearance Error

Problem

Description: While you are upgrading CSO to Release 3.3, the following error might occur:

Could not free cache on host server ServerName

Solution

You must clear the cache on the host server.

To resolve the error:

  1. Log in to the host server through SSH.
  2. To clear the cache, run the following command:
    root@host:~/Contrail_Service_Orchestration_3.3# free && sync && echo 3 > /proc/sys/vm/drop_caches && free

    The command displays the memory usage before and after the caches are dropped; a representative example follows this procedure.

The cache is cleared on the host server.
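The exact column layout and values depend on the system and the procps version installed; illustrative output looks similar to the following, with the buff/cache figure dropping sharply in the second free report:

    root@host:~/Contrail_Service_Orchestration_3.3# free && sync && echo 3 > /proc/sys/vm/drop_caches && free
                  total        used        free      shared  buff/cache   available
    Mem:       65968756    21543200     2120344      342120    42305212    43210456
    Swap:       8388604           0     8388604
                  total        used        free      shared  buff/cache   available
    Mem:       65968756    21540112    43562120      342120      866524    43650112
    Swap:       8388604           0     8388604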

Kube-system Pod Error

Problem

Description: While you are upgrading CSO to Release 3.3, the following error might occur:

One or more kube-system pods are not running

Solution

Check the status of the kube-system pods, and restart kube-proxy if required.

To resolve the error:

  1. Log in to the central or regional microservices VM through SSH.
  2. To view the status of the kube-system pods, run the following command:
    root@host:~/# kubectl get pods --namespace=kube-system

    The output lists the kube-system pods and their status; a representative example follows this procedure.

    Check the status of kube-proxy. You must restart kube-proxy if its status is Error, CrashLoopBackOff, or MatchNodeSelector.

  3. To restart kube-proxy, run the following command:
    root@host:~/# kubectl apply -f /etc/kubernetes/manifests/kube-proxy.yaml
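For reference, kubectl get pods output that calls for a kube-proxy restart looks similar to the following; the pod names, restart counts, and ages are illustrative:

    root@host:~/# kubectl get pods --namespace=kube-system
    NAME                           READY     STATUS             RESTARTS   AGE
    kube-dns-2158764947-xq5ts      3/3       Running            0          12d
    kube-proxy-csp-central-msvm    0/1       CrashLoopBackOff   42         12d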

The kube-system pod-related error is resolved.

Kubernetes Node Error

Problem

Description: While you are upgrading CSO to Release 3.3, the following error might occur:

One or more nodes down

Solution

Check the status of the kube-master and kube-minion nodes, and restart them if required.

To resolve the issue:

  1. Log in to the central or regional microservices VM through SSH.
  2. Run the following command to check the status of each node:
    root@host:~/# kubectl get nodes

    Identify any node that is in the NotReady status; you must restart that node. A representative example follows this procedure.

  3. To restart a node that is in the NotReady status, log in to the node through SSH and run the following command:
    root@host:~/# service kubelet restart
  4. Rerun the following command to check the status of the node that you restarted.
    root@host:~/# kubectl get nodes
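For reference, kubectl get nodes output that indicates a node needs to be restarted looks similar to the following; the node names, ages, and exact columns are illustrative. After the restart in Step 3, the status of the node should return to Ready:

    root@host:~/# kubectl get nodes
    NAME                       STATUS     AGE
    csp-central-k8mastervm     Ready      120d
    csp-central-msvm           NotReady   120d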

The Kubernetes node-related error is resolved.

Snapshot Error

Problem

Description: The upgrade.sh script sets CSO to maintenance mode and takes a snapshot of all VMs so that you can roll back to the previous release if the upgrade fails. While you are upgrading to CSO Release 3.3, the snapshot process might fail for the following reasons:

  • Unable to shut down one or more VMs—You must manually shut down the VMs.

  • Unable to take a snapshot of one or more VMs—You must manually restart the VMs, start the Kubernetes pods, and set CSO to active mode.

Solution

  • To manually shut down the VMs:

    1. Log in to the CSO node or server as root.
    2. Execute the following command to view the list of VMs.
      root@host:~/# virsh list --all

      The output lists the VMs and their states; a representative listing follows this procedure.

      Identify the VMs that are in the running state.

    3. Execute the following command to shut down the VMs that are in the running state:
      root@host:~/# virsh shutdown VMName

    If you want to proceed with the upgrade process, you can rerun the upgrade.sh script.
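    For reference, virsh list --all output looks similar to the following; the VM names, IDs, and states shown here are illustrative:

      root@host:~/# virsh list --all
       Id    Name                           State
      ----------------------------------------------------
       3     csp-central-infravm            running
       5     csp-central-msvm               running
       -     csp-central-lbvm               shut off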

  • If you are unable to take the snapshot for one or more VMs, you must:

    1. Log in to the CSO node or server as root.
    2. Execute the following command to view the list of VMs.
      root@host:~/# virsh list --all

      The output lists the VMs and their states, as in the earlier example. Identify the VMs that are in the shut off state.

    3. Execute the following command to restart the VMs that are in shut off state.
      root@host:~/# virsh start VMName

      You must restart the VMs in the following order:

      1. Infrastructure VM

      2. Load balancer VM

      3. Southbound Load balancer VM

      4. Contrail Analytics VM

      5. K8 Master VM

      6. Microservices VM

    4. On the installer VM, run the following commands to start the Kubernetes pods:
      1. Execute the following command to check the status of clear_cache_pods.

        root@host:~/# cat upgrade/upgrade.conf | grep clear_cache_pods
      2. If the status of clear_cache_pods is successful, execute the following command to start the Kubernetes pods.

        root@host:~/# ./python.sh vm_snapshot/scale_pods.py revert
    5. Log in to the central infrastructure VM through SSH, and run the following command to set CSO to active mode.
      root@host:~/# etcdctl set /lb/maintenance false
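      To confirm that CSO is out of maintenance mode, you can read the key back; assuming the same etcd v2 API used by the set command above, the key should now return false:

      root@host:~/# etcdctl get /lb/maintenance
      false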

If you want to proceed with the upgrade process, you can rerun the upgrade.sh script.