Troubleshooting Upgrade-Related Errors
This topic describes the errors that you might encounter while you are upgrading Contrail Service Orchestration (CSO).
Salt Synchronization Error
Problem
Description: While you are upgrading CSO to Release 3.3 or reverting to the previously installed release, the upgrade or revert status is displayed as Going to sync salt... for a considerable time.
The Salt Master on the installer VM might be unable to reach all Salt Minions on the other VMs, and a Salt timeout exception might occur.
Solution
Based on the output of the salt '*' test.ping command, you must restart either the Salt Master or the Salt Minion.
To resolve the error:
- Open another session on the installer VM.
- Run the salt '*' test.ping command to check whether the Salt Master on the installer VM can reach the other VMs.
root@host:~/# salt '*' test.ping
If the following error occurs, you must restart the Salt Master:
Salt request timed out. The master is not responding. If this error persists after verifying the master is up, worker_threads may need to be increased.
root@host:~/# service salt-master restart
If there are no errors, view the output:
root@host:~/# salt '*' test.ping
csp-regional-sblb.DB7RFF.regional: True
csp-contrailanalytics-1.8V1O2D.central: True
csp-central-msvm.8V1O2D.central: True
csp-regional-k8mastervm.DB7RFF.regional: True
csp-central-infravm.8V1O2D.central: False
csp-regional-msvm.DB7RFF.regional: False
csp-regional-infravm.DB7RFF.regional: True
csp-central-k8mastervm.8V1O2D.central: True
If the status of a VM is False, you must log in to that VM and restart the Salt Minion.
root@host:~/csp-central-infravm.8V1O2D.central# service salt-minion restart
- Rerun the salt '*' test.ping command to verify that the status of all VMs is True.
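If the deployment has many VMs, you can script this check instead of reading the output manually. The following one-line sketch is an illustration only; it assumes that the salt command is available on the installer VM, as in the steps above, and it flags any Salt Minion that does not return True:
# Sketch: flag Salt Minions that do not return True to test.ping.
# The --out=txt option prints one "minion-id: result" pair per line.
# Minions that time out might not appear in the output at all, so also
# compare the results against the list of VMs in your deployment.
salt '*' test.ping --out=txt | awk -F': ' '$2 != "True" {print "Check VM:", $1}'
Log in to each VM that the sketch reports and restart the Salt Minion as shown in the procedure.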
Cache Clearance Error
Problem
Description: While you are upgrading CSO to Release 3.3, the following error might occur:
Could not free cache on host server ServerName
Solution
You must clear the cache on the host server.
To resolve the error:
- Log in to the host server through SSH.
- To clear the cache, run the following command:
root@host:~/Contrail_Service_Orchestration_3.3# free && sync && echo 3 > /proc/sys/vm/drop_caches && free
The following output is displayed:
             total       used       free     shared    buffers     cached
Mem:     264036628  214945716   49090912      15092     198092   71878992
-/+ buffers/cache:  142868632  121167996
Swap:    390233084     473808  389759276
             total       used       free     shared    buffers     cached
Mem:     264036628  142165996  121870632      15092       3256      75792
-/+ buffers/cache:  142086948  121949680
Swap:    390233084     473808  389759276
The cache is cleared on the host server.
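If you want to confirm that the cache was actually released before you rerun the upgrade.sh script, you can compare the page-cache size before and after dropping the caches. The following sketch is an illustration only and assumes a standard Linux /proc/meminfo layout; the values are in kilobytes:
#!/bin/bash
# Sketch: record the page-cache size, drop the caches, and report the change.
before=$(awk '/^Cached:/ {print $2}' /proc/meminfo)   # page cache (kB) before
sync                                                  # flush dirty pages first
echo 3 > /proc/sys/vm/drop_caches                     # drop page cache, dentries, and inodes
after=$(awk '/^Cached:/ {print $2}' /proc/meminfo)    # page cache (kB) after
echo "Cached before: ${before} kB, after: ${after} kB"
A large drop in the Cached value indicates that the cache was freed on the host server.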
Kube-system Pod Error
Problem
Description: While you are upgrading CSO to Release 3.3, the following error might occur:
One or more kube-system pods are not running
Solution
Check the status of the kube-system pods, and restart kube-proxy if required.
To resolve the error:
- Log in to the central or regional microservices VM through SSH.
- To view the status of the kube-system pods, run the following command:
root@host:~/# kubectl get pods --namespace=kube-system
The following output is displayed:
NAME                                    READY     STATUS    RESTARTS   AGE
etcd-empty-dir-cleanup-10.213.20.182    1/1       Running   2          3d
kube-addon-manager-10.213.20.182        1/1       Running   2          3d
kube-apiserver-10.213.20.182            1/1       Running   2          3d
kube-controller-manager-10.213.20.182   1/1       Running   6          3d
kube-dns-v11-4cmhl                      4/4       Running   0          3h
kube-proxy-10.213.20.181                0/1       Error     2          3d
kube-scheduler-10.213.20.182            1/1       Running   6          3d
Check the status of kube-proxy. You must restart kube-proxy if its status is Error, CrashLoopBackOff, or MatchNodeSelector.
- To restart kube-proxy, run the following command:
root@host:~/# kubectl apply -f /etc/kubernetes/manifests/kube-proxy.yaml
The kube-system pod-related error is resolved.
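To identify at a glance which kube-system pods need attention, you can filter the pod listing for anything that is not in the Running state. The following sketch is an illustration only and assumes that kubectl is configured on the microservices VM, as in the steps above:
# Sketch: list kube-system pods whose STATUS column is not Running.
# --no-headers suppresses the header row; $3 is the STATUS column in the
# default "kubectl get pods" output (NAME READY STATUS RESTARTS AGE).
kubectl get pods --namespace=kube-system --no-headers | awk '$3 != "Running" {print $1, $3}'
If kube-proxy appears in the list, reapply its manifest as shown in the procedure above.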
Kubernetes Node Error
Problem
Description: While you are upgrading CSO to Release 3.3, the following error might occur:
One or more nodes down
Solution
Check the status of kube-master or kube-minion and restart the nodes, if required.
To resolve the issue:
- Log in to the central or regional microservices VM through SSH.
- Run the following command to check the status of each node:
root@host:~/# kubectl get nodes
NAME            STATUS      AGE       VERSION
10.213.20.181   Not Ready   3d        v1.6.0
10.213.20.182   Ready       3d        v1.6.0
Identify the nodes that are in the Not Ready status; these nodes must be restarted.
- To restart the node that is in the Not Ready status, log in to the node through SSH and run the following command:
root@host:~/# service kubelet restart
- Rerun the following command to check the status of the node that you restarted:
root@host:~/# kubectl get nodes
The Kubernetes node-related error is resolved.
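As with the pod check, you can script the node check so that a node that is not ready is easy to spot. The following sketch is an illustration only and assumes that kubectl is configured on the microservices VM; log in to each node that it reports and restart kubelet as described above:
# Sketch: report nodes whose STATUS column is not Ready.
# Depending on the kubectl version, the status might be printed as a single
# word (for example, NotReady) or with additional conditions appended;
# adjust the match if your output differs.
kubectl get nodes --no-headers | awk '$2 != "Ready" {print "Node not ready:", $1}'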
Snapshot Error
Problem
Description: The upgrade.sh script sets CSO to maintenance mode and takes a snapshot of all VMs so that you can roll back to the previous release if the upgrade fails. While you are upgrading to CSO Release 3.3, the snapshot process might fail for the following reasons:
Unable to shut down one or more VMs: You must manually shut down the VMs.
Unable to take a snapshot of one or more VMs: You must manually restart the VMs, start the Kubernetes pods, and set CSO to active mode.
Solution
To manually shut down the VMs:
- Log in to the CSO node or server as root.
- Execute the following command to view the list of VMs:
root@host:~/# virsh list --all
The VMs are listed as follows:
 Id    Name                  State
----------------------------------------------------
 10    vrr1                  running
 11    vrr2                  running
 40    canvm                 shut off
 41    centralinfravm        shut off
 43    centralk8mastervm     running
 44    centralmsvm           shut off
 45    installervm           running
 46    regional-sblb         shut off
 47    regionalinfravm       running
 48    regionalk8mastervm    shut off
 49    regionalmsvm          shut off
Identify the VMs that are in the running state.
- Execute the following command to shut down each VM that is in the running state (a scripted version is sketched after this procedure):
root@host:~/# virsh shutdown VMName
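If several VMs are running, you can shut them down and wait for them to reach the shut off state with a small script instead of issuing virsh shutdown for one VM at a time. The following sketch is an illustration only; the script name shutdown_vms.sh is hypothetical, and you pass it the VM names that virsh list --all reports as running:
#!/bin/bash
# Sketch (hypothetical helper, for example shutdown_vms.sh): shut down the
# named VMs and wait until each one reaches the shut off state.
# Usage example: ./shutdown_vms.sh vrr1 vrr2 centralk8mastervm
for vm in "$@"; do
    virsh shutdown "$vm"
done
for vm in "$@"; do
    # Poll until the VM is no longer reported as running.
    while virsh list --state-running --name | grep -qx "$vm"; do
        sleep 5
    done
    echo "$vm is shut off"
done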
If you want to proceed with the upgrade process, you can rerun the upgrade.sh script.
If you are unable to take the snapshot of one or more VMs, do the following:
- Log in to the CSO node or server as root.
- Execute the following command to view the list of VMs:
root@host:~/# virsh list --all
The VMs are listed as follows:
 Id    Name                  State
----------------------------------------------------
 10    vrr1                  running
 11    vrr2                  running
 40    canvm                 running
 41    centralinfravm        running
 43    centralk8mastervm     running
 44    centralmsvm           running
 45    installervm           running
 46    regional-sblb         shut off
 47    regionalinfravm       running
 48    regionalk8mastervm    shut off
 49    regionalmsvm          running
Identify the VMs that are in the shut off state.
- Execute the following command to restart the VMs that are in the shut off state:
root@host:~/# virsh start VMName
You must restart the VMs in the following order (a scripted version of this restart order is sketched at the end of this topic):
Infrastructure VM
Load balancer VM
Southbound Load balancer VM
Contrail Analytics VM
K8 Master VM
Microservices VM
- On the installer VM, run the following commands to start the Kubernetes pods:
Execute the following command to check the status of clear_cache_pods:
root@host:~/# cat upgrade/upgrade.conf | grep clear_cache_pods
If the status of clear_cache_pods is successful, execute the following command to start the Kubernetes pods:
root@host:~/# ./python.sh vm_snapshot/scale_pods.py revert
- Log in to the central infrastructure VM through SSH, and run the following command to set CSO to active mode:
root@host:~/# etcdctl set /lb/maintenance false
If you want to proceed with the upgrade process, you can rerun the upgrade.sh script.
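If you prefer to script the VM restart order from the earlier step, the following sketch shows one way to do it. It is an illustration only: the names in START_ORDER are placeholders, not actual CSO VM names, and you must replace them with the names from your own virsh list --all output, keeping the order listed above (infrastructure, load balancer, southbound load balancer, Contrail Analytics, K8 master, and microservices VMs):
#!/bin/bash
# Sketch: start the shut off VMs one at a time, in the required order.
# The names below are placeholders; replace them with the VM names from
# your own "virsh list --all" output.
START_ORDER="infravm lbvm sblbvm canvm k8mastervm msvm"
for vm in $START_ORDER; do
    # Skip VMs that are already running.
    if virsh list --state-running --name | grep -qx "$vm"; then
        echo "$vm is already running"
        continue
    fi
    virsh start "$vm"
    sleep 30   # allow each VM time to boot before starting the next one
done
After the VMs are up, continue with the Kubernetes pod and maintenance-mode steps above; you can optionally confirm that CSO is out of maintenance mode by running etcdctl get /lb/maintenance, which should return false.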
