Troubleshooting Upgrade-Related Errors
This topic describes errors that you might encounter while upgrading Contrail Service Orchestration (CSO) and suggests how to resolve them.
Salt Synchronization Error
Problem:
Description: The upgrade or revert status is displayed as Going to sync salt... for a considerable time while upgrading CSO to CSO Release 4.0.0 or reverting to the previously installed release.
The Salt Master on the installer VM might be unable to reach all Salt Minions on the other VMs and a salt timeout exception might occur.
Solution
Based on the output of the salt '*' test.ping command, you must restart either the Salt Master or the Salt Minion.
To resolve the error:
- Open another instance of the installer VM.
- Run the salt '*' test.ping command to check whether the Salt Master on the installer VM is able to reach the other VMs.
root@host:~/# salt '*' test.ping
Restart the Salt Master if the following error occurs:
Salt request timed out. The master is not responding. If this error persists after verifying the master is up, worker_threads may need to be increased
root@host:~/# service salt-master restart
If there are no errors, review the output.
root@host:~/# salt '*' test.ping
csp-regional-sblb.DB7RFF.regional: True
csp-contrailanalytics-1.8V1O2D.central: True
csp-central-msvm.8V1O2D.central: True
csp-regional-k8mastervm.DB7RFF.regional: True
csp-central-infravm.8V1O2D.central: False
csp-regional-msvm.DB7RFF.regional: False
csp-regional-infravm.DB7RFF.regional: True
csp-central-k8mastervm.8V1O2D.central: True
You must log in to the VM and restart the Salt Minion if the status of the VM is False.
root@host:~/csp-central-infravm.8V1O2D.central# service salt-minion restart
- Rerun the salt '*' test.ping command to verify whether the status for all VMs is True.
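If several VMs report False, you can script the check and the restarts. The following is a minimal sketch, not part of the CSO installer; it assumes the salt CLI is available on the installer VM and that you have SSH access to the affected VMs, and the hostname shown in the comment is only an example taken from the output above.
#!/bin/bash
# Minimal sketch: run as root on the installer VM, where the salt CLI is installed.
# Show the minions that reported False (-B1 keeps the minion name if the status
# is printed on a separate line below the name).
salt '*' test.ping | grep -B1 False
# On each VM that reports False, restart the minion (example hostname shown):
# ssh root@csp-central-infravm.8V1O2D.central 'service salt-minion restart'
# Then verify that every VM reports True.
salt '*' test.ping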
Cache Clearance Error
Problem:
Description: The following error might occur while upgrading CSO to CSO Release 4.0.0:
Could not free cache on host server ServerName
Solution
You must clear the cache on the host server.
To resolve the error:
- Log in to the host server through SSH.
- To clear the cache, run the following command:
root@host:~/Contrail_Service_Orchestration_4.0.0# free && sync && echo 3 > /proc/sys/vm/drop_caches && free
The following output is displayed:
             total       used       free     shared    buffers     cached
Mem:     264036628  214945716   49090912      15092     198092   71878992
-/+ buffers/cache:  142868632  121167996
Swap:    390233084     473808  389759276
             total       used       free     shared    buffers     cached
Mem:     264036628  142165996  121870632      15092       3256      75792
-/+ buffers/cache:  142086948  121949680
Swap:    390233084     473808  389759276
The cache is cleared on the host server.
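If you prefer to run the same steps as a script (for example, on several host servers), here is a minimal sketch; it assumes root access on the host server, because writing to /proc/sys/vm/drop_caches requires root.
#!/bin/bash
# Minimal sketch: clear the page cache, dentries, and inodes on the host server.
# Only clean (already written) cache entries are dropped; sync flushes dirty pages first.
free                               # memory usage before clearing the cache
sync                               # flush dirty pages to disk
echo 3 > /proc/sys/vm/drop_caches  # drop page cache, dentries, and inodes
free                               # memory usage after clearing the cache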
Kube-system Pod Error
Problem:
Description: The following error might occur while upgrading CSO to CSO Release 4.0.0:
One or more kube-system pods are not running
Solution
Check the status of the kube-system pods, and restart kube-proxy if required.
To resolve the error:
- Log in to the central or regional microservices VM through SSH.
- Run the following command to view the status of the kube-system pods:
root@host:~/# kubectl get pods --namespace=kube-system
The following output is displayed:
NAME                                    READY     STATUS    RESTARTS   AGE
etcd-empty-dir-cleanup-10.213.20.182    1/1       Running   2          3d
kube-addon-manager-10.213.20.182        1/1       Running   2          3d
kube-apiserver-10.213.20.182            1/1       Running   2          3d
kube-controller-manager-10.213.20.182   1/1       Running   6          3d
kube-dns-v11-4cmhl                      4/4       Running   0          3h
kube-proxy-10.213.20.181                0/1       Error     2          3d
kube-scheduler-10.213.20.182            1/1       Running   6          3d
Check the status of kube-proxy. You must restart kube-proxy if the status is Error, CrashLoopBackOff, or MatchNodeSelector.
- Run the following command to restart kube-proxy:
root@host:~/# kubectl apply -f /etc/kubernetes/manifests/kube-proxy.yaml
The kube-system pod-related error is resolved.
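To spot a failing pod quickly, you can filter the kube-system pods that are not running. The following is a minimal sketch; it assumes kubectl is configured on the microservices VM and that kube-proxy is deployed from the manifest path shown in the step above, which can differ in your installation.
#!/bin/bash
# Minimal sketch: list kube-system pods whose STATUS column is not "Running"
# (NR>1 skips the header line of the kubectl output).
kubectl get pods --namespace=kube-system | awk 'NR>1 && $3 != "Running"'
# If kube-proxy is listed, reapply its manifest to restart it:
# kubectl apply -f /etc/kubernetes/manifests/kube-proxy.yaml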
Kubernetes Node Error
Problem:
Description: The following error might occur while upgrading CSO to CSO Release 4.0.0:
One or more nodes down
Solution
Check the status of the kube-master and kube-minion nodes, and restart them if required.
To resolve the issue:
- Log in to the central or regional microservices VM through SSH.
- Run the following command to check the status of each node:
root@host:~/# kubectl get nodes
NAME            STATUS      AGE       VERSION
10.213.20.181   Not Ready   3d        v1.6.0
10.213.20.182   Ready       3d        v1.6.0
Identify any node that is in the Not Ready status.
- If the status of a node is Not Ready, log in to that node through SSH and restart it by running the following command:
root@host:~/# service kubelet restart
- Rerun the following command to check the status of the node that you restarted:
root@host:~/# kubectl get nodes
The Kubernetes node-related error is resolved.
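As a quick check after the restart, you can list only the nodes that are not Ready. The following is a minimal sketch, assuming kubectl is configured on the central or regional microservices VM; restarting kubelet must still be done on the affected node itself, as in the steps above.
#!/bin/bash
# Minimal sketch: list any Kubernetes node whose STATUS column is not "Ready"
# (NR>1 skips the header line). An empty result means all nodes are Ready.
kubectl get nodes | awk 'NR>1 && $2 != "Ready"'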
