Rook-Ceph Failure
Problem
Sometimes when you shut down a node or a node fails, the pod status gets stuck in initializing, the Ceph status might hang, or the Rook module fails with the following error:
Error: "Module 'rook' has failed: None: Max retries exceeded with url: /api/v1/nodes (Caused by None)"
Cause
The Ceph monitor isn't fully initialized or ready to serve requests, and the metadata server cannot complete its own initialization because it depends on the Ceph monitor.
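You can verify this dependency by checking whether the monitor and metadata server pods are Running and Ready. This sketch assumes the default labels Rook applies to its daemons (app=rook-ceph-mon and app=rook-ceph-mds):
# Monitor pods must be Ready before the MDS can finish initializing
kubectl -n rook-ceph get pods -l app=rook-ceph-mon
# Metadata server pods for the CephFS filesystem
kubectl -n rook-ceph get pods -l app=rook-ceph-mds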
Solution
Restart rook-ceph-operator and the metadata server using the kubectl rollout restart deployment -n rook-ceph rook-ceph-operator rook-ceph-mds-cephfs-a rook-ceph-mds-cephfs-b command.
root@pa1:~# kubectl rollout restart deployment -n rook-ceph rook-ceph-operator rook-ceph-mds-cephfs-a rook-ceph-mds-cephfs-b
Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "rook-ceph-operator" must set securityContext.allowPrivilegeEscalation=false), seccompProfile (pod or container "rook-ceph-operator" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
deployment.apps/rook-ceph-operator restarted
Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (containers "chown-container-data-dir", "mds", "log-collector" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "chown-container-data-dir", "mds", "log-collector" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volumes "ceph-daemons-sock-dir", "rook-ceph-log", "rook-ceph-crash" use restricted volume type "hostPath"), runAsNonRoot != true (pod or containers "chown-container-data-dir", "mds", "log-collector" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "chown-container-data-dir", "mds", "log-collector" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
deployment.apps/rook-ceph-mds-cephfs-a restarted
deployment.apps/rook-ceph-mds-cephfs-b restarted
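The PodSecurity "restricted:latest" messages are warnings only and do not block the operation; the deployment.apps/... restarted lines confirm the rollouts were accepted. After the restart, you can watch the rollouts complete and re-check the cluster health. This is a sketch that assumes the same deployment names used above and the optional rook-ceph-tools toolbox:
kubectl -n rook-ceph rollout status deployment/rook-ceph-operator
kubectl -n rook-ceph rollout status deployment/rook-ceph-mds-cephfs-a
kubectl -n rook-ceph rollout status deployment/rook-ceph-mds-cephfs-b
# Ceph should return to HEALTH_OK (or HEALTH_WARN while it recovers)
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status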