Use this procedure to remove a Ceph storage node from a Ceph cluster. Ceph storage node removal is handled as a Red Hat process rather than as an end-to-end Contrail Cloud process. However, this procedure demonstrates the removal of a storage node in the context of a Contrail Cloud environment.
Before you begin, ensure that the remaining nodes in the cluster are sufficient to hold the required number of placement groups (PGs) and replicas for your Ceph storage cluster. Ensure that both the Ceph cluster and the overcloud stack are healthy. To check the health of your overcloud, see Verify Quorum and Node Health.
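As a quick pre-check, you can review each pool's replica count and placement group count, and the overall cluster capacity, from any of the OpenStack controllers. The following is a minimal example; output will vary by deployment:
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd pool ls detail
[root@overcloud8st-ctrl-1 ~]# sudo ceph df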
All examples in this procedure come from a lab setting
to demonstrate storage removal within the context of Contrail Cloud.
Sample output in the provided examples will differ from the information
in your specific cloud deployment. In the examples used for this procedure,
“storage3” will be the targeted node for removal.
Remove the storage node:
- Find the connection between the bare metal server and the overcloud server. The output from the command below shows that the server we are looking for is “overcloud8st-cephstorageblue1-0”. This information is used later in the procedure.
(undercloud) [stack@undercloud ~]$ openstack ccloud nodemap list
+---------------------------------+----------------+------------+----------------+
| Name                            | IP             | Hypervisor | Hypervisor IP  |
+---------------------------------+----------------+------------+----------------+
| overcloud8st-cc-2               | 192.168.213.54 | controler2 | 192.168.213.6  |
| overcloud8st-cc-1               | 192.168.213.51 | controler3 | 192.168.213.7  |
| overcloud8st-cc-0               | 192.168.213.55 | controler1 | 192.168.213.5  |
| overcloud8st-ca-1               | 192.168.213.72 | controler1 | 192.168.213.5  |
| overcloud8st-ca-0               | 192.168.213.60 | controler3 | 192.168.213.7  |
| overcloud8st-ca-2               | 192.168.213.53 | controler2 | 192.168.213.6  |
| overcloud8st-cadb-1             | 192.168.213.71 | controler1 | 192.168.213.5  |
| overcloud8st-afxctrl-0          | 192.168.213.69 | controler2 | 192.168.213.6  |
| overcloud8st-afxctrl-1          | 192.168.213.52 | controler3 | 192.168.213.7  |
| overcloud8st-afxctrl-2          | 192.168.213.58 | controler1 | 192.168.213.5  |
| overcloud8st-cadb-0             | 192.168.213.65 | controler2 | 192.168.213.6  |
| overcloud8st-ctrl-0             | 192.168.213.73 | controler2 | 192.168.213.6  |
| overcloud8st-ctrl-1             | 192.168.213.63 | controler1 | 192.168.213.5  |
| overcloud8st-ctrl-2             | 192.168.213.59 | controler3 | 192.168.213.7  |
| overcloud8st-cephstorageblue1-0 | 192.168.213.62 | storage3   | 192.168.213.62 |
| overcloud8st-compdpdk-0         | 192.168.213.56 | compute1   | 192.168.213.56 |
| overcloud8st-cephstorageblue2-0 | 192.168.213.61 | storage2   | 192.168.213.61 |
| overcloud8st-cephstorageblue2-1 | 192.168.213.80 | storage1   | 192.168.213.80 |
| overcloud8st-cadb-2             | 192.168.213.74 | controler3 | 192.168.213.7  |
+---------------------------------+----------------+------------+----------------+
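If you already know which storage node you plan to remove, you can filter the same output rather than scanning the full table. For example, in this lab the following returns only the overcloud8st-cephstorageblue1-0 row shown above:
(undercloud) [stack@undercloud ~]$ openstack ccloud nodemap list | grep storage3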
- From the undercloud, as the heat-admin user, SSH to any of the OpenStack controllers and run sudo ceph -s to verify that the Ceph cluster is healthy:
[root@overcloud8st-ctrl-1 ~]# sudo ceph -s
  cluster:
    id:     a98b1580-bb97-11ea-9f2b-525400882160
    health: HEALTH_OK
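If the cluster reports any state other than HEALTH_OK, you can view the underlying warnings before deciding whether to proceed. For example:
[root@overcloud8st-ctrl-1 ~]# sudo ceph health detail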
- Find the OSDs that reside on the server to be removed (overcloud8st-cephstorageblue1-0). In the example below, these are osd.2, osd.3, osd.6, and osd.7:
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd tree
ID CLASS WEIGHT   TYPE NAME                                STATUS REWEIGHT PRI-AFF
-1       10.91638 root default
-3        3.63879     host overcloud8st-cephstorageblue1-0
 2   hdd  0.90970         osd.2                                up  1.00000 1.00000
 3   hdd  0.90970         osd.3                                up  1.00000 1.00000
 6   hdd  0.90970         osd.6                                up  1.00000 1.00000
 7   hdd  0.90970         osd.7                                up  1.00000 1.00000
-7        3.63879     host overcloud8st-cephstorageblue2-0
 1   hdd  0.90970         osd.1                                up  1.00000 1.00000
 4   hdd  0.90970         osd.4                                up  1.00000 1.00000
 8   hdd  0.90970         osd.8                                up  1.00000 1.00000
10   hdd  0.90970         osd.10                               up  1.00000 1.00000
-5        3.63879     host overcloud8st-cephstorageblue2-1
 0   hdd  0.90970         osd.0                                up  1.00000 1.00000
 5   hdd  0.90970         osd.5                                up  1.00000 1.00000
 9   hdd  0.90970         osd.9                                up  1.00000 1.00000
11   hdd  0.90970         osd.11                               up  1.00000 1.00000
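You can also confirm the host that owns a particular OSD directly. For example, to look up osd.2 from this lab:
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd find 2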
- While still logged in to the OpenStack controller, mark osd.2, osd.3, osd.6, and osd.7 out so that Ceph no longer places data on them:
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd out 2
marked out osd.2.
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd out 3
marked out osd.3.
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd out 6
marked out osd.6.
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd out 7
marked out osd.7.
From the undercloud, as the heat-admin user, SSH to any of the OpenStack controllers and run sudo ceph -s to verify that the Ceph cluster returns a “HEALTH_OK” state before you continue.
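Marking the OSDs out triggers data rebalancing, so the cluster may report HEALTH_WARN until recovery finishes. If you prefer not to poll manually, a minimal shell sketch such as the following waits on the controller until the cluster reports HEALTH_OK again; the same approach can be used for the later health checks in this procedure:
[root@overcloud8st-ctrl-1 ~]# until sudo ceph health | grep -q HEALTH_OK; do sleep 30; done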
- From the undercloud, as the heat-admin user, SSH to Ceph node overcloud8st-cephstorageblue1-0 and stop the OSD services:
[root@overcloud8st-cephstorageblue1-0 ~]# sudo systemctl stop ceph-osd@2.service
[root@overcloud8st-cephstorageblue1-0 ~]# sudo systemctl stop ceph-osd@3.service
[root@overcloud8st-cephstorageblue1-0 ~]# sudo systemctl stop ceph-osd@6.service
[root@overcloud8st-cephstorageblue1-0 ~]# sudo systemctl stop ceph-osd@7.service
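Optionally, confirm that each OSD service has stopped before continuing; systemctl should report the unit as inactive. For example:
[root@overcloud8st-cephstorageblue1-0 ~]# sudo systemctl is-active ceph-osd@2.service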
From the undercloud, as the heat-admin user, SSH to any of the OpenStack controllers and run sudo ceph -s to verify that the Ceph cluster returns a “HEALTH_OK” state before you continue.
- From the undercloud, as the heat-admin user, SSH back into the controller, remove the remaining references to the OSDs from overcloud8st-cephstorageblue1-0, and then remove the host from the CRUSH map:
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd crush remove osd.2
removed item id 2 name 'osd.2' from crush map
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd crush remove osd.3
removed item id 3 name 'osd.3' from crush map
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd crush remove osd.6
removed item id 6 name 'osd.6' from crush map
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd crush remove osd.7
removed item id 7 name 'osd.7' from crush map
[root@overcloud8st-ctrl-1 ~]# sudo ceph auth del osd.2
updated
[root@overcloud8st-ctrl-1 ~]# sudo ceph auth del osd.3
updated
[root@overcloud8st-ctrl-1 ~]# sudo ceph auth del osd.6
updated
[root@overcloud8st-ctrl-1 ~]# sudo ceph auth del osd.7
updated
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd rm 2
removed osd.2
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd rm 3
removed osd.3
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd rm 6
removed osd.6
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd rm 7
removed osd.7
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd crush rm overcloud8st-cephstorageblue1-0
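When several OSDs need to be cleaned up, the same sequence of commands can be scripted per OSD ID; a minimal sketch using the IDs from this example:
[root@overcloud8st-ctrl-1 ~]# for id in 2 3 6 7; do sudo ceph osd crush remove osd.$id; sudo ceph auth del osd.$id; sudo ceph osd rm $id; done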
From the undercloud, as the heat-admin user, SSH to any of the OpenStack controllers and run sudo ceph -s to verify that the Ceph cluster returns a “HEALTH_OK” state before you continue.
- From the undercloud VM, find the ID of the Ceph storage node:
(undercloud) [stack@undercloud ~]$ openstack server list | grep overcloud8st-cephstorageblue1-0
| 7ee9be4f-efda-4837-a597-a6554027d0c9 | overcloud8st-cephstorageblue1-0 | ACTIVE | ctlplane=192.168.213.62 | overcloud-full | CephStorageBlue1
- Initiate a removal using the node ID from the previous
step:
(undercloud) [stack@undercloud ~]$ openstack overcloud node delete --stack overcloud 7ee9be4f-efda-4837-a597-a6554027d0c9
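The delete operation runs an update of the overcloud heat stack. Before continuing, you can confirm from the undercloud that the stack has returned to a complete state (typically UPDATE_COMPLETE). For example:
(undercloud) [stack@undercloud ~]$ openstack stack list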
From the undercloud, as the heat-admin user, SSH to any of the OpenStack controllers and run sudo ceph -s to verify that the Ceph cluster returns a “HEALTH_OK” state before you continue.
- Verify that the bare metal node is in a power off state and is available:
(undercloud) [stack@undercloud ~]$ openstack baremetal node list | grep storage3
| 05bbab4b-b968-4d1d-87bc-a26ac335303d | storage3 | None | power off | available | False |
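You can also query those fields directly from the node record. For example:
(undercloud) [stack@undercloud ~]$ openstack baremetal node show storage3 -c power_state -c provision_state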
- From the jump host, as the contrail user, mark the storage node with ‘status: deleting’ so that the Ceph profile is removed from it. Add ‘status: deleting’ to the config/storage-nodes.yml file for storage3 and then run the storage-nodes-assign.sh script:
[contrail@5a6s13-node1 contrail_cloud]$ cat config/storage-nodes.yml
storage_nodes:
  - name: storage1
    profile: blue2
  - name: storage2
    profile: blue2
  - name: storage3
    profile: blue1
    status: deleting
[contrail@5a6s13-node1 contrail_cloud]$ ./scripts/storage-nodes-assign.sh
From the undercloud, as the heat-admin user, SSH to any of the OpenStack controllers and run sudo ceph -s to verify that the Ceph cluster returns a “HEALTH_OK” state before you continue.
- From the jump host, as the contrail user, run openstack-deploy.sh to regenerate the templates so that they reflect the current state:
[contrail@5a6s13-node1 contrail_cloud]$ ./scripts/openstack-deploy.sh
From the undercloud, as the heat-admin user, SSH to any of the OpenStack controllers and run sudo ceph -s to verify that the Ceph cluster returns a “HEALTH_OK” state before you continue.
If the goal is to remove the bare metal node completely, use the following additional procedure:
Edit the config/storage-nodes.yml file and remove the entry for the bare metal node.
Edit the config/inventory.yml file and add ‘status: deleting’ to the node to be removed:
[contrail@5a6s13-node1 contrail_cloud]$ cat config/inventory.yml
...
inventory_nodes:
  - name: "storage3"
    pm_addr: "10.84.129.184"
    status: deleting
    <<: *common
Run the inventory-assign.sh script:
[contrail@5a6s13-node1 contrail_cloud]$ ./scripts/inventory-assign.sh
From the undercloud, as the heat-admin user, SSH to any of the OpenStack controllers and run sudo ceph -s to verify that the Ceph cluster returns a “HEALTH_OK” state before you continue.
Verify that the bare metal node has been removed. Enter the following command to view the remaining storage nodes:
(undercloud) [stack@undercloud ~]$ openstack ccloud nodemap list | grep storage
| overcloud8st-cephstorageblue2-1 | 192.168.213.80 | storage1   | 192.168.213.80 |
| overcloud8st-cephstorageblue2-0 | 192.168.213.61 | storage2   | 192.168.213.61 |
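To confirm that the node is also gone from the bare metal inventory, repeat the earlier baremetal query; it should return no output for storage3:
(undercloud) [stack@undercloud ~]$ openstack baremetal node list | grep storage3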