Use this procedure to remove a Ceph storage node from a Ceph cluster. Ceph storage node removal is handled as a Red Hat process rather than as an end-to-end Contrail Cloud process. However, this procedure demonstrates the removal of a storage node in the context of a Contrail Cloud environment.
Before you begin, ensure that the remaining nodes in the cluster are sufficient to hold the required number of placement groups (PGs) and replicas for your Ceph storage cluster. Ensure that both the Ceph cluster and the overcloud stack are healthy. To check the health of your overcloud, see Verify Quorum and Node Health.
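As a quick pre-check, you can review each pool's replica count and placement group count, and the overall cluster capacity, from any of the OpenStack controllers. The following is a minimal example; output will vary by deployment:
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd pool ls detail
[root@overcloud8st-ctrl-1 ~]# sudo ceph df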
All examples in this procedure come from a lab setting
to demonstrate storage removal within the context of Contrail Cloud.
Sample output in the provided examples will differ from the information
in your specific cloud deployment. In the examples used for this procedure,
“storage3” will be the targeted node for removal.
Remove the storage node:
- Find the connection between the bare metal server and the overcloud server. The output from the command below shows that the server we are looking for is “overcloud8st-cephstorageblue1-0”. This information is used later in the procedure.
(undercloud) [stack@undercloud ~]$ openstack ccloud nodemap list
+---------------------------------+----------------+------------+----------------+
| Name                            | IP             | Hypervisor | Hypervisor IP  |
+---------------------------------+----------------+------------+----------------+
| overcloud8st-cc-2               | 192.168.213.54 | controler2 | 192.168.213.6  |
| overcloud8st-cc-1               | 192.168.213.51 | controler3 | 192.168.213.7  |
| overcloud8st-cc-0               | 192.168.213.55 | controler1 | 192.168.213.5  |
| overcloud8st-ca-1               | 192.168.213.72 | controler1 | 192.168.213.5  |
| overcloud8st-ca-0               | 192.168.213.60 | controler3 | 192.168.213.7  |
| overcloud8st-ca-2               | 192.168.213.53 | controler2 | 192.168.213.6  |
| overcloud8st-cadb-1             | 192.168.213.71 | controler1 | 192.168.213.5  |
| overcloud8st-afxctrl-0          | 192.168.213.69 | controler2 | 192.168.213.6  |
| overcloud8st-afxctrl-1          | 192.168.213.52 | controler3 | 192.168.213.7  |
| overcloud8st-afxctrl-2          | 192.168.213.58 | controler1 | 192.168.213.5  |
| overcloud8st-cadb-0             | 192.168.213.65 | controler2 | 192.168.213.6  |
| overcloud8st-ctrl-0             | 192.168.213.73 | controler2 | 192.168.213.6  |
| overcloud8st-ctrl-1             | 192.168.213.63 | controler1 | 192.168.213.5  |
| overcloud8st-ctrl-2             | 192.168.213.59 | controler3 | 192.168.213.7  |
| overcloud8st-cephstorageblue1-0 | 192.168.213.62 | storage3   | 192.168.213.62 |
| overcloud8st-compdpdk-0         | 192.168.213.56 | compute1   | 192.168.213.56 |
| overcloud8st-cephstorageblue2-0 | 192.168.213.61 | storage2   | 192.168.213.61 |
| overcloud8st-cephstorageblue2-1 | 192.168.213.80 | storage1   | 192.168.213.80 |
| overcloud8st-cadb-2             | 192.168.213.74 | controler3 | 192.168.213.7  |
+---------------------------------+----------------+------------+----------------+
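If you already know which storage node you plan to remove, you can filter the same output rather than scanning the full table. For example, in this lab the following returns only the overcloud8st-cephstorageblue1-0 row shown above:
(undercloud) [stack@undercloud ~]$ openstack ccloud nodemap list | grep storage3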
- From the undercloud, as the heat-admin user, SSH to any of the OpenStack controllers and run sudo ceph -s to verify that the Ceph cluster is healthy:
[root@overcloud8st-ctrl-1 ~]# sudo ceph -s
  cluster:
    id:     a98b1580-bb97-11ea-9f2b-525400882160
    health: HEALTH_OK
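If the cluster reports any state other than HEALTH_OK, you can view the underlying warnings before deciding whether to proceed. For example:
[root@overcloud8st-ctrl-1 ~]# sudo ceph health detail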
- Find the OSDs that reside on the server to be removed (overcloud8st-cephstorageblue1-0). In the example below, these are osd.2, osd.3, osd.6, and osd.7:
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd tree
ID CLASS WEIGHT   TYPE NAME                                STATUS REWEIGHT PRI-AFF
-1       10.91638 root default
-3        3.63879     host overcloud8st-cephstorageblue1-0
 2   hdd  0.90970         osd.2                                up  1.00000 1.00000
 3   hdd  0.90970         osd.3                                up  1.00000 1.00000
 6   hdd  0.90970         osd.6                                up  1.00000 1.00000
 7   hdd  0.90970         osd.7                                up  1.00000 1.00000
-7        3.63879     host overcloud8st-cephstorageblue2-0
 1   hdd  0.90970         osd.1                                up  1.00000 1.00000
 4   hdd  0.90970         osd.4                                up  1.00000 1.00000
 8   hdd  0.90970         osd.8                                up  1.00000 1.00000
10   hdd  0.90970         osd.10                               up  1.00000 1.00000
-5        3.63879     host overcloud8st-cephstorageblue2-1
 0   hdd  0.90970         osd.0                                up  1.00000 1.00000
 5   hdd  0.90970         osd.5                                up  1.00000 1.00000
 9   hdd  0.90970         osd.9                                up  1.00000 1.00000
11   hdd  0.90970         osd.11                               up  1.00000 1.00000
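You can also confirm the host that owns a particular OSD directly. For example, to look up osd.2 from this lab:
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd find 2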
- While still logged in to the OpenStack controller, mark osd.2, osd.3, osd.6, and osd.7 out so that Ceph no longer places data on them:
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd out 2
marked out osd.2.
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd out 3
marked out osd.3.
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd out 6
marked out osd.6.
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd out 7
marked out osd.7.
From the undercloud, as the heat-admin user, SSH to any of the OpenStack controllers and run sudo ceph -s to verify that the Ceph cluster returns a “HEALTH_OK” state before you continue.
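Marking the OSDs out triggers data rebalancing, so the cluster may report HEALTH_WARN until recovery finishes. If you prefer not to poll manually, a minimal shell sketch such as the following waits on the controller until the cluster reports HEALTH_OK again; the same approach can be used for the later health checks in this procedure:
[root@overcloud8st-ctrl-1 ~]# until sudo ceph health | grep -q HEALTH_OK; do sleep 30; done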
- From the undercloud, as the heat-admin user, SSH to Ceph node overcloud8st-cephstorageblue1-0 and stop the OSD services:
[root@overcloud8st-cephstorageblue1-0 ~]# sudo systemctl stop ceph-osd@2.service
[root@overcloud8st-cephstorageblue1-0 ~]# sudo systemctl stop ceph-osd@3.service
[root@overcloud8st-cephstorageblue1-0 ~]# sudo systemctl stop ceph-osd@6.service
[root@overcloud8st-cephstorageblue1-0 ~]# sudo systemctl stop ceph-osd@7.service
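Optionally, confirm that each OSD service has stopped before continuing; systemctl should report the unit as inactive. For example:
[root@overcloud8st-cephstorageblue1-0 ~]# sudo systemctl is-active ceph-osd@2.service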
From the undercloud, as the heat-admin user, SSH to any of the OpenStack controllers and run sudo ceph -s to verify that the Ceph cluster returns a “HEALTH_OK” state before you continue.
- From the undercloud, as the heat-admin user, SSH back into the controller, remove the remaining references to the OSDs from overcloud8st-cephstorageblue1-0, and then remove the host from the CRUSH map:
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd crush remove osd.2
removed item id 2 name 'osd.2' from crush map
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd crush remove osd.3
removed item id 3 name 'osd.3' from crush map
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd crush remove osd.6
removed item id 6 name 'osd.6' from crush map
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd crush remove osd.7
removed item id 7 name 'osd.7' from crush map
[root@overcloud8st-ctrl-1 ~]# sudo ceph auth del osd.2
updated
[root@overcloud8st-ctrl-1 ~]# sudo ceph auth del osd.3
updated
[root@overcloud8st-ctrl-1 ~]# sudo ceph auth del osd.6
updated
[root@overcloud8st-ctrl-1 ~]# sudo ceph auth del osd.7
updated
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd rm 2
removed osd.2
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd rm 3
removed osd.3
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd rm 6
removed osd.6
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd rm 7
removed osd.7
[root@overcloud8st-ctrl-1 ~]# sudo ceph osd crush rm overcloud8st-cephstorageblue1-0
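When several OSDs need to be cleaned up, the same sequence of commands can be scripted per OSD ID; a minimal sketch using the IDs from this example:
[root@overcloud8st-ctrl-1 ~]# for id in 2 3 6 7; do sudo ceph osd crush remove osd.$id; sudo ceph auth del osd.$id; sudo ceph osd rm $id; done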
From the undercloud, as the heat-admin user, SSH to any of the OpenStack controllers and run sudo ceph -s to verify that the Ceph cluster returns a “HEALTH_OK” state before you continue.
- From the undercloud VM, find the ID of the Ceph storage node:
(undercloud) [stack@undercloud ~]$ openstack server list | grep overcloud8st-cephstorageblue1-0
| 7ee9be4f-efda-4837-a597-a6554027d0c9 | overcloud8st-cephstorageblue1-0 | ACTIVE | ctlplane=192.168.213.62 | overcloud-full | CephStorageBlue1
- Initiate a removal using the node ID from the previous
step:
(undercloud) [stack@undercloud ~]$ openstack overcloud node delete --stack overcloud 7ee9be4f-efda-4837-a597-a6554027d0c9
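The delete operation runs an update of the overcloud heat stack. Before continuing, you can confirm from the undercloud that the stack has returned to a complete state (typically UPDATE_COMPLETE). For example:
(undercloud) [stack@undercloud ~]$ openstack stack list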
From the undercloud, as the heat-admin user, SSH to any of the OpenStack controllers and run sudo ceph -s to verify that the Ceph cluster returns a “HEALTH_OK” state before you continue.
- Verify that the bare metal node is in a power off state and is available:
(undercloud) [stack@undercloud ~]$ openstack baremetal node list | grep storage3
| 05bbab4b-b968-4d1d-87bc-a26ac335303d | storage3 | None | power off | available | False |
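You can also query those fields directly from the node record. For example:
(undercloud) [stack@undercloud ~]$ openstack baremetal node show storage3 -c power_state -c provision_state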
- From the jump host, as the contrail user, mark the storage node with ‘status: deleting’ so that the Ceph profile is removed from it. Add ‘status: deleting’ to the config/storage-nodes.yml file for storage3 and then run the storage-nodes-assign.sh script:
[contrail@5a6s13-node1 contrail_cloud]$ cat config/storage-nodes.yml
storage_nodes:
  - name: storage1
    profile: blue2
  - name: storage2
    profile: blue2
  - name: storage3
    profile: blue1
    status: deleting
[contrail@5a6s13-node1 contrail_cloud]$ ./scripts/storage-nodes-assign.sh
From the undercloud, as the heat-admin user, SSH to any of the OpenStack controllers and run sudo ceph -s to verify that the Ceph cluster returns a “HEALTH_OK” state before you continue.
- From the jump host, as the contrail user, run openstack-deploy.sh to regenerate the templates so that they reflect the current state:
[contrail@5a6s13-node1 contrail_cloud]$ ./scripts/openstack-deploy.sh
From the undercloud, as the heat-admin user, SSH to any of the OpenStack controllers and run sudo ceph -s to verify that the Ceph cluster returns a “HEALTH_OK” state before you continue.
If the goal is to remove the bare metal node completely, use the following additional procedure:
Edit the config/storage-nodes.yml file and remove the entry for the bare metal node.
Edit the config/inventory.yml file and add ‘status: deleting’ to the node to be removed:
[contrail@5a6s13-node1 contrail_cloud]$ cat config/inventory.yml
...
inventory_nodes:
  - name: "storage3"
    pm_addr: "10.84.129.184"
    status: deleting
    <<: *common
Run the inventory-assign.sh script:
[contrail@5a6s13-node1 contrail_cloud]$ ./scripts/inventory-assign.sh
From the undercloud, as the heat-admin user, SSH to any of the OpenStack controllers and run sudo ceph -s to verify that the Ceph cluster returns a “HEALTH_OK” state before you continue.
Verify that the bare metal node has been removed. Enter the following command to view the remaining storage nodes:
(undercloud) [stack@undercloud ~]$ openstack ccloud nodemap list | grep storage
| overcloud8st-cephstorageblue2-1 | 192.168.213.80 | storage1   | 192.168.213.80 |
| overcloud8st-cephstorageblue2-0 | 192.168.213.61 | storage2   | 192.168.213.61 |
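To confirm that the node is also gone from the bare metal inventory, repeat the earlier baremetal query; it should return no output for storage3:
(undercloud) [stack@undercloud ~]$ openstack baremetal node list | grep storage3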