Repair or Replace Cluster Nodes
You can repair and replace faulty nodes in your Paragon Automation cluster using Paragon Shell. This topic describes both procedures.
Repair Nodes
To repair a faulty node in your existing Paragon Automation cluster, perform the following steps.
Log in to the Linux root shell of the faulty node.
If you are unable to log in to the faulty node, go to step 4.
Stop and kill all RKE2 services on the faulty node.
root@node-f:~# rke2-killall.sh
root@node-f:~# rke2-uninstall.sh
Clear the data on the disk partition used for Ceph.
root@node-f:~# wipefs -a -f /dev/partition
root@node-f:~# dd if=/dev/zero of=/dev/partition bs=1M count=100
Use /dev/sdb if you used the OVA bundle to deploy your cluster.
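You can optionally confirm that the filesystem signatures were removed before proceeding. lsblk is a standard Linux utility; /dev/sdb here is the device used in OVA-based deployments.
root@node-f:~# lsblk -f /dev/sdb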
Log in to the Linux root shell of the node you deployed the cluster from. Note that you can repair the faulty node from any functional node in the cluster.
Delete the faulty node from the cluster.
root@primary1:~# kubectl delete node faulty-node-hostname
Where faulty-node-hostname is the hostname of the node you want to repair.
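You can optionally confirm that the node was removed by listing the remaining cluster nodes:
root@primary1:~# kubectl get nodes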
Type cli to enter Paragon Shell. If you are not logged in to the node from which you deployed the cluster, log out of the current node and log in to the installer node.
Repair the node from Paragon Shell.
user@primary1> request paragon repair-node address ip-address-of-faulty-node
Where ip-address-of-faulty-node is the IP address of the node that you want to repair.
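For example, if the faulty node's IP address is 10.1.2.5 (a hypothetical value), you would run:
user@primary1> request paragon repair-node address 10.1.2.5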
Replace Faulty Nodes
Log in to the Linux root shell of one of the functional nodes of the cluster and delete the faulty node.
root@primary1:~# kubectl delete node faulty-node-hostname
Where faulty-node-hostname is the hostname of the node you want to replace.
Prepare the new node.
Perform the following steps to prepare the new node before replacing a faulty node.
The node that you want to replace the faulty node with can have a new IP address or the same IP address as the faulty node, but you will still need to create and prepare the node VM.
Log in to the VMware ESXi 8.0 server where you have installed Paragon Automation.
Create the new node VM.
Perform the following steps to create the VM.
Right-click the Host icon and select Create/Register VM.
The New virtual machine wizard appears.
On the Select creation type page, select Deploy a virtual machine from an OVF or OVA file.
Click Next.
On the Select OVF and VMDK files page, enter a name for the node VM.
Click to upload or drag and drop the OVA file or the OVF file along with the .vmdk file.
Review the list of files to be uploaded and click Next.
On the Select storage page, select a datastore that can accommodate the 300-GB SSD required for the node VM.
Click Next. The extraction of files takes a few minutes.
On the Deployment options page:
Select the virtual network to which the node VM will be connected.
Select the Thick disk provisioning option.
Enable the VM to power on automatically.
Click Next.
On the Ready to complete page, review the VM settings.
Click Finish to create the node VM.
(Optional) Verify the progress of the VM creation in the Recent tasks section at the bottom of the page.
When the VM has been created, verify that it has the correct specifications and is powered on.
Configure the new node VM.
Perform the following steps to configure the node VM.
Connect to the node VM console. You are logged in as root automatically.
You are prompted to change your password immediately. Enter and re-enter the new password. You are automatically logged out of the VM.
Note: We recommend that you enter the same password as the VMs in your existing cluster.
When prompted, log in again as root user with the newly configured password.
Configure the following information when prompted.
Table 1: VM Configuration Wizard

Prompt: Do you want to set up a Hostname? (y/n)
Action: Enter y to configure a hostname.

Prompt: Please specify the Hostname
Action: Enter an identifying hostname for the VM. If you do not enter a hostname, a default hostname in the format controller-<VM-IP address 4th octet> is assigned.

Prompt: Do you want to set up Static IP (preferred)? (y/n)
Action: Enter y to configure the IP address for the VM. This IP address can be different from or the same as the IP address of the faulty node.

Prompt: Please specify the IP address in CIDR notation
Action: Enter the IP address in CIDR notation. For example, 10.1.2.3/24. The IP address must be in the same subnet as the IP addresses of your existing Paragon Automation cluster.

Prompt: Please specify the Gateway IP
Action: Enter the gateway IP address.

Prompt: Please specify the Primary DNS IP
Action: Enter the primary DNS IP address.

Prompt: Please specify the Secondary DNS IP
Action: Enter the secondary DNS IP address.
When prompted to confirm that you want to proceed, review the displayed information, type y, and press Enter.
When prompted to create a cluster, type n and press Enter.
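A completed wizard session might look like the following; the hostname and all addresses shown are hypothetical example values.
Do you want to set up a Hostname? (y/n): y
Please specify the Hostname: node5
Do you want to set up Static IP (preferred)? (y/n): y
Please specify the IP address in CIDR notation: 10.1.2.11/24
Please specify the Gateway IP: 10.1.2.1
Please specify the Primary DNS IP: 10.1.2.100
Please specify the Secondary DNS IP: 10.1.2.101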
You have prepared the node and can now replace the faulty node with the newly prepared node.
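Before starting the replacement, you can optionally verify that the new node is reachable from the Linux shell of a functional cluster node; 10.1.2.11 is the hypothetical address of the new node.
root@primary1:~# ping -c 3 10.1.2.11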
Replace the Faulty Node:
Log in to the node from which you deployed your existing Paragon Automation cluster. You are placed in Paragon Shell.
If the IP address of the new node is the same as the IP address of the faulty node, go to step 5. If the IP address of the new node is different from the IP address of the faulty node, perform the following steps.
To edit the cluster, type configure to enter configuration mode.
user@primary1> configure
Entering configuration mode
[edit]
user@primary1#
Delete the faulty node.
user@primary1# delete paragon cluster nodes kubernetes 3
Where 3 is the index number of the node that you want to delete.
Add the new node to the cluster configuration in place of the node you deleted and commit the configuration.
user@primary1# set paragon cluster nodes kubernetes 3 address 10.1.2.11
user@primary1# commit
commit complete
Where 10.1.2.11 is the IP address of the new node.
(Optional) Verify the cluster configuration.
user@primary1# show paragon cluster nodes
kubernetes 1 {
    address 10.1.2.3;
}
kubernetes 2 {
    address 10.1.2.4;
}
kubernetes 3 {
    address 10.1.2.11;
}
kubernetes 4 {
    address 10.1.2.6;
}
Exit configuration mode and regenerate the configuration files.
user@primary1# exit
Exiting configuration mode
user@primary1> request paragon config
Paragon inventory file saved at /epic/config/inventory
Paragon config file saved at /epic/config/config
Regenerate SSH keys on the cluster nodes.
When prompted, enter the SSH password for all the existing VMs and the new VM. Enter the same password that you configured to log in to the VMs.
user@primary1> request paragon ssh-key
Please enter comma-separated list of IP addresses: 10.1.2.3,10.1.2.4,10.1.2.6,10.1.2.11
Please enter SSH username for the node(s): root
Please enter SSH password for the node(s): password
checking server reachability and ssh connectivity ...
Connectivity ok for 10.1.2.3
Connectivity ok for 10.1.2.4
Connectivity ok for 10.1.2.6
Connectivity ok for 10.1.2.11
<output snipped>
Type configure to enter configuration mode and replace the node.
user@primary1> configure
Entering configuration mode
[edit]
user@primary1# request paragon replace-node address 10.1.2.11
Process running with PID: 23xx032
To track progress, run 'monitor start /epic/config/log'
Where 10.1.2.11 is the IP address of the new node.
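To track the progress of the replacement, run the monitor command indicated in the command output (shown here after exiting configuration mode):
user@primary1# exit
Exiting configuration mode
user@primary1> monitor start /epic/config/log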
Replace-Node Failure Scenario
You might encounter issues with node replacement when the IP address (or hostname) of the replacement node is different from the IP address (or hostname) of the faulty node. Perform the following additional steps to fix the issue.
Log in to the Linux root shell of the node from where you deployed the cluster.
Delete the local volume PVC (persistent volume claim) associated with the faulty node, if any.
Use the following command to determine whether any lingering PVC is associated with the faulty node, and note the PVC name.
# kubectl describe pv -A
Use the following command to find the namespace associated with the PVC name.
# kubectl get pvc -A
Use the following command to delete the PVC.
# kubectl delete pvc -n namespace pvc_name
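For example, if the describe output shows a lingering claim named local-pvc-node5 in the common namespace (both names are hypothetical), you would delete it as follows:
# kubectl delete pvc -n common local-pvc-node5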
Run the following command to check whether any Rook and Ceph pods installed in the rook-ceph namespace have a status other than Running or Completed.
$ kubectl get po -n rook-ceph
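As an optional shortcut, you can filter out healthy pods with a standard kubectl field selector (Succeeded is the pod phase that displays as Completed):
$ kubectl get po -n rook-ceph --field-selector=status.phase!=Running,status.phase!=Succeeded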
Remove the failing OSD processes.
$ kubectl delete deploy -n rook-ceph rook-ceph-osd-number
Connect to the toolbox.
$ kubectl exec -ti -n rook-ceph $(kubectl get po -n rook-ceph -l app=rook-ceph-tools -o jsonpath={..metadata.name}) -- bash
Identify the failing OSD.
$ ceph osd status
Mark out the failed OSD.
$ ceph osd out osd-ID-number
Remove the failed OSD.
$ ceph osd purge osd-ID-number --yes-i-really-mean-it
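You can then confirm from the toolbox that the purged OSD no longer appears in the cluster; both of the following are standard Ceph status commands:
$ ceph osd tree
$ ceph status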