Repair or Replace Cluster Nodes
You can repair and replace faulty nodes in your Paragon Automation cluster using Paragon Shell. This topic describes both procedures.
Repair Nodes
To repair a faulty node in your existing Paragon Automation cluster, perform the following steps.
Log in to the Linux root shell of the faulty node.
If you are unable to log in to the faulty node, go to step 4.
Stop and kill all RKE2 services on the faulty node.
root@node-f:~# rke2-killall.sh
root@node-f:~# rke2-uninstall.sh
Clear the data on the disk partition used for Ceph.
root@node-f:~# wipefs -a -f /dev/partition
root@node-f:~# dd if=/dev/zero of=/dev/partition bs=1M count=100
Use /dev/sdb if you used the OVA bundle to deploy your cluster.
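You can optionally confirm that the filesystem signatures were removed before proceeding. lsblk is a standard Linux utility; /dev/sdb here is the device used in OVA-based deployments.
root@node-f:~# lsblk -f /dev/sdb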
Log in to the Linux root shell of the node you deployed the cluster from. Note that you can repair the faulty node from any functional node in the cluster.
Delete the faulty node from the cluster.
root@primary1:~# kubectl delete node faulty-node-hostname
Where faulty-node-hostname is the hostname of the node you want to repair.
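You can optionally confirm that the node was removed by listing the remaining cluster nodes:
root@primary1:~# kubectl get nodes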
Type cli to enter Paragon Shell. If you are not logged in to the node from which you deployed the cluster, log out of the current node and log in to the installer node.
Repair the node from Paragon Shell.
user@primary1> request paragon repair-node address ip-address-of-faulty-node
Where ip-address-of-faulty-node is the IP address of the node that you want to repair.
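For example, if the faulty node's IP address is 10.1.2.5 (a hypothetical value), you would run:
user@primary1> request paragon repair-node address 10.1.2.5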
Replace Faulty Nodes
Log in to the Linux root shell of one of the functional nodes of the cluster and delete the faulty node.
root@primary1:~# kubectl delete node faulty-node-hostname
Where faulty-node-hostname is the hostname of the node you want to replace.
Prepare the new node.
Perform the following steps to prepare the new node before replacing a faulty node.
The node that you want to replace the faulty node with can have a new IP address or the same IP address as the faulty node, but you will still need to create and prepare the node VM.
Log in to the VMware ESXi 8.0 server where you have installed Paragon Automation.
Create the new node VM.
Perform the following steps to create the VM.
Right-click the Host icon and select Create/Register VM.
The New virtual machine wizard appears.
On the Select creation type page, select Deploy a virtual machine from an OVF or OVA file.
Click Next.
On the Select OVF and VMDK files page, enter a name for the node VM.
Click to upload or drag and drop the OVA file or the OVF file along with the .vmdk file.
Review the list of files to be uploaded and click Next.
On the Select storage page, select a datastore that can accommodate the 300-GB SSD required for the node VM.
Click Next. The extraction of files takes a few minutes.
On the Deployment options page:
Select the virtual network to which the node VM will be connected.
Select the Thick disk provisioning option.
Enable the VM to power on automatically.
Click Next.
On the Ready to complete page, review the VM settings.
Click Finish to create the node VM.
(Optional) Verify the progress of the VM creation in the Recent tasks section at the bottom of the page.
When the VM has been created, verify that it has the correct specifications and is powered on.
Configure the new node VM.
Perform the following steps to configure the node VM.
Connect to the node VM console. You are logged in as root automatically.
You are prompted to change your password immediately. Enter and re-enter the new password. You are automatically logged out of the VM.
Note: We recommend that you enter the same password as the VMs in your existing cluster.
When prompted, log in again as root user with the newly configured password.
Configure the following information when prompted.
Table 1: VM Configuration Wizard

Prompt: Do you want to set up a Hostname? (y/n)
Action: Enter y to configure a hostname.

Prompt: Please specify the Hostname
Action: Enter an identifying hostname for the VM. If you do not enter a hostname, a default hostname in the format controller-<VM-IP address 4th octet> is assigned.

Prompt: Do you want to set up Static IP (preferred)? (y/n)
Action: Enter y to configure the IP address for the VM. This IP address can be different from or the same as the IP address of the faulty node.

Prompt: Please specify the IP address in CIDR notation
Action: Enter the IP address in CIDR notation. For example, 10.1.2.3/24. The IP address must be in the same subnet as the IP addresses of your existing Paragon Automation cluster.

Prompt: Please specify the Gateway IP
Action: Enter the gateway IP address.

Prompt: Please specify the Primary DNS IP
Action: Enter the primary DNS IP address.

Prompt: Please specify the Secondary DNS IP
Action: Enter the secondary DNS IP address.
When prompted to confirm that you want to proceed, review the displayed information, type y, and press Enter.
When prompted to create a cluster, type n and press Enter.
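A completed wizard session might look like the following; the hostname and all addresses shown are hypothetical example values.
Do you want to set up a Hostname? (y/n): y
Please specify the Hostname: node5
Do you want to set up Static IP (preferred)? (y/n): y
Please specify the IP address in CIDR notation: 10.1.2.11/24
Please specify the Gateway IP: 10.1.2.1
Please specify the Primary DNS IP: 10.1.2.100
Please specify the Secondary DNS IP: 10.1.2.101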
You have prepared the node and can now replace the faulty node with the newly prepared node.
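Before starting the replacement, you can optionally verify that the new node is reachable from the Linux shell of a functional cluster node; 10.1.2.11 is the hypothetical address of the new node.
root@primary1:~# ping -c 3 10.1.2.11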
Replace the Faulty Node:
Log in to the node from which you deployed your existing Paragon Automation cluster. You are placed in Paragon Shell.
If the IP address of the new node is the same as the IP address of the faulty node, go to step 5. If the IP address of the new node is different from the IP address of the faulty node, perform the following steps.
To edit the cluster, type configure to enter configuration mode.
user@primary1> configure
Entering configuration mode
[edit]
user@primary1#
Delete the faulty node.
user@primary1# delete paragon cluster nodes kubernetes 3
Where 3 is the index number of the node that you want to delete.
Add the new node to the cluster configuration in place of the node you deleted and commit the configuration.
user@primary1# set paragon cluster nodes kubernetes 3 address 10.1.2.11
user@primary1# commit
commit complete
Where 10.1.2.11 is the IP address of the new node.
(Optional) Verify the cluster configuration.
user@primary1# show paragon cluster nodes
kubernetes 1 {
    address 10.1.2.3;
}
kubernetes 2 {
    address 10.1.2.4;
}
kubernetes 3 {
    address 10.1.2.11;
}
kubernetes 4 {
    address 10.1.2.6;
}
Exit configuration mode and regenerate the configuration files.
user@primary1# exit
Exiting configuration mode
user@primary1> request paragon config
Paragon inventory file saved at /epic/config/inventory
Paragon config file saved at /epic/config/config
Regenerate SSH keys on the cluster nodes.
When prompted, enter the SSH password for all the existing VMs and the new VM. Enter the same password that you configured to log in to the VMs.
user@primary1> request paragon ssh-key
Please enter comma-separated list of IP addresses: 10.1.2.3,10.1.2.4,10.1.2.6,10.1.2.11
Please enter SSH username for the node(s): root
Please enter SSH password for the node(s): password
checking server reachability and ssh connectivity ...
Connectivity ok for 10.1.2.3
Connectivity ok for 10.1.2.4
Connectivity ok for 10.1.2.6
Connectivity ok for 10.1.2.11
<output snipped>
Type configure to enter configuration mode and replace the node.
user@primary1> configure
Entering configuration mode
[edit]
user@primary1# request paragon replace-node address 10.1.2.11
Process running with PID: 23xx032
To track progress, run 'monitor start /epic/config/log'
Where 10.1.2.11 is the IP address of the new node.
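To track the progress of the replacement, run the monitor command indicated in the command output (shown here after exiting configuration mode):
user@primary1# exit
Exiting configuration mode
user@primary1> monitor start /epic/config/log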
Replace-Node Failure Scenario
You might encounter issues with node replacement when the IP address (or hostname) of the replacement node is different from the IP address (or hostname) of the faulty node. Perform the following additional steps to fix the issue.
Log in to the Linux root shell of the node from where you deployed the cluster.
Delete the local volume PVC (persistent volume claim) associated with the faulty node, if any.
Use the following command to determine whether any lingering PVC is associated with the faulty node, and note the PVC name.
# kubectl describe pv -A
Use the following command to find the namespace associated with the PVC name.
# kubectl get pvc -A
Use the following command to delete the PVC.
# kubectl delete pvc -n namespace pvc_name
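For example, if the describe output shows a lingering claim named local-pvc-node5 in the common namespace (both names are hypothetical), you would delete it as follows:
# kubectl delete pvc -n common local-pvc-node5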
Run the following command to check whether any Rook and Ceph pods installed in the rook-ceph namespace have a status other than Running or Completed.
$ kubectl get po -n rook-ceph
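As an optional shortcut, you can filter out healthy pods with a standard kubectl field selector (Succeeded is the pod phase that displays as Completed):
$ kubectl get po -n rook-ceph --field-selector=status.phase!=Running,status.phase!=Succeeded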
Remove the failing OSD processes.
$ kubectl delete deploy -n rook-ceph rook-ceph-osd-number
Connect to the toolbox.
$ kubectl exec -ti -n rook-ceph $(kubectl get po -n rook-ceph -l app=rook-ceph-tools -o jsonpath={..metadata.name}) -- bash
Identify the failing OSD.
$ ceph osd status
Mark out the failed OSD.
$ ceph osd out osd-ID-number
Remove the failed OSD.
$ ceph osd purge osd-ID-number --yes-i-really-mean-it
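You can then confirm from the toolbox that the purged OSD no longer appears in the cluster; both of the following are standard Ceph status commands:
$ ceph osd tree
$ ceph status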