Replace a Control Plane Node
SUMMARY Learn how to identify and replace an unhealthy node in an OpenShift cluster.
Replacing a control plane node requires you to first identify and remove the unhealthy node. After you remove the unhealthy node, you can then add the new replacement node.
We provide these example procedures purely for informational purposes. See Red Hat OpenShift documentation (https://docs.openshift.com/) for the official procedure.
Remove an Unhealthy Control Plane Node
-
Check the status of the control plane nodes to identify the unhealthy member.
user@ai-client:~# oc get nodes -l node-role.kubernetes.io/master
NAME                          STATUS     ROLES    AGE   VERSION
ocp1.mycluster.contrail.lan   Ready      master   16d   v1.21.6+bb8d50a
ocp2.mycluster.contrail.lan   Ready      master   16d   v1.21.6+bb8d50a
ocp3.mycluster.contrail.lan   NotReady   master   16d   v1.21.6+bb8d50a
In this example, ocp3 is the unhealthy node.
-
Back up the etcd database on one of the healthy nodes by following the procedure in Back Up the Etcd Database.
-
List the etcd members.
user@ai-client:~# oc get pods -n openshift-etcd -o wide | grep -v etcd-quorum-guard | grep etcd
etcd-ocp1   4/4   Running   0   18d   172.16.0.11   ocp1   <none>   <none>
etcd-ocp2   4/4   Running   0   18d   172.16.0.12   ocp2   <none>   <none>
etcd-ocp3   4/4   Running   0   18d   172.16.0.13   ocp3   <none>   <none>
-
Open a remote shell to an etcd pod on a healthy node (for example, ocp1).
user@ai-client:~# oc rsh -n openshift-etcd etcd-ocp1
Defaulted container "etcdctl" out of: etcdctl, etcd, etcd-metrics, etcd-health-monitor, setup (init), etcd-ensure-env-vars (init), etcd-resources-copy (init)
-
List the etcd members.
sh-4.4# etcdctl member list -w table
+------------------+---------+------+--------------------------+--------------------------+------------+
|        ID        | STATUS  | NAME |        PEER ADDRS        |       CLIENT ADDRS       | IS LEARNER |
+------------------+---------+------+--------------------------+--------------------------+------------+
| 19faf45778a1ddd3 | started | ocp3 | https://172.16.0.13:2380 | https://172.16.0.13:2379 |      false |
| ad4840148f3f241c | started | ocp1 | https://172.16.0.11:2380 | https://172.16.0.11:2379 |      false |
| b1f7a55fb3caa3b6 | started | ocp2 | https://172.16.0.12:2380 | https://172.16.0.12:2379 |      false |
+------------------+---------+------+--------------------------+--------------------------+------------+
-
Remove the etcd member on the unhealthy node.
sh-4.4# etcdctl member remove 19faf45778a1ddd3
Member 19faf45778a1ddd3 removed from cluster 60f3fdb1b921fd7
-
View the member list again.
sh-4.4# etcdctl member list -w table
+------------------+---------+------+--------------------------+--------------------------+------------+
|        ID        | STATUS  | NAME |        PEER ADDRS        |       CLIENT ADDRS       | IS LEARNER |
+------------------+---------+------+--------------------------+--------------------------+------------+
| ad4840148f3f241c | started | ocp1 | https://172.16.0.11:2380 | https://172.16.0.11:2379 |      false |
| b1f7a55fb3caa3b6 | started | ocp2 | https://172.16.0.12:2380 | https://172.16.0.12:2379 |      false |
+------------------+---------+------+--------------------------+--------------------------+------------+
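Optionally, before exiting the remote shell, you can run a quick health check against the remaining members. This is a sanity-check sketch, not part of the official procedure; the flags assume the etcdctl v3 binary that ships in the etcdctl container:
sh-4.4# etcdctl endpoint health --cluster -w table
Both remaining members should report as healthy before you continue.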
- Type exit to exit the remote shell.
-
Back on the AI client node, remove the old secrets for the unhealthy etcd member.
-
List the secrets for the unhealthy (removed) member.
user@ai-client:~# oc get secrets -n openshift-etcd | grep ocp3
etcd-peer-ocp3              kubernetes.io/tls   2   18d
etcd-serving-metrics-ocp3   kubernetes.io/tls   2   18d
etcd-serving-ocp3           kubernetes.io/tls   2   18d
-
Delete the peer secrets.
user@ai-client:~# oc delete secret -n openshift-etcd etcd-peer-ocp3
secret "etcd-peer-ocp3" deleted
-
Delete the metrics secrets.
user@ai-client:~# oc delete secret -n openshift-etcd etcd-serving-metrics-ocp3
secret "etcd-serving-metrics-ocp3" deleted
-
Delete the serving secrets.
user@ai-client:~# oc delete secret -n openshift-etcd etcd-serving-ocp3
secret "etcd-serving-ocp3" deleted
-
Finally, delete the unhealthy node.
-
Cordon the unhealthy node.
user@ai-client:~# oc adm cordon ocp3
node/ocp3 cordoned
-
Drain the unhealthy node.
user@ai-client:~# oc adm drain ocp3 --ignore-daemonsets=true --delete-emptydir-data --force
node/ocp3 already cordoned
<trimmed>
-
Delete the unhealthy node.
user@ai-client:~# oc delete node ocp3
node "ocp3" deleted
-
List the nodes.
user@ai-client:~# oc get nodes
NAME   STATUS   ROLES    AGE   VERSION
ocp1   Ready    master   19d   v1.21.6+bb8d50a
ocp2   Ready    master   19d   v1.21.6+bb8d50a
ocp4   Ready    worker   18d   v1.21.6+bb8d50a
ocp5   Ready    worker   18d   v1.21.6+bb8d50a
You have now identified and removed the unhealthy node.
Add a Replacement Control Plane Node
Use this procedure to add a replacement control plane node to an existing OpenShift cluster. An OpenShift cluster has exactly 3 control plane nodes. You cannot use this procedure to add a node to a cluster that already has 3 control plane nodes.
This procedure shows an example of late binding. In late binding, you generate an ISO and boot the node with that ISO. After the node boots, you bind the node to the existing cluster.
This causes one or more CertificateSigningRequests (CSRs) to be sent from the new node to the existing cluster. A CSR is simply a request to obtain the client certificates for the (existing) cluster. You'll need to explicitly approve these requests. Once approved, the existing cluster provides the client certificates to the new node, and the new node is allowed to join the existing cluster.
- Log in to the machine (VM or BMS) that you're using as the Assisted Installer client. The Assisted Installer client machine is where you issue Assisted Installer API calls to the Assisted Installer server hosted by Red Hat.
-
Prepare the deployment by setting the environment variables that you'll use in later
steps.
-
Set up the same SSH key that you use for the existing cluster.
In this example, we retrieve that SSH key from its default location ~/.ssh/id_rsa.pub and store it in a variable.
export CLUSTER_SSHKEY=$(cat ~/.ssh/id_rsa.pub)
-
If you no longer have the image pull secret, then download the image pull secret
from your Red Hat account onto your local computer. The pull secret allows your
installation to access services and registries that serve container images for
OpenShift components.
If you're using the Red Hat hosted Assisted Installer, you can download the pull secret file (pull-secret) from the https://console.redhat.com/openshift/downloads page. Copy the pull-secret file to the Assisted Installer client machine. In this example, we store the pull-secret in a file called pull-secret.txt.
Strip out any blank lines, convert the contents to JSON string format, and store the result in an environment variable, as follows:
export PULL_SECRET=$(sed '/^[[:space:]]*$/d' pull-secret.txt | jq -R .)
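As a quick, optional check (assuming you have jq installed and the pull secret uses the standard top-level auths object), you can decode the variable and list the registries it authenticates to:
# Decode the JSON-string-encoded secret and list the registries it covers
echo "$PULL_SECRET" | jq -r '.' | jq '.auths | keys'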
-
If you no longer have your offline access token, then copy the offline access token
from your Red Hat account. The OpenShift Cluster Manager API Token allows you (on the
Assisted Installer client machine) to interact with the Assisted Installer API service
hosted by Red Hat.
The token is a string that you can copy and paste to a local environment variable. If you're using the Red Hat hosted Assisted Installer, you can copy the API token from https://console.redhat.com/openshift/downloads.
export OFFLINE_ACCESS_TOKEN='<paste offline access token here>'
-
Generate (refresh) the token from the OFFLINE_ACCESS_TOKEN. You will use this
generated token whenever you issue API commands.
export TOKEN=$(curl --silent --data-urlencode "grant_type=refresh_token" --data-urlencode "client_id=cloud-services" --data-urlencode "refresh_token=${OFFLINE_ACCESS_TOKEN}" https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token | jq -r .access_token)
Note: This token expires regularly. When this token expires, you will get an HTTP 4xx response whenever you issue an API command. Refresh the token when it expires, or alternatively, refresh the token regularly prior to expiry. There is no harm in refreshing the token when it hasn't expired.
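Because the token expires, it can be convenient to wrap the refresh command in a small shell function so you can re-run it quickly. This is just a convenience sketch (the function name is ours) that reuses the command above:
# Re-generate TOKEN from OFFLINE_ACCESS_TOKEN whenever an API call returns HTTP 4xx
refresh_token() {
  export TOKEN=$(curl --silent \
    --data-urlencode "grant_type=refresh_token" \
    --data-urlencode "client_id=cloud-services" \
    --data-urlencode "refresh_token=${OFFLINE_ACCESS_TOKEN}" \
    https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token | jq -r .access_token)
}
# Call it as needed:
refresh_token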
-
Get the OpenShift cluster ID of the existing cluster.
For example:
oc get clusterversion -o jsonpath='{.items[].spec.clusterID}{"\n"}'
1777102a-1fe1-407a-9441-9d0bad4f5968
Save it to a variable:
export OS_CLUSTER_ID="1777102a-1fe1-407a-9441-9d0bad4f5968"
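Alternatively, you can capture the cluster ID directly into the variable instead of pasting it by hand. This is the same command as above, wrapped in a shell substitution:
export OS_CLUSTER_ID=$(oc get clusterversion -o jsonpath='{.items[].spec.clusterID}')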
-
Set up the remaining environment variables.
Table 1 lists all the environment variables that you need to set in this procedure, including the ones described in the previous steps.
Table 1: Environment Variables

Variable | Description | Example
CLUSTER_SSHKEY | The (public) SSH key you use for the existing cluster. You must use this same key for the new node you're adding. | –
PULL_SECRET | The image pull secret that you downloaded, stripped, and converted to JSON string format. | –
OFFLINE_ACCESS_TOKEN | The OpenShift Cluster Manager API Token that you copied. | –
TOKEN | The token that you generated (refreshed) from the OFFLINE_ACCESS_TOKEN. | –
CLUSTER_NAME | The name of the existing cluster. | mycluster
CLUSTER_DOMAIN | The base domain of the existing cluster. | contrail.lan
OS_CLUSTER_ID | The OpenShift cluster ID of the existing cluster. | 1777102a-1fe1-407a-9441-9d0bad4f5968
AI_URL | The URL of the Assisted Installer service. This example uses the Red Hat hosted Assisted Installer. | https://api.openshift.com
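For example, using the example values from Table 1 (substitute your own cluster name, domain, and Assisted Installer URL), the remaining variables are set as follows:
export CLUSTER_NAME="mycluster"
export CLUSTER_DOMAIN="contrail.lan"
export AI_URL="https://api.openshift.com"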
-
Generate the discovery boot ISO. You will use this ISO to boot the node that you're
adding to the cluster.
The ISO is customized to your infrastructure based on the infrastructure environment that you'll set up.
-
Create a file that describes the infrastructure environment. In this example, we
name it infra-envs-addhost.json.
cat << EOF > ./infra-envs-addhost.json
{
  "name": "<InfraEnv Name>",
  "ssh_authorized_key": "$CLUSTER_SSHKEY",
  "pull_secret": $PULL_SECRET,
  "openshift_version": "4.8",
  "user_managed_networking": <same as for existing cluster>,
  "vip_dhcp_allocation": <same as for existing cluster>,
  "base_dns_domain": "$CLUSTER_DOMAIN"
}
EOF
where:
- InfraEnv Name is the name you want to call the InfraEnv.
- user_managed_networking and vip_dhcp_allocation are set to the same values as for the existing cluster.
-
Register the InfraEnv. In response, the Assisted Installer service assigns an
InfraEnv ID and builds the discovery boot ISO based on the specified infrastructure
environment.
curl -X POST "$AI_URL/api/assisted-install/v2/infra-envs" -H "accept: application/json" -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" -d @infra-envs-addhosts.json
When you register the InfraEnv, the Assisted Installer service returns an InfraEnv ID. Look carefully for the InfraEnv ID embedded in the response. For example:
"id":"5c858ed9-26cf-446d-817c-4c4261541657"
Store the InfraEnv ID into a variable. For example:
export INFRA_ENV_ID="5c858ed9-26cf-446d-817c-4c4261541657"
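If you're scripting this, you can register the InfraEnv and capture its ID in one step instead of copying it manually. Run this instead of (not in addition to) the separate POST above, so you don't register a duplicate InfraEnv; it assumes the response is valid JSON with a top-level id field, as shown in the example:
# Register the InfraEnv and store the returned ID directly
export INFRA_ENV_ID=$(curl -s -X POST "$AI_URL/api/assisted-install/v2/infra-envs" \
  -H "accept: application/json" -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d @infra-envs-addhost.json | jq -r '.id')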
-
Get the image download URL.
curl -s $AI_URL/api/assisted-install/v2/infra-envs/$INFRA_ENV_ID/downloads/image-url -H "accept: application/json" -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" | jq '.url'
The Assisted Installer service returns the image URL.
-
Download the ISO and save it to a file. In this example, we save it to
ai-liveiso-addhosts.iso.
curl -L "<image URL>" -H "Authorization: Bearer $TOKEN" -o ./ai-liveiso-addhosts.iso
-
Boot the new node with the discovery boot ISO. Choose the boot method most convenient
for your infrastructure. Ensure that the new node boots up attached to a network that has
access to the Red Hat hosted Assisted Installer.
Check the status of the host:
curl -s -X GET --header "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" $AI_URL/api/assisted-install/v2/infra-envs/$INFRA_ENV_ID/hosts | jq '.'
Store the host ID into a variable.
export HOST_ID=$(curl -s -X GET --header "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" $AI_URL/api/assisted-install/v2/infra-envs/$INFRA_ENV_ID/hosts | jq -r '.[].id')
-
Configure the new node as a control plane node.
curl -X PATCH --header "Content-Type: application/json" "$AI_URL/api/assisted-install/v2/infra-envs/$INFRA_ENV_ID/hosts/$HOST_ID" -H "Authorization: Bearer $TOKEN" -d "{ \"machine_config_pool_name\": \"master\"}" | jq -r
Check to see that the following is embedded in the response:
"machine_config_pool_name": "master",
-
Import the existing cluster.
curl -X POST "$AI_URL/api/assisted-install/v2/clusters/import" -H "accept: application/json" -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" -d "{\"name\":\"$CLUSTER_NAME\",\"openshift_cluster_id\":\"$OS_CLUSTER_ID\",\"api_vip_dnsname\":\"api.$CLUSTER_NAME.$CLUSTER_DOMAIN\"}"
When you import the cluster, the Assisted Installer service returns a cluster ID for the AddHostsCluster. Look carefully for the cluster ID embedded in the response. For example:
"id":"c5bbb159-78bc-41c9-99b7-d8a4727a3890"
-
Bind the new host to the cluster, referencing the cluster ID of the
AddHostsCluster.
curl -X POST "$AI_URL/api/assisted-install/v2/infra-envs/$INFRA_ENV_ID/hosts/$HOST_ID/actions/bind" -H "accept: application/json" -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" -d '{"cluster_id":"c5bbb159-78bc-41c9-99b7-d8a4727a3890"}'
Check the status of the host regularly:
curl -s -X GET --header "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" $AI_URL/api/assisted-install/v2/infra-envs/$INFRA_ENV_ID/hosts | jq '.'
Proceed to the next step when you see the following output:
"status": "known", "status_info": "Host is ready to be installed",
-
Install the new node.
curl -X POST -H "accept: application/json" -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" "$AI_URL/api/assisted-install/v2/infra-envs/$INFRA_ENV_ID/hosts/$HOST_ID/actions/install"
Check on the status of the node:
curl -s -X GET --header "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" $AI_URL/api/assisted-install/v2/infra-envs/$INFRA_ENV_ID/hosts | jq '.'
Look for the following status, which indicates that the node has rebooted:
"status_info": "Host has rebooted and no further updates will be posted. Please check console for progress and to possibly approve pending CSRs",
-
Once the new node has rebooted, it will try to join the existing cluster. This causes
one or more CertificateSigningRequests (CSRs) to be sent from the new node to the existing
cluster. You will need to approve the CSRs.
-
Check for the CSRs.
For example:
root@ai-client:~/contrail# oc get csr -A
NAME        AGE   SIGNERNAME                                    REQUESTOR                                                                     CONDITION
csr-gblnm   20s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper    Pending
You may need to repeat this command periodically until you see pending CSRs.
-
Approve the CSRs.
For example:
root@ai-client:~/contrail# oc adm certificate approve csr-gblnm
certificatesigningrequest.certificates.k8s.io/csr-gblnm approved
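A new node typically generates more than one CSR as it joins, so you may need to approve several. One common shortcut (not part of the official procedure) is to approve all CSRs that have no status yet in one pass:
# Approve every CSR that is still pending (has no status block)
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve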
-
Verify that the new node is up and running in the existing cluster.
oc get nodes
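For example, you can confirm that the control plane is back to three members with the same checks used earlier in this procedure:
# The new node should show Ready with the master role
oc get nodes -l node-role.kubernetes.io/master
# And a third etcd pod should be running again
oc get pods -n openshift-etcd -o wide | grep -v etcd-quorum-guard | grep etcd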