Replacing a Routing Engine in an SRX Series High-End Chassis Cluster

You can replace a Routing Engine on a node in a chassis cluster by using one of the following methods:

Replacing a Routing Engine: USB Flash-Drive Method

The following are the prerequisites and assumptions for this procedure:

  • Console and SSH access are available.

  • The firmware package is available on the USB flash drive, and its version matches the Junos OS version currently installed on the device. Use the show version command to identify the installed Junos OS version. You can download the firmware from https://support.juniper.net/support/downloads/.

  • The chassis cluster has only two redundancy groups (RG0 and RG1) configured.

This procedure includes the steps for replacing the Routing Engine on node 0 of a chassis cluster setup. You can follow the same steps to replace the Routing Engine on node 1.

To replace a Routing Engine on node 0 of a chassis cluster using a USB flash drive:

  1. Prepare to shut down node 0:
    1. Perform a manual failover of the redundancy groups (RGs) from node 0 to node 1.

      Fail over RG1:

      Fail over RG0:

    2. Verify that both RGs are active on node 1 after the failover:

    3. Check whether any licenses are installed:

    4. If licenses are installed, copy the output of the show system license keys command into a file:

  2. Back up the Routing Engine configuration and scripts (if any) on node 0 to a USB flash drive:
    1. Access the UNIX-level shell on node 0:

    2. Before you insert the USB flash drive, list all the device entries with names starting with da in the /dev directory, so that you can later identify the new entry:

    3. Insert the USB flash drive in the USB port.

      The following output is displayed:

    4. List all the device entries with names starting with da in the /dev directory again, and identify the entry for the USB flash drive.

      In this example, the USB flash drive is /dev/da2s1.

    5. Create a directory to mount the USB flash drive:

    6. Mount the USB flash drive to the /var/tmp/usb directory:

    7. Save the configuration on node 0 to the tmp folder:

    8. Copy the configuration file to the USB flash drive:

    9. Check whether any scripts are referenced in the configuration:

    10. If any scripts are referenced in the configuration, back up these scripts:

    11. Verify the files copied to the USB flash drive:

    12. Unmount the USB flash drive:

    13. Remove the USB flash drive.

    14. Exit the shell.

  3. Install the replacement Routing Engine:
    1. Power off node 0:

    2. Wait until a message appears on the console confirming that the services have stopped, and then physically turn off the power.

    3. Label and disconnect all the cables connected to node 0.

    4. Replace the Routing Engine.

    5. To prevent a split-brain scenario (where the control link is connected while both nodes are in the primary state), reconnect only the console cable and the cable to the fxp0 interface. Leave the rest of the cables disconnected.

    6. Ensure that the status of the control link and fabric link on node 1 is down:

    7. Power on node 0.

  4. Load the configuration file, firmware, and scripts file on the new Routing Engine:
    1. Insert the USB flash drive into the USB port on node 0, and access the UNIX-level shell on node 0:

    2. Copy the configuration file, firmware, and scripts from the USB flash drive:

    3. Unmount the USB flash drive:

    4. Remove the USB flash drive.

    5. Exit the shell.

  5. Configure the Routing Engine:
    1. Load the firmware:

      The device reboots and comes up with the intended Junos OS version.

    2. (Optional) Apply the licenses that you backed up in step 1:

      See Adding New Licenses (CLI Procedure)

    3. Load and commit the configuration:

  6. Check the status of all FPCs and PICs, and ensure that all of them are online.
  7. Halt node 0 from the console:
  8. Wait until a message appears on the console confirming that the services have stopped, and then connect all the cables to node 0.
  9. Boot up node 0 by pressing any key on the console.
  10. Check the chassis cluster status on node 1:

    Node 0 comes up and becomes the secondary node on both RG0 and RG1. Wait until the node 0 priority on RG1 changes to the configured value.

  11. Verify that sessions are showing up on node 0, and that the number of sessions on node 0 is nearly equal to the number of sessions on the primary node, node 1:
  12. If the cluster is healthy, reset the cluster priorities:
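
The commands behind the USB flash-drive steps above can be sketched as the following console transcript. It is a minimal sketch under stated assumptions, not the official command listing: the prompts, the backup file name node0-config-backup, the package name junos-srx-package.tgz, the FAT-formatted USB flash drive, and the /var/db/scripts script location are illustrative, and /dev/da2s1 is only the example device named in the procedure. Lines starting with # are annotations, not commands. Check each command against the Junos OS release on your device before you run it.

```text
# Step 1: fail over RG1 and RG0 to node 1, verify, and record the licenses
user@node0> request chassis cluster failover redundancy-group 1 node 1
user@node0> request chassis cluster failover redundancy-group 0 node 1
user@node0> show chassis cluster status
user@node0> show system license
user@node0> show system license keys

# Step 2: back up the configuration and scripts to the USB flash drive
user@node0> start shell user root
root@node0% ls /dev/da*
# (insert the USB flash drive, then list again to spot the new entry)
root@node0% ls /dev/da*
root@node0% mkdir /var/tmp/usb
# msdosfs assumes a FAT-formatted drive; da2s1 is the example device
root@node0% mount -t msdosfs /dev/da2s1 /var/tmp/usb
root@node0% cli -c "show configuration | save /var/tmp/node0-config-backup"
root@node0% cp /var/tmp/node0-config-backup /var/tmp/usb/
root@node0% cli -c "show configuration system scripts"
root@node0% cli -c "show configuration event-options"
# only needed if the two commands above show referenced scripts
root@node0% cp -r /var/db/scripts /var/tmp/usb/
root@node0% ls -l /var/tmp/usb
root@node0% umount /var/tmp/usb
root@node0% exit

# Step 3: power off node 0; later confirm on node 1 that the links are down
user@node0> request system power-off
user@node1> show chassis cluster interfaces

# Step 4: on the new Routing Engine, copy the files back from the USB drive
# (the device name can differ on the new Routing Engine; check ls /dev/da*)
root@node0% mkdir /var/tmp/usb
root@node0% mount -t msdosfs /dev/da2s1 /var/tmp/usb
root@node0% cp /var/tmp/usb/node0-config-backup /var/tmp/
root@node0% cp /var/tmp/usb/junos-srx-package.tgz /var/tmp/
root@node0% cp -r /var/tmp/usb/scripts /var/db/
root@node0% umount /var/tmp/usb
root@node0% exit

# Step 5: install the firmware, then load and commit the configuration
root@node0> request system software add /var/tmp/junos-srx-package.tgz no-validate reboot
root@node0> configure
root@node0# load override /var/tmp/node0-config-backup
root@node0# commit
root@node0# exit

# Steps 6 and 7: check the FPCs and PICs, then halt node 0
user@node0> show chassis fpc pic-status
user@node0> request system halt

# Steps 10 through 12: verify the cluster and sessions, then reset failover
user@node1> show chassis cluster status
user@node0> show security flow session summary
user@node1> show security flow session summary
user@node1> request chassis cluster failover reset redundancy-group 1
user@node1> request chassis cluster failover reset redundancy-group 0
```

The failover reset in the last two lines clears the manual failover flag, so that the configured priorities apply again; it does not by itself move the redundancy groups back to node 0.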

Replacing a Routing Engine: External SCP Server Method

The following are the assumptions and prerequisites for this procedure:

  • Console access and SSH access are available.

  • The chassis cluster has only two redundancy groups (RG0 and RG1) configured.

This procedure includes the steps for replacing the Routing Engine on node 0 of a chassis cluster setup. You can follow the same steps to replace the Routing Engine on node 1.

To replace a Routing Engine on node 0 of a chassis cluster using an external Secure Copy Protocol (SCP) server:

  1. Prepare to shut down node 0:
    1. Perform a manual failover of the redundancy groups (RGs) from node 0 to node 1.

      Fail over RG1:

      Fail over RG0:

    2. Verify that both RGs are active on node 1 after the failover:

    3. Check whether any licenses are installed:

    4. If licenses are installed, copy the output of the show system license keys command into a file:

  2. Back up the Routing Engine configuration:
    1. Save the configuration to the tmp folder:

    2. Access the UNIX-level shell on node 0:

    3. Copy the configuration file to an external server with SCP enabled:

    4. Check whether any scripts are referenced in the configuration:

    5. If any scripts are referenced in the configuration, back up these scripts:

    6. Verify the saved configuration on the external SCP server.

    7. Exit the shell.

  3. Install the replacement Routing Engine:
    1. Power off node 0:

    2. Wait until a message appears on the console confirming that the services have stopped, and then physically turn off the power.

    3. Label and disconnect all the cables connected to node 0.

    4. Replace the Routing Engine.

    5. To prevent a split-brain scenario (where the control link is connected while both nodes are in the primary state), reconnect only the console cable and the cable to the fxp0 interface. Leave the rest of the cables disconnected.

    6. Ensure that the status of the control link and fabric link on node 1 is down:

    7. Power on node 0.

  4. Load the configuration file and scripts on the new Routing Engine:
    1. Log in to the Routing Engine on node 0 from the console.

    2. Configure the IP address for the fxp0 interface, and add the necessary route to access the external server:

      The chassis cluster information is stored in the Switch Control Board (SCB), so the device comes up with the cluster enabled and does not allow a commit without the cluster control port configuration. Apply the same control port configuration that node 1 uses to node 0.

      You can view the control port configuration from node 1:

    3. Commit the configuration:

      Note:

      Management and basic routing configuration are complete at this point. You can verify the reachability of the external server from the node by using the ping command.

    4. Exit configuration mode:

    5. Load the Junos OS image from the external server:

      The device reboots and comes up with the intended Junos OS version.

    6. Copy the configuration file from the external SCP server:

    7. (Optional) If you backed up scripts, then restore the scripts from the external SCP server:

    8. (Optional) Apply the licenses that you backed up in step 1:

      See Adding New Licenses (CLI Procedure)

    9. Load the configuration:

  5. Check the status of all FPCs and PICs, and ensure that all of them are online.
  6. Halt node 0 from the console:
  7. Wait until a message appears on the console confirming that the services have stopped, and then connect all the cables to node 0.
  8. Boot up node 0 by pressing any key on the console.
  9. Check the chassis cluster status on node 1:

    Node 0 comes up and becomes the secondary node on both RG0 and RG1. Wait until the node 0 priority on RG1 changes to the configured value.

  10. Verify that sessions are showing up on node 0, and that the number of sessions on node 0 is nearly equal to the number of sessions on the primary node, node 1:
  11. If the cluster is healthy, reset the cluster priorities:
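
The steps that are specific to the external SCP server method (backing up to the server, bringing up management access on the new Routing Engine, and restoring from the server) can be sketched as the following console transcript. It is a sketch under stated assumptions: the SCP server address 198.51.100.10, the account backup, the path /backups, the fxp0 address 192.0.2.20/24, the gateway 192.0.2.1, the control port values, the file names, and the /var/db/scripts script location are all placeholders that do not come from the procedure. Lines starting with # are annotations, not commands.

```text
# Step 2: back up the configuration and scripts to the SCP server
user@node0> show configuration | save /var/tmp/node0-config-backup
user@node0> start shell user root
root@node0% scp /var/tmp/node0-config-backup backup@198.51.100.10:/backups/
root@node0% cli -c "show configuration system scripts"
root@node0% cli -c "show configuration event-options"
# only needed if the two commands above show referenced scripts
root@node0% scp -r /var/db/scripts backup@198.51.100.10:/backups/
root@node0% exit

# Step 4, on node 1: look up the control port configuration to mirror
user@node1> show configuration chassis cluster control-ports

# Step 4, on the node 0 console: minimal management configuration
# (use the control port values that node 1 reports, not these placeholders)
root> configure
root# set interfaces fxp0 unit 0 family inet address 192.0.2.20/24
root# set routing-options static route 198.51.100.0/24 next-hop 192.0.2.1
root# set chassis cluster control-ports fpc 1 port 0
root# set chassis cluster control-ports fpc 13 port 0
# if the commit is rejected because no root password is set, set one with
# "set system root-authentication plain-text-password" and commit again
root# commit
root# exit
root> ping 198.51.100.10 count 3

# Step 4: fetch the Junos OS image from the server and install it
root> start shell
root% scp backup@198.51.100.10:/backups/junos-srx-package.tgz /var/tmp/
root% exit
root> request system software add /var/tmp/junos-srx-package.tgz no-validate reboot

# Step 4, after the reboot: restore the configuration and scripts
root> start shell
root% scp backup@198.51.100.10:/backups/node0-config-backup /var/tmp/
root% scp -r backup@198.51.100.10:/backups/scripts /var/db/
root% exit
root> configure
root# load override /var/tmp/node0-config-backup
root# commit
```

The static route is needed only when the SCP server is outside the fxp0 subnet, as it is in this example.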

Replacing a Routing Engine: File Transfer Method

To replace and configure a Routing Engine by transferring files from another node in a chassis cluster (node 0 is used as an example):

  1. Ensure that the firmware image is available on node 1 in the /var/tmp folder. You can download the firmware from https://support.juniper.net/support/downloads/.
  2. Save a local copy of the configuration in the /var/tmp folder on node 1:
  3. Prepare to shut down node 0:
    1. Perform a manual failover of the redundancy groups (RGs) from node 0 to node 1.

      Fail over RG1:

      Fail over RG0:

    2. Verify that both RGs are active on node 1 after the failover:

    3. Check whether any licenses are installed:

    4. If licenses are installed, copy the output of the show system license keys command into a file:

    5. Check whether any scripts are referenced in the configuration:

    6. If any scripts are referenced in the configuration, then back up these scripts:

  4. Install the replacement Routing Engine:
    1. Power off node 0:

    2. Wait until a message appears on the console confirming that the services have stopped, and then physically turn off the power.

    3. Label and disconnect all the cables connected to node 0.

    4. Replace the Routing Engine.

    5. To prevent a split-brain scenario (where the control link is connected while both nodes are in the primary state), reconnect only the console cable and the cable to the fxp0 interface. Leave the rest of the cables disconnected.

    6. Ensure that the status of the control link and fabric link on node 1 is down:

    7. Power on node 0.

  5. Load the configuration file and scripts on the new Routing Engine:
    1. Log in to the Routing Engine on node 0 from the console.

    2. Configure the root password and the IP address for the fxp0 interface. Do not commit the configuration.

      Note:

      You do not need to configure a gateway, because this procedure assumes that the fxp0 interfaces on both nodes are in the same subnet.

      The chassis cluster information is stored in the Switch Control Board (SCB), so the device comes up with the cluster enabled and does not allow a commit without the cluster control port configuration. Apply the same control port configuration that node 1 uses to node 0.

      You can view the control port configuration from node 1:

    3. Commit the configuration:

    4. Exit configuration mode:

    5. Copy the image and configuration from node 1 to node 0 using Secure Copy Protocol (SCP). Use the IP address that you configured for the node 0 fxp0 interface earlier in this step.

    6. Update the Junos OS image on the Routing Engine to the required version:

      The device reboots and comes up with the intended Junos OS version.

    7. (Optional) Copy the scripts that you backed up in Step 3 from node 1:

    8. (Optional) Apply the licenses that you backed up in Step 3:

      See Adding New Licenses (CLI Procedure).

    9. Load the configuration:

      Verify that the configuration commits without any error.

  6. Check the status of all FPCs and PICs, and ensure that all of them are online:
  7. Halt node 0 from the console:
  8. Wait until a message appears on the console confirming that the services have stopped, and then connect all the disconnected cables.
  9. Boot up node 0 by pressing any key on the console.
  10. Check the chassis cluster status on node 1:

    Node 0 comes up and becomes the secondary node on both RG0 and RG1. Wait until the node 0 priority on RG1 changes to the configured value.

  11. Verify that sessions are showing up on node 0, and that the number of sessions on node 0 is nearly equal to the number of sessions on the primary node, node 1:
  12. If the cluster is healthy, reset the cluster priorities:
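
The steps that are specific to the file transfer method (saving the configuration on node 1, bringing up fxp0 on the new Routing Engine, and copying the files across) can be sketched as the following console transcript. It is a sketch under stated assumptions: the prompts, the fxp0 address 192.0.2.20/24, the control port values, and the file names cluster-config-backup and junos-srx-package.tgz are placeholders that do not come from the procedure. It also assumes that node 0 accepts SSH connections for the root user after the first commit, which the procedure does not state. Lines starting with # are annotations, not commands.

```text
# Step 2, on node 1: save a local copy of the configuration
user@node1> show configuration | save /var/tmp/cluster-config-backup

# Step 5, on node 1: look up the control port configuration to mirror
user@node1> show configuration chassis cluster control-ports

# Step 5, on the node 0 console: root password, fxp0, and control ports
# (use the control port values that node 1 reports, not these placeholders)
root> configure
root# set system root-authentication plain-text-password
root# set interfaces fxp0 unit 0 family inet address 192.0.2.20/24
root# set chassis cluster control-ports fpc 1 port 0
root# set chassis cluster control-ports fpc 13 port 0
root# commit
root# exit

# Step 5, on node 1: push the image and configuration to node 0 over fxp0
user@node1> start shell user root
root@node1% scp /var/tmp/junos-srx-package.tgz root@192.0.2.20:/var/tmp/
root@node1% scp /var/tmp/cluster-config-backup root@192.0.2.20:/var/tmp/
root@node1% exit

# Step 5, on node 0: install the image, then load and commit the configuration
root> request system software add /var/tmp/junos-srx-package.tgz no-validate reboot
root> configure
root# load override /var/tmp/cluster-config-backup
root# commit
```

Because the configuration is saved on node 1, it is the shared cluster configuration, so loading it on node 0 applies the same settings that node 1 is already running.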