
Replacing a Routing Engine in an SRX Series High-End Chassis Cluster

 

You can replace a Routing Engine on a node in a chassis cluster by using one of the following methods:

Replacing a Routing Engine: USB Flash-Drive Method

The following are the prerequisites and assumptions for this procedure:

  • Console and SSH access are available.

  • The firmware package is available on the USB flash drive, and its version matches the Junos OS version currently installed on the device. Use the show version command to identify the Junos OS version. You can download the firmware from https://support.juniper.net/support/downloads/.

  • The chassis cluster has only two redundancy groups (RG0 and RG1) configured.

This procedure includes the steps for replacing the Routing Engine on node 0 of a chassis cluster setup. You can follow the same steps to replace the Routing Engine on node 1.
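As a quick sanity check on the version-match prerequisite, the sketch below compares the version string reported by show version against the name of the firmware package. Both values here are illustrative assumptions, not actual file names from this procedure:

```shell
# Hedged sketch: the version reported by "show version" should appear in
# the name of the firmware package on the USB drive. Both values below
# are hypothetical examples.
running="20.4R3.8"                               # e.g. from: show version
package="junos-srx5000-20.4R3.8-domestic.tgz"    # hypothetical package name

case "$package" in
  *"$running"*) match="yes" ;;
  *)            match="no"  ;;
esac
echo "version match: $match"
```

If the versions do not match, download the matching package before starting the replacement.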

To replace a Routing Engine on node 0 of a chassis cluster using a USB flash drive:

  1. Prepare to shut down node 0:
    1. Perform a manual failover of the redundancy groups (RGs) from node 0 to node 1.

      Fail over RG1:

      root@node0> request chassis cluster failover redundancy-group 1 node 1

      Fail over RG0:

      root@node0> request chassis cluster failover redundancy-group 0 node 1
    2. Verify that both RGs are active on node 1 after the failover:

      root@node0> show chassis cluster status
    3. Check whether any licenses are installed:

      root@node0> show system license
    4. If licenses are installed, copy the output of the show system license keys command into a file:

      root@node0> show system license keys
  2. Back up the Routing Engine configuration and scripts (if any) on node 0 to a USB flash drive:
    1. Access the UNIX-level shell on node 0:

      root@node0> start shell user root
    2. Before you mount the USB flash drive, list all the directories with names starting with da in the dev folder:

      root@node0% ls /dev/da*
    3. Insert the USB flash drive in the USB port.

      The console displays messages as the device detects the USB flash drive.

    4. List all the directories with names starting with da in the dev folder, and identify the USB drive.

      root@node0% ls /dev/da*

      In this example, the USB flash drive is /dev/da2s1.

    5. Create a directory to mount the USB flash drive:

      root@node0% mkdir /var/tmp/usb
    6. Mount the USB flash drive to the /var/tmp/usb directory:

      root@node0% mount -t msdosfs /dev/da2s1 /var/tmp/usb
    7. Save the configuration on node 0 to the tmp folder:

      root@node0% cli -c "show configuration | save /var/tmp/config[date]"
    8. Copy the configuration file to the USB flash drive:

      root@node0% cp /var/tmp/config[date] /var/tmp/usb/config[date]
    9. Check whether any scripts are referenced in the configuration:

      root@node0> show configuration system scripts
      root@node0> show configuration event-options
    10. If any scripts are referenced in the configuration, back up these scripts:

      root@node0% cp -r /var/db/scripts/ /var/tmp/usb/scripts/
    11. Verify the files copied to the USB flash drive:

      root@node0% ls /var/tmp/usb
    12. Unmount the USB flash drive:

      root@node0% umount /var/tmp/usb
    13. Remove the USB flash drive.

    14. Exit the shell.

      root@node0% exit
  3. Install the replacement Routing Engine:
    1. Power off node 0:

      root@node0> request system power-off
    2. Wait until a message appears on the console confirming that the services have stopped, and then physically turn off the power.

    3. Label and disconnect all the cables connected to node 0.

    4. Replace the Routing Engine.

    5. To prevent a split-brain scenario (where the control link is connected while both the nodes are in the primary state), reconnect only the console cable and the cable to the fxp0 interface. Leave the rest of the cables disconnected.

    6. Ensure that the status of the control link and fabric link on node 1 is down:

      root@node1> show chassis cluster interfaces
    7. Power on node 0.

  4. Load the configuration file, firmware, and scripts file on the new Routing Engine:
    1. Insert the USB flash drive into the USB port on node 0, and access the UNIX-level shell on node 0:

      root@node0> start shell user root
    2. Copy the configuration file, firmware, and scripts file from the USB:

      root@node0% cp /var/tmp/usb/config[date] /var/tmp/config[date]
      root@node0% cp /var/tmp/usb/junos-release-domestic.tgz /var/tmp/junos-release-domestic.tgz
      root@node0% cp -r /var/tmp/usb/scripts/ /var/db/scripts/
    3. Unmount the USB flash drive:

      root@node0% umount /var/tmp/usb
    4. Remove the USB flash drive.

    5. Exit the shell.

      root@node0% exit
  5. Configure the Routing Engine:
    1. Load the firmware:

      root@node0> request system software add /var/tmp/junos-release-domestic.tgz reboot

      The device reboots and comes up with the intended Junos OS version.

    2. (Optional) Apply the licenses that you backed up in step 1:

      root@node0> request system license add terminal

      See Adding New Licenses (CLI Procedure).

    3. Load and commit the configuration:

      root@node0> configure shared
      root@node0# load override /var/tmp/config[date]
      root@node0# commit
      root@node0# exit
  6. Check the status of all the FPCs and PICs, and ensure that all the FPCs and PICs are online:
    root@node0> show chassis fpc pic-status
  7. Halt node 0 from the console:
    root@node0> request system halt
  8. Wait until a message appears on the console confirming that the services have stopped, and then connect all the cables to node 0.
  9. Boot up node 0 by pressing any key on the console.
  10. Check the chassis cluster status on node 1:
    root@node1> show chassis cluster status

    Node 0 comes up and becomes the secondary node on both RG0 and RG1. Wait until the node 0 priority on RG1 changes to the configured value.

  11. Verify that sessions are showing up on node 0, and that the number of sessions on node 0 is nearly equal to the number of sessions on the primary node, node 1:
    root@node1> show security monitoring
  12. If the cluster is healthy, reset the cluster priorities:
    root@node1> request chassis cluster failover reset redundancy-group 1
    root@node1> request chassis cluster failover reset redundancy-group 0

Replacing a Routing Engine: External SCP Server Method

The following are the assumptions and prerequisites for this procedure:

  • Console access and SSH access are available.

  • The chassis cluster has only two redundancy groups (RG0 and RG1) configured.

This procedure includes the steps for replacing the Routing Engine on node 0 of a chassis cluster setup. You can follow the same steps to replace the Routing Engine on node 1.

To replace a Routing Engine on node 0 of a chassis cluster using an external Secure Copy Protocol (SCP) server:

  1. Prepare to shut down node 0:
    1. Perform a manual failover of the redundancy groups (RGs) from node 0 to node 1.

      Fail over RG1:

      root@node0> request chassis cluster failover redundancy-group 1 node 1

      Fail over RG0:

      root@node0> request chassis cluster failover redundancy-group 0 node 1
    2. Verify that both RGs are active on node 1 after the failover:

      root@node0> show chassis cluster status
    3. Check whether any licenses are installed:

      root@node0> show system license
    4. If licenses are installed, copy the output of the show system license keys command into a file:

      root@node0> show system license keys
  2. Back up the Routing Engine configuration:
    1. Save the configuration to the tmp folder:

      root@node0> edit
      root@node0# save /var/tmp/node0-config-yyyy-mm-dd
    2. Access the UNIX-level shell on node 0:

      root@node0> start shell user root
    3. Copy the configuration file to an external server with SCP enabled:

      root@node0% scp /var/tmp/node0-config-yyyy-mm-dd root@server-ip:/node0-config-yyyy-mm-dd

    4. Check whether any scripts are referenced in the configuration:

      root@node0> show configuration system scripts
      root@node0> show configuration event-options
    5. If any scripts are referenced in the configuration, back up these scripts:

      root@node0% scp /var/db/scripts/commit/commit-script.slax root@server-ip:/commit-script.slax
    6. Verify the saved configuration on the external SCP server.

    7. Exit the shell.

      root@node0% exit
  3. Install the replacement Routing Engine:
    1. Power off node 0:

      root@node0> request system power-off
    2. Wait until a message appears on the console confirming that the services have stopped, and then physically turn off the power.

    3. Label and disconnect all the cables connected to node 0.

    4. Replace the Routing Engine.

    5. To prevent a split-brain scenario (where the control link is connected while both the nodes are in the primary state), reconnect only the console cable and the cable to the fxp0 interface. Leave the rest of the cables disconnected.

    6. Ensure that the status of the control link and fabric link on node 1 is down:

      root@node1> show chassis cluster interfaces
    7. Power on node 0.

  4. Load the configuration file and scripts on the new Routing Engine:
    1. Log in to the Routing Engine on node 0 from the console.

    2. Configure the IP address for the fxp0 interface, and add the necessary route to access the external server:

      root@node0> edit
      root@node0# set system services ssh
      root@node0# set interfaces fxp0 unit 0 family inet address ip-address/prefix-length
      root@node0# set routing-options static route 0.0.0.0/0 next-hop gateway-ip
      root@node0# set system root-authentication plain-text-password

      The chassis cluster information is stored in the Switch Control Board (SCB). The device comes up with the cluster enabled and does not allow a commit without the cluster port configuration. Apply the node 1 port configuration on node 0.

      You can view the control port configuration from node 1:

      root@node1> show configuration chassis cluster control-ports | display set
    3. Commit the configuration:

      root@node0# commit
      Note

      Management and basic routing configuration are complete at this point. You can verify the reachability of the external server from the node by using the ping command.

    4. Exit configuration mode:

      root@node0# exit
      root@node0>
    5. Load the Junos OS image from the external server:

      root@node0> start shell user root
      root@node0% cd /var/tmp
      root@node0% scp root@server-ip:/junos-release-domestic.tgz /var/tmp/
      root@node0% cli
      root@node0> request system software add /var/tmp/junos-release-domestic.tgz reboot

      The device reboots and comes up with the intended Junos OS version.

    6. Access the shell again, and copy the configuration file from the external SCP server:

      root@node0> start shell user root
      root@node0% scp root@server-ip:/node0-config-yyyy-mm-dd /var/tmp/node0-config-yyyy-mm-dd
    7. (Optional) If you backed up scripts, then restore the scripts from the external SCP server:

      root@node0% scp root@server-ip:/commit-script.slax /var/db/scripts/commit/commit-script.slax
    8. (Optional) Apply the licenses that you backed up in step 1:

      root@node0> request system license add terminal

      See Adding New Licenses (CLI Procedure).

    9. Load the configuration:

      root@node0> configure shared
      root@node0# load override /var/tmp/node0-config-yyyy-mm-dd
      root@node0# commit
      root@node0# exit
  5. Check the status of all the FPCs and PICs, and ensure that all the FPCs and PICs are online:
    root@node0> show chassis fpc pic-status
  6. Halt node 0 from the console:
    root@node0> request system halt
  7. Wait until a message appears on the console confirming that the services have stopped, and then connect all the cables to node 0.
  8. Boot up node 0 by pressing any key on the console.
  9. Check the chassis cluster status on node 1:
    root@node1> show chassis cluster status

    Node 0 comes up and becomes the secondary node on both RG0 and RG1. Wait until the node 0 priority on RG1 changes to the configured value.

  10. Verify that sessions are showing up on node 0, and that the number of sessions on node 0 is nearly equal to the number of sessions on the primary node, node 1:
    root@node1> show security monitoring
  11. If the cluster is healthy, reset the cluster priorities:
    root@node1> request chassis cluster failover reset redundancy-group 1
    root@node1> request chassis cluster failover reset redundancy-group 0
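The node0-config-yyyy-mm-dd names used in the procedure above are a naming convention, not a requirement; any unique file name works. A minimal sketch of generating such a dated name, where the /var/tmp path and the node0-config prefix are assumptions taken from this procedure:

```shell
# Build a dated backup file name of the form used in this procedure,
# e.g. /var/tmp/node0-config-2024-01-31 (the date portion varies).
node="node0"
backup="/var/tmp/${node}-config-$(date +%Y-%m-%d)"
echo "$backup"
```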

Replacing a Routing Engine: File Transfer Method

To replace and configure a Routing Engine by transferring files from another node in a chassis cluster (node 0 is used as an example):

  1. Ensure that the firmware image is available on node 1 in the /var/tmp folder. You can download the firmware from https://support.juniper.net/support/downloads/.
  2. Save a local copy of the configuration in the /var/tmp folder on node 1:
    root@node1> show configuration | save /var/tmp/cfg-node1
  3. Prepare to shut down node 0:
    1. Perform a manual failover of the redundancy groups (RGs) from node 0 to node 1.

      Fail over RG1:

      root@node0> request chassis cluster failover redundancy-group 1 node 1

      Fail over RG0:

      root@node0> request chassis cluster failover redundancy-group 0 node 1
    2. Verify that both RGs are active on node 1 after the failover:

      root@node0> show chassis cluster status
    3. Check whether any licenses are installed:

      root@node0> show system license
    4. If licenses are installed, copy the output of the show system license keys command into a file:

      root@node0> show system license keys
    5. Check whether any scripts are referenced in the configuration:

      root@node0> show configuration system scripts
      root@node0> show configuration event-options
    6. If any scripts are referenced in the configuration, then back up these scripts:

      root@node0% scp /var/db/scripts/commit/commit-script.slax root@node1-fxp0-ip:/commit-script.slax
  4. Install the replacement Routing Engine:
    1. Power off node 0:

      root@node0> request system power-off
    2. Wait until a message appears on the console confirming that the services have stopped, and then physically turn off the power.

    3. Label and disconnect all the cables connected to node 0.

    4. Replace the Routing Engine.

    5. To prevent a split-brain scenario (where the control link is connected while both the nodes are in the primary state), reconnect only the console cable and the cable to the fxp0 interface. Leave the rest of the cables disconnected.

    6. Ensure that the status of the control link and fabric link on node 1 is down:

      root@node1> show chassis cluster interfaces
    7. Power on node 0.

  5. Load the configuration file and scripts on the new Routing Engine:
    1. Log in to the Routing Engine on node 0 from the console.

    2. Configure the root password and the IP address for the fxp0 interface. Do not commit the configuration.

      Note

      You need not configure a gateway as the assumption is that the fxp0 interfaces on both nodes are in the same subnet.

      root@node0> edit
      root@node0# set system root-authentication plain-text-password
      root@node0# set interfaces fxp0 unit 0 family inet address ip-address/prefix-length

      The chassis cluster information is stored in the Switch Control Board (SCB). The device comes up with the cluster enabled and does not allow a commit without the cluster port configuration. Apply the node 1 port configuration on node 0.

      You can view the control port configuration from node 1:

      root@node1> show configuration chassis cluster control-ports | display set
    3. Commit the configuration:

      root@node0# commit
    4. Exit configuration mode:

      root@node0# exit
      root@node0>
    5. Copy the image and configuration from node 1 to node 0 using Secure Copy Protocol (SCP). Use the IP address configured for the node 0 fxp0 interface in Step 5.

      root@node1> scp /var/tmp/image-file root@node0-fxp0-ip:/var/tmp/
      root@node1> scp /var/tmp/cfg-node1 root@node0-fxp0-ip:/var/tmp/
    6. Update the Junos OS image on the Routing Engine to the required version:

      root@node0> request system software add /var/tmp/junos-release-domestic.tgz reboot

      The device reboots and comes up with the intended Junos OS version.

    7. (Optional) Copy the scripts that you backed up in Step 3 from node 1:

      root@node1> scp /commit-script.slax root@node0-fxp0-ip:/var/db/scripts/commit/commit-script.slax
    8. (Optional) Apply the licenses that you backed up in Step 3:

      root@node0> request system license add terminal

      See Adding New Licenses (CLI Procedure).

    9. Load the configuration:

      root@node0> configure shared
      root@node0# load override /var/tmp/cfg-node1
      root@node0# commit
      root@node0# exit

      Verify that the configuration commits without any error.

  6. Check the status of all the FPCs and PICs, and ensure that all the FPCs and PICs are online:
    root@node0> show chassis fpc pic-status
  7. Halt node 0 from the console:
    root@node0> request system halt
  8. Wait until a message appears on the console confirming that the services have stopped, and then connect all the disconnected cables.
  9. Boot up node 0 by pressing any key on the console.
  10. Check the chassis cluster status on node 1:
    root@node1> show chassis cluster status

    Node 0 comes up and becomes the secondary node on both RG0 and RG1. Wait until the node 0 priority on RG1 changes to the configured value.

  11. Verify that sessions are showing up on node 0, and that the number of sessions on node 0 is nearly equal to the number of sessions on the primary node, node 1:
    root@node1> show security monitoring
  12. If the cluster is healthy, reset the cluster priorities:
    root@node1> request chassis cluster failover reset redundancy-group 1
    root@node1> request chassis cluster failover reset redundancy-group 0
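The "nearly equal" session check in the final steps has no officially specified threshold. The sketch below applies an assumed 10 percent tolerance to hypothetical session counts; on a real cluster, the counts come from the node 0 and node 1 rows of the show security monitoring output:

```shell
# Hypothetical session counts; read the real values from the node rows of
# "show security monitoring". The 10% tolerance is an assumption, not a
# Juniper-specified value.
node0_sessions=9800
node1_sessions=10000

diff=$(( node1_sessions - node0_sessions ))
[ "$diff" -lt 0 ] && diff=$(( -diff ))     # absolute difference
threshold=$(( node1_sessions / 10 ))       # 10% of the primary's count

if [ "$diff" -le "$threshold" ]; then
  echo "sessions in sync: safe to reset priorities"
else
  echo "sessions diverged: investigate before resetting priorities"
fi
```

Waiting until the secondary's session count converges before resetting the redundancy-group priorities avoids failing back to a node that has not finished synchronizing state.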