Replacing a Routing Engine in an SRX Series High-End Chassis Cluster
You can replace a Routing Engine on a node in a chassis cluster by using one of the following methods:
Replacing a Routing Engine: USB Flash-Drive Method
The following are the prerequisites and assumptions for this procedure:
Console and SSH access are available.
The firmware package is available on the USB flash drive, and the firmware version matches the version of Junos OS currently installed on the device. Use the show version command to identify the Junos OS version. You can download the firmware from https://support.juniper.net/support/downloads/.
The chassis cluster has only two redundancy groups (RG0 and RG1) configured.
This procedure includes the steps for replacing the Routing Engine on node 0 of a chassis cluster setup. You can follow the same steps to replace the Routing Engine on node 1.
To replace a Routing Engine on node 0 of a chassis cluster using a USB flash drive:
- Prepare to shut down node 0:
Perform a manual failover of the redundancy groups (RGs) from node 0 to node 1.
Fail over RG1:
root@node0> request chassis cluster failover redundancy-group 1 node 1
Fail over RG0:
root@node0> request chassis cluster failover redundancy-group 0 node 1
Verify that both RGs are active on node 1 after the failover:
{secondary:node0}
root@node0> show chassis cluster status
Monitor Failure codes:
    CS  Cold Sync monitoring        FL  Fabric Connection monitoring
    GR  GRES monitoring             HW  Hardware monitoring
    IF  Interface monitoring        IP  IP monitoring
    LB  Loopback monitoring         MB  Mbuf monitoring
    NH  Nexthop monitoring          NP  NPC monitoring
    SP  SPU monitoring              SM  Schedule monitoring
    CF  Config Sync monitoring      RE  Relinquish monitoring

Cluster ID: 1
Node   Priority Status     Preempt Manual Monitor-failures

Redundancy group: 0 , Failover count: 1
node0  129      secondary  no      no     None
node1  255      primary    no      no     None

Redundancy group: 1 , Failover count: 1
node0  129      secondary  no      no     None
node1  255      primary    no      no     None
Check whether any licenses are installed:
{secondary:node0}
root@node0> show system licenses
License usage:
                                 Licenses     Licenses    Licenses    Expiry
  Feature name                       used    installed      needed
  subscriber-acct                       0            1           0    permanent
  subscriber-auth                       0            1           0    permanent
  subscriber-addr                       0            1           0    permanent
  subscriber-vlan                       0            1           0    permanent
  subscriber-ip                         0            1           0    permanent
  scale-subscriber                      0         1000           0    permanent
  scale-l2tp                            0         1000           0    permanent
  scale-mobile-ip                       0         1000           0    permanent

Licenses installed:
  License identifier: xxxxxxxxxx
  License version: 2
  Features:
    subscriber-acct - Per Subscriber Radius Accounting
      permanent
    subscriber-auth - Per Subscriber Radius Authentication
      permanent
    subscriber-addr - Address Pool Assignment
      permanent
    subscriber-vlan - Dynamic Auto-sensed Vlan
      permanent
    subscriber-ip - Dynamic and Static IP
      permanent
If licenses are installed, copy the output of the show system license keys command into a file:
root@node0> show system license keys
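The key output can be captured on the device with the standard Junos | save pipe before you copy it off-box; the filename here is illustrative:

```
root@node0> show system license keys | save /var/tmp/license-keys-node0
```

You can later reinstall the saved keys with request system license add terminal, as shown later in this procedure.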
- Back up the Routing Engine configuration and scripts (if any) on node 0 to a USB flash drive:
Access the UNIX-level shell on node 0:
{secondary:node0}
root@node0> start shell user root
Password:
root@node0%
Before you mount the USB flash drive, list all the directories with names starting with da in the dev folder:
root@node0% ls /dev/da*
/dev/da0      /dev/da0s1c   /dev/da0s2a   /dev/da0s3    /dev/da0s3e
/dev/da0s1    /dev/da0s1f   /dev/da0s2c   /dev/da0s3c
/dev/da0s1a   /dev/da0s2    /dev/da0s2f   /dev/da0s3d
Insert the USB flash drive in the USB port.
The following output is displayed:
root@node0%
umass1: TOSHIBA TransMemory, rev 2.00/1.00, addr 3
da2 at umass-sim1 bus 1 target 0 lun 0
da2: <TOSHIBA TransMemory 5.00> Removable Direct Access SCSI-0 device
da2: 40.000MB/s transfers
da2: 983MB (2013184 512 byte sectors: 64H 32S/T 983C)
List all the directories with names starting with da in the dev folder, and identify the USB drive.
root@node0% ls /dev/da*
/dev/da0      /dev/da0s1c   /dev/da0s2a   /dev/da0s3    /dev/da0s3e
/dev/da0s1    /dev/da0s1f   /dev/da0s2c   /dev/da0s3c   /dev/da2
/dev/da0s1a   /dev/da0s2    /dev/da0s2f   /dev/da0s3d   /dev/da2s1
In this example, the USB flash drive is /dev/da2s1.
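The comparison of the two listings can be sketched generically. The following is not a Junos command but a plain POSIX-shell illustration of the same before/after diff; the hard-coded listings stand in for real `ls /dev/da*` output:

```shell
# Save the /dev/da* listing captured before and after inserting the USB
# drive (stand-in data; on the SRX this would come from `ls /dev/da*`).
printf '%s\n' /dev/da0 /dev/da0s1 /dev/da0s2 > /tmp/da-before.txt
printf '%s\n' /dev/da0 /dev/da0s1 /dev/da0s2 /dev/da2 /dev/da2s1 > /tmp/da-after.txt
# Lines present only in the "after" listing are the new USB device nodes:
comm -13 /tmp/da-before.txt /tmp/da-after.txt
```

Here `comm -13` suppresses lines unique to the first file and lines common to both, leaving only the entries that appeared after insertion (/dev/da2 and /dev/da2s1 in this example).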
Create a directory to mount the USB flash drive:
root@node0% mkdir /var/tmp/usb
Mount the USB flash drive to the /var/tmp/usb directory:
root@node0% mount -t msdosfs /dev/da2s1 /var/tmp/usb
Save the configuration on node 0 to the tmp folder:
root@node0% cli
root@node0> show configuration | save /var/tmp/config[date]
root@node0> exit
root@node0%
Copy the configuration file to the USB flash drive:
root@node0% cp /var/tmp/config[date] /var/tmp/usb/config[date]
Check whether any scripts are referenced in the configuration:
{secondary:node0}
root@node0> show configuration system scripts
set system scripts commit file interface-monitoring-check.slax
set system scripts op file srx-monitor.xsl

{secondary:node0}
root@node0> show configuration event-options
set event-options generate-event 60s time-interval 60
set event-options policy NAT-POOL-UTIL events 60s
set event-options policy NAT-POOL-UTIL then event-script srx-nat-bucket-overload.slax arguments utilization-threshold 90
set event-options policy NAT-POOL-UTIL then event-script srx-nat-bucket-overload.slax arguments pool GLOBAL
set event-options event-script traceoptions file escript.log size 1m files 2
set event-options event-script traceoptions flag output
set event-options event-script file srx-monitor-addbook-policy-count.slax
If any scripts are referenced in the configuration, back up these scripts:
root@node0% cp -r /var/db/scripts/ /var/tmp/usb/scripts/
Verify the files copied to the USB flash drive:
root@node0% ls /var/tmp/usb
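Beyond listing the files, you can confirm that the recursive copy is complete before unmounting. The following is a generic POSIX-shell sketch, not a Junos-specific command; /tmp/src and /tmp/dst stand in for /var/db/scripts and the mounted USB path:

```shell
# Create a stand-in source tree with one script file.
mkdir -p /tmp/src/commit /tmp/dst
echo 'version 1.0;' > /tmp/src/commit/commit-script.slax
# Recursive copy, as done with /var/db/scripts above.
cp -r /tmp/src /tmp/dst/backup
# diff -r exits nonzero if any file differs or is missing.
diff -r /tmp/src /tmp/dst/backup && echo "backup verified"
```

A clean `diff -r` guarantees every file made it onto the backup media intact, which a bare `ls` does not.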
Unmount the USB flash drive:
root@node0% umount /var/tmp/usb
Remove the USB flash drive.
Exit the shell.
root@node0% exit
- Install the replacement Routing Engine:
Power off node 0:
root@node0> request system power-off
Wait until a message appears on the console confirming that the services have stopped, and then physically turn off the power.
Label and disconnect all the cables connected to node 0.
Replace the Routing Engine.
To prevent a split-brain scenario (where the control link is connected while both the nodes are in the primary state), reconnect only the console cable and the cable to the fxp0 interface. Leave the rest of the cables disconnected.
Ensure that the status of the control link and fabric link on node 1 is down:
{primary:node1}
root@node1> show chassis cluster interfaces
Control link status: Down

Control interfaces:
    Index   Interface   Monitored-Status   Internal-SA   Security
    0       em0         Down               Disabled      Disabled
    1       em1         Down               Disabled      Disabled

Fabric link status: Down

Fabric interfaces:
    Name    Child-Interface    Status                  Security
                               (Physical/Monitored)
    fab0    xe-11/0/3          Down / Down             Disabled
    fab0
Power on node 0.
- Load the configuration file, firmware, and scripts file on the new Routing Engine:
Insert the USB flash drive into the USB port on node 0, and access the UNIX-level shell on node 0:
root@node0> start shell user root
Copy the configuration file, firmware, and scripts file from the USB:
root@node0% cp /var/tmp/usb/config[date] /var/tmp/config[date]
root@node0% cp /var/tmp/usb/junos-release-domestic.tgz /var/tmp/junos-release-domestic.tgz
root@node0% cp -r /var/tmp/usb/scripts/ /var/db/scripts/
Unmount the USB flash drive:
root@node0% umount /var/tmp/usb
Remove the USB flash drive.
Exit the shell.
root@node0% exit
- Configure the Routing Engine:
Load the firmware:
root@node0> request system software add /var/tmp/junos-release-domestic.tgz reboot
The device reboots and comes up with the intended Junos OS version.
(Optional) Apply the licenses that you backed up in step 1:
root@node0> request system license add terminal
Load and commit the configuration:
root@node0> configure shared
root@node0# load override /var/tmp/config[date]
root@node0# commit
root@node0# exit
- Check the status of all the FPCs and PICs, and ensure that all the FPCs and PICs are online:
root@node0> show chassis fpc pic-status
- Halt node 0 from the console:
root@node0> request system halt
- Wait until a message appears on the console confirming that the services have stopped, and then connect all the cables to node 0.
- Boot up node 0 by pressing any key on the console.
- Check the chassis cluster status on node 1:
root@node1> show chassis cluster status
Node 0 comes up and becomes the secondary node on both RG0 and RG1. Wait until the node 0 priority on RG1 changes to the configured value.
- Verify that sessions are showing up on node 0, and that the number of sessions on node 0 is nearly equal to the number of sessions on the primary node, node 1:
root@node1> show security monitoring
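The comparison itself can be scripted off-box once you have the two counts. This is an illustrative POSIX-shell sketch (not a Junos command); the session counts are hypothetical values you would read from the show security monitoring output on each node, and the 5 percent tolerance is an assumed threshold, not a Juniper-specified one:

```shell
node0_sessions=19500   # hypothetical count read from node 0
node1_sessions=20000   # hypothetical count read from node 1 (primary)
# Absolute difference between the two counts.
diff=$((node1_sessions - node0_sessions))
[ "$diff" -lt 0 ] && diff=$((-diff))
# Allow up to 5% of the primary's count as acceptable drift.
limit=$((node1_sessions / 20))
if [ "$diff" -le "$limit" ]; then
    echo "in sync"
else
    echo "cold sync still running"
fi
```

With the sample values, the difference is 500 sessions against a 1000-session limit, so the nodes are considered synchronized.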
- If the cluster is healthy, reset the cluster priorities:
root@node1> request chassis cluster failover reset redundancy-group 1
root@node1> request chassis cluster failover reset redundancy-group 0
Replacing a Routing Engine: External SCP Server Method
The following are the assumptions and prerequisites for this procedure:
Console access and SSH access are available.
The chassis cluster has only two redundancy groups (RG0 and RG1) configured.
This procedure includes the steps for replacing the Routing Engine on node 0 of a chassis cluster setup. You can follow the same steps to replace the Routing Engine on node 1.
To replace a Routing Engine on node 0 of a chassis cluster using an external Secure Copy Protocol (SCP) server:
- Prepare to shut down node 0:
Perform a manual failover of the redundancy groups (RGs) from node 0 to node 1.
Fail over RG1:
root@node0> request chassis cluster failover redundancy-group 1 node 1
Fail over RG0:
root@node0> request chassis cluster failover redundancy-group 0 node 1
Verify that both RGs are active on node 1 after the failover:
{secondary:node0}
root@node0> show chassis cluster status
Monitor Failure codes:
    CS  Cold Sync monitoring        FL  Fabric Connection monitoring
    GR  GRES monitoring             HW  Hardware monitoring
    IF  Interface monitoring        IP  IP monitoring
    LB  Loopback monitoring         MB  Mbuf monitoring
    NH  Nexthop monitoring          NP  NPC monitoring
    SP  SPU monitoring              SM  Schedule monitoring
    CF  Config Sync monitoring      RE  Relinquish monitoring

Cluster ID: 1
Node   Priority Status     Preempt Manual Monitor-failures

Redundancy group: 0 , Failover count: 1
node0  129      secondary  no      no     None
node1  255      primary    no      no     None

Redundancy group: 1 , Failover count: 1
node0  129      secondary  no      no     None
node1  255      primary    no      no     None
Check whether any licenses are installed:
{secondary:node0}
root@node0> show system licenses
License usage:
                                 Licenses     Licenses    Licenses    Expiry
  Feature name                       used    installed      needed
  subscriber-acct                       0            1           0    permanent
  subscriber-auth                       0            1           0    permanent
  subscriber-addr                       0            1           0    permanent
  subscriber-vlan                       0            1           0    permanent
  subscriber-ip                         0            1           0    permanent
  scale-subscriber                      0         1000           0    permanent
  scale-l2tp                            0         1000           0    permanent
  scale-mobile-ip                       0         1000           0    permanent

Licenses installed:
  License identifier: xxxxxxxxxx
  License version: 2
  Features:
    subscriber-acct - Per Subscriber Radius Accounting
      permanent
    subscriber-auth - Per Subscriber Radius Authentication
      permanent
    subscriber-addr - Address Pool Assignment
      permanent
    subscriber-vlan - Dynamic Auto-sensed Vlan
      permanent
    subscriber-ip - Dynamic and Static IP
      permanent
If licenses are installed, copy the output of the show system license keys command into a file:
root@node0> show system license keys
- Back up the Routing Engine configuration:
Save the configuration to the tmp folder:
root@node0> edit
root@node0# save /var/tmp/node0-config-yyyy-mm-dd
Access the UNIX-level shell on node 0:
root@node0> start shell user root
Copy the configuration file to an external server with SCP enabled:
root@node0% scp /var/tmp/node0-config-yyyy-mm-dd root@server-ip:/node0-config-yyyy-mm-dd
Check whether any scripts are referenced in the configuration:
{secondary:node0}
root@node0> show configuration system scripts
set system scripts commit file interface-monitoring-check.slax
set system scripts op file srx-monitor.xsl

{secondary:node0}
root@node0> show configuration event-options
set event-options generate-event 60s time-interval 60
set event-options policy NAT-POOL-UTIL events 60s
set event-options policy NAT-POOL-UTIL then event-script srx-nat-bucket-overload.slax arguments utilization-threshold 90
set event-options policy NAT-POOL-UTIL then event-script srx-nat-bucket-overload.slax arguments pool GLOBAL
set event-options event-script traceoptions file escript.log size 1m files 2
set event-options event-script traceoptions flag output
set event-options event-script file srx-monitor-addbook-policy-count.slax
If any scripts are referenced in the configuration, back up these scripts:
root@node0% scp /var/db/scripts/commit/commit-script.slax root@server-ip:/commit-script.slax
Verify the saved configuration on the external SCP server.
Exit the shell.
root@node0% exit
- Install the replacement Routing Engine:
Power off node 0:
root@node0> request system power-off
Wait until a message appears on the console confirming that the services have stopped, and then physically turn off the power.
Label and disconnect all the cables connected to node 0.
Replace the Routing Engine.
To prevent a split-brain scenario (where the control link is connected while both the nodes are in the primary state), reconnect only the console cable and the cable to the fxp0 interface. Leave the rest of the cables disconnected.
Ensure that the status of the control link and fabric link on node 1 is down:
{primary:node1}
root@node1> show chassis cluster interfaces
Control link status: Down

Control interfaces:
    Index   Interface   Monitored-Status   Internal-SA   Security
    0       em0         Down               Disabled      Disabled
    1       em1         Down               Disabled      Disabled

Fabric link status: Down

Fabric interfaces:
    Name    Child-Interface    Status                  Security
                               (Physical/Monitored)
    fab0    xe-11/0/3          Down / Down             Disabled
    fab0
Power on node 0.
- Load the configuration file and scripts on the new Routing Engine:
Log in to the Routing Engine on node 0 from the console.
Configure the IP address for the fxp0 interface, and add the necessary route to access the external server:
root@node0> edit
root@node0# set system services ssh
root@node0# set interfaces fxp0 unit 0 family inet address ip-address/mask
root@node0# set system root-authentication plain-text-password
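This step calls for adding a route to reach the external server, but no route command is shown. If the server is not on the fxp0 subnet, one option is a static route toward it; the next-hop address below is a hypothetical placeholder, not a value from this procedure:

```
root@node0# set routing-options static route 0.0.0.0/0 next-hop 10.10.10.1
```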
The chassis cluster information is stored in the Switch Control Board (SCB). The device comes up with the cluster enabled and does not allow a commit without the cluster port configuration. Apply the node 1 port configuration on node 0.
You can view the control port configuration from node 1:
root@node1> show configuration chassis cluster control-ports | display set
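On SRX high-end devices, the control ports are defined per FPC, so the display set output from node 1 typically looks like the following, which you would apply unchanged on node 0. The FPC and port numbers here are illustrative; always use the values your own node 1 reports:

```
set chassis cluster control-ports fpc 0 port 0
set chassis cluster control-ports fpc 12 port 0
```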
Commit the configuration:
root@node0# commit
Note: Management and basic routing configuration are complete at this point. You can verify the reachability of the external server from the node by using the ping command.
Exit configuration mode:
root@node0# exit
root@node0>
Load the Junos OS image from the external server:
root@node0> start shell user root
root@node0% cd /var/tmp
root@node0% scp root@server-ip:junos-release-domestic.tgz /var/tmp/
root@node0% cli
root@node0> request system software add /var/tmp/junos-release-domestic.tgz reboot
The device reboots and comes up with the intended Junos OS version.
Copy the configuration file from the external SCP server:
root@node0% scp root@server-ip:/node0-config-yyyy-mm-dd /var/tmp/node0-config-yyyy-mm-dd
(Optional) If you backed up scripts, then restore the scripts from the external SCP server:
root@node0% scp root@server-ip:/commit-script.slax /var/db/scripts/commit/commit-script.slax
(Optional) Apply the licenses that you backed up in step 1:
root@node0> request system license add terminal
Load the configuration:
root@node0> configure shared
root@node0# load override /var/tmp/node0-config-yyyy-mm-dd
root@node0# commit
root@node0# exit
- Check the status of all the FPCs and PICs, and ensure that all the FPCs and PICs are online:
root@node0> show chassis fpc pic-status
- Halt node 0 from the console:
root@node0> request system halt
- Wait until a message appears on the console confirming that the services have stopped, and then connect all the cables to node 0.
- Boot up node 0 by pressing any key on the console.
- Check the chassis cluster status on node 1:
root@node1> show chassis cluster status
Node 0 comes up and becomes the secondary node on both RG0 and RG1. Wait until the node 0 priority on RG1 changes to the configured value.
- Verify that sessions are showing up on node 0, and that the number of sessions on node 0 is nearly equal to the number of sessions on the primary node, node 1:
root@node1> show security monitoring
- If the cluster is healthy, reset the cluster priorities:
root@node1> request chassis cluster failover reset redundancy-group 1
root@node1> request chassis cluster failover reset redundancy-group 0
Replacing the Routing Engine: File Transfer Method
To replace and configure a Routing Engine by transferring files from another node in a chassis cluster (node 0 is used as an example):
- Ensure that the firmware image is available on node 1 in the /var/tmp folder. You can download the firmware from https://support.juniper.net/support/downloads/.
- Save a local copy of the configuration in the /var/tmp folder on node 1:
user@node1# show configuration | save /var/tmp/cfg-node1
- Prepare to shut down node 0:
Perform a manual failover of the redundancy groups (RGs) from node 0 to node 1.
Fail over RG1:
root@node0> request chassis cluster failover redundancy-group 1 node 1
Fail over RG0:
root@node0> request chassis cluster failover redundancy-group 0 node 1
Verify that both RGs are active on node 1 after the failover:
{secondary:node0}
root@node0> show chassis cluster status
Monitor Failure codes:
    CS  Cold Sync monitoring        FL  Fabric Connection monitoring
    GR  GRES monitoring             HW  Hardware monitoring
    IF  Interface monitoring        IP  IP monitoring
    LB  Loopback monitoring         MB  Mbuf monitoring
    NH  Nexthop monitoring          NP  NPC monitoring
    SP  SPU monitoring              SM  Schedule monitoring
    CF  Config Sync monitoring      RE  Relinquish monitoring

Cluster ID: 1
Node   Priority Status     Preempt Manual Monitor-failures

Redundancy group: 0 , Failover count: 1
node0  129      secondary  no      no     None
node1  255      primary    no      no     None

Redundancy group: 1 , Failover count: 1
node0  129      secondary  no      no     None
node1  255      primary    no      no     None
Check whether any licenses are installed:
{secondary:node0}
root@node0> show system licenses
License usage:
                                 Licenses     Licenses    Licenses    Expiry
  Feature name                       used    installed      needed
  subscriber-acct                       0            1           0    permanent
  subscriber-auth                       0            1           0    permanent
  subscriber-addr                       0            1           0    permanent
  subscriber-vlan                       0            1           0    permanent
  subscriber-ip                         0            1           0    permanent
  scale-subscriber                      0         1000           0    permanent
  scale-l2tp                            0         1000           0    permanent
  scale-mobile-ip                       0         1000           0    permanent

Licenses installed:
  License identifier: xxxxxxxxxx
  License version: 2
  Features:
    subscriber-acct - Per Subscriber Radius Accounting
      permanent
    subscriber-auth - Per Subscriber Radius Authentication
      permanent
    subscriber-addr - Address Pool Assignment
      permanent
    subscriber-vlan - Dynamic Auto-sensed Vlan
      permanent
    subscriber-ip - Dynamic and Static IP
      permanent
If licenses are installed, copy the output of the show system license keys command into a file:
root@node0> show system license keys
Check whether any scripts are referenced in the configuration:
{secondary:node0}
root@node0> show configuration system scripts
set system scripts commit file interface-monitoring-check.slax
set system scripts op file srx-monitor.xsl

{secondary:node0}
root@node0> show configuration event-options
set event-options generate-event 60s time-interval 60
set event-options policy NAT-POOL-UTIL events 60s
set event-options policy NAT-POOL-UTIL then event-script srx-nat-bucket-overload.slax arguments utilization-threshold 90
set event-options policy NAT-POOL-UTIL then event-script srx-nat-bucket-overload.slax arguments pool GLOBAL
set event-options event-script traceoptions file escript.log size 1m files 2
set event-options event-script traceoptions flag output
set event-options event-script file srx-monitor-addbook-policy-count.slax
If any scripts are referenced in the configuration, then back up these scripts:
root@node0% scp /var/db/scripts/commit/commit-script.slax root@node1-fxp0-ip:/commit-script.slax
- Install the replacement Routing Engine:
Power off node 0:
root@node0> request system power-off
Wait until a message appears on the console confirming that the services have stopped, and then physically turn off the power.
Label and disconnect all the cables connected to node 0.
Replace the Routing Engine.
To prevent a split-brain scenario (where the control link is connected while both the nodes are in the primary state), reconnect only the console cable and the cable to the fxp0 interface. Leave the rest of the cables disconnected.
Ensure that the status of the control link and fabric link on node 1 is down:
{primary:node1}
root@node1> show chassis cluster interfaces
Control link status: Down

Control interfaces:
    Index   Interface   Monitored-Status   Internal-SA   Security
    0       em0         Down               Disabled      Disabled
    1       em1         Down               Disabled      Disabled

Fabric link status: Down

Fabric interfaces:
    Name    Child-Interface    Status                  Security
                               (Physical/Monitored)
    fab0    xe-11/0/3          Down / Down             Disabled
    fab0
Power on node 0.
- Load the configuration file and scripts on the new Routing Engine:
Log in to the Routing Engine on node 0 from the console.
Configure the root password and the IP address for the fxp0 interface. Do not commit the configuration.
Note: You do not need to configure a gateway, because the fxp0 interfaces on both nodes are assumed to be in the same subnet.
root@node0> edit
root@node0# set system root-authentication plain-text-password
New password: type password here
Retype new password: retype password here
root@node0# set interfaces fxp0 unit 0 family inet address IP-address
The chassis cluster information is stored in the Switch Control Board (SCB). The device comes up with the cluster enabled and does not allow a commit without the cluster port configuration. Apply the node 1 port configuration on node 0.
You can view the control port configuration from node 1:
root@node1> show configuration chassis cluster control-ports | display set
Commit the configuration:
root@node0# commit
Exit configuration mode:
root@node0# exit
root@node0>
Copy the image and configuration from node 1 to node 0 using Secure Copy Protocol (SCP). Use the IP address configured for the node 0 fxp0 interface in Step 5.
root@node1> scp /var/tmp/image-file root@node0-fxp0-ip:/var/tmp/
root@node1> scp /var/tmp/cfg-node1 root@node0-fxp0-ip:/var/tmp/
Update the Junos OS image on the Routing Engine to the required version:
root@node0> request system software add /var/tmp/junos-release-domestic.tgz reboot
The device reboots and comes up with the intended Junos OS version.
(Optional) Copy the scripts that you backed up in Step 3 from node 1:
root@node1> scp /var/db/scripts/op/op-script.slax root@node0-fxp0-ip:/var/db/scripts/op/
(Optional) Apply the licenses that you backed up in Step 3:
root@node0> request system license add terminal
Load the configuration:
root@node0> configure
root@node0# load override /var/tmp/cfg-node1
root@node0# commit
Verify that the configuration commits without any error.
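To catch problems before they take effect, you can validate the candidate configuration in configuration mode with the standard Junos commit check command, and commit only after the check succeeds:

```
root@node0# commit check
configuration check succeeds
root@node0# commit
```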
- Check the status of all the FPCs and PICs, and ensure that all the FPCs and PICs are online:
root@node0> show chassis fpc pic-status
- Halt node 0 from the console:
root@node0> request system halt
- Wait until a message appears on the console confirming that the services have stopped, and then connect all the disconnected cables.
- Boot up node 0 by pressing any key on the console.
- Check the chassis cluster status on node 1:
root@node1> show chassis cluster status
Node 0 comes up and becomes the secondary node on both RG0 and RG1. Wait until the node 0 priority on RG1 changes to the configured value.
- Verify that sessions are showing up on node 0, and that the number of sessions on node 0 is nearly equal to the number of sessions on the primary node, node 1:
root@node1> show security monitoring
- If the cluster is healthy, reset the cluster priorities:
root@node1> request chassis cluster failover reset redundancy-group 1
root@node1> request chassis cluster failover reset redundancy-group 0