Software Upgrade in Multinode High Availability
Overview
SRX Series Firewalls deployed in an MNHA configuration can be upgraded with minimal disruption by sequentially upgrading each device. Depending on your device architecture, use one of the following CLI commands to initiate the Junos OS upgrade: request system software add or request vmhost software add.
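For example, a minimal sketch of the two forms (the image filenames are placeholders for the release you are installing):
user@host> request system software add /var/tmp/junos-install-srx-x86-64-22.3R1.3.tgz no-copy
user@host> request vmhost software add /var/tmp/junos-vmhost-install-x86-64-22.3R1.3.tgz no-copy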
| From Junos OS Release | To Junos OS Release | Use Software Upgrade Method |
|---|---|---|
| 20.4 | Any release later than 20.4 | No |
| 22.3 | Next Junos OS release | Yes |
- Releases 22.4R1 and later are not compatible with earlier Junos OS releases for synchronizing sessions during a regular upgrade. Use the Isolated Nodes Upgrade Procedure in such cases.
- Upgrading from 22.3 to the next release may cause brief traffic disruption.
- You may see the message Peer Hardware Incompatible: SPU SLOT MISMATCH during upgrades from 21.4R1 onwards.
- NAT sessions are not synced during interim upgrade stages in releases prior to 23.4R2.
- Always upgrade both nodes to the same Junos OS version.
For information about upgrade and downgrade support for Junos OS releases, see Upgrade and Downgrade Support Policy for Junos OS Releases and Extended End-Of-Life Releases in Release Notes.
When you upgrade SRX Series Firewalls in Multinode High Availability from an earlier Junos OS release to Junos OS Release 22.4R1 or later, use the Isolated Nodes Upgrade Procedure. Junos OS Release 22.4R1 and later releases are not compatible with earlier Junos OS releases for synchronizing sessions during a regular upgrade.
Before You Begin
Before performing an upgrade on an SRX Series Firewall in an MNHA configuration, we recommend redirecting traffic away from the device in a controlled way. You can do this using one of the following methods:
- Manual failover: Trigger a manual failover to shift traffic to the peer device.
- Software upgrade mode: Temporarily configure the device with the following command:
user@host# set chassis high-availability software-upgrade
This command introduces a device failure with failure code SU (Software Upgrade). As a result, Services Redundancy Groups (SRGs) 1 and above transition to an Ineligible state (instead of Active or Backup) on the device being upgraded, and the associated traffic automatically fails over to the other MNHA cluster member.
Note: If your MNHA cluster is configured with only SRG0 and includes the install-on-failure-route option, you can still redirect traffic by using the set chassis high-availability software-upgrade configuration to move traffic off the device gracefully.
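After you commit the statement, a quick spot check might look like this (hostname and prompt are illustrative; the full output is shown later in the upgrade procedure):
user@host# commit
user@host> show chassis high-availability information | match "Node Status"
Node Status: OFFLINE [ SU ]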
Software Upgrade
Preparation Checklist
Consider the following best practices when you plan your software upgrade (example commands are shown after this list):
- Ensure both nodes are online and running the same Junos OS version. Check the current Junos OS software version on your device using the show version command.
- Verify storage availability using the show system storage command.
- Check hardware status using the show chassis fpc pic-status and show chassis alarms commands.
- Ensure that there are no uncommitted changes.
- Back up the configuration and license keys.
- Download the Junos OS image to /var/tmp on both devices.
- Ensure your high availability setup is healthy, functional, and that the interchassis link (ICL) is up, using the show chassis high-availability information command.
For details on preparing your device for an upgrade, see Preparing for Software Installation and Upgrade (Junos OS).
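For reference, the checks above map to commands along these lines (hostnames are illustrative; show | compare runs from configuration mode and lists any uncommitted changes; the last command is one way to keep a backup copy of the configuration):
user@srx-02> show version
user@srx-02> show system storage
user@srx-02> show chassis fpc pic-status
user@srx-02> show chassis alarms
user@srx-02> show system license
user@srx-02> show chassis high-availability information
user@srx-02# show | compare
user@srx-02> show configuration | save /var/tmp/pre-upgrade-config.conf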
Download Software
Download the Junos OS image from the Juniper Networks Support page and save it in the /var/tmp directory on both SRX Series Firewalls. Example:
user@host> request system software add /var/tmp/junos-install-vsrx3-x86-64-22.3R1.3.tgz no-copy
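If you stage the image on a file server rather than uploading it directly, one option is the file copy command (the server name, user, and path here are hypothetical):
user@srx-02> file copy scp://admin@fileserver.example.com/images/junos-install-vsrx3-x86-64-22.3R1.3.tgz /var/tmp/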
Upgrade Procedure
Follow the steps in this procedure to upgrade SRX Series devices configured in a Multinode High Availability (MNHA) setup. In this example, the cluster consists of two devices: srx-01 (currently active) and srx-02 (currently backup). The upgrade process begins with the backup node (srx-02), followed by the active node (srx-01), ensuring minimal service disruption.
Ensure your Multinode High Availability setup is healthy, functional, and that the interchassis link (ICL) is up.
On SRX-01 Device
user@srx-01> show chassis high-availability information
Node failure codes:
    HW  Hardware monitoring    LB  Loopback monitoring
    MB  Mbuf monitoring        SP  SPU monitoring
    CS  Cold Sync monitoring   SU  Software Upgrade
Node Status: ONLINE
  Local-id: 1
  Local-IP: 10.22.0.1
  HA Peer Information:
    Peer Id: 2   IP address: 10.22.0.2   Interface: ge-0/0/2.0
    Routing Instance: default
    Encrypted: YES
    Conn State: UP
    Cold Sync Status: COMPLETE
Services Redundancy Group: 0
  Current State: ONLINE
  Peer Information:
    Peer Id: 2
SRG failure event codes:
    BF  BFD monitoring         IP  IP monitoring
    IF  Interface monitoring   CP  Control Plane monitoring
Services Redundancy Group: 1
  Deployment Type: ROUTING
  Status: ACTIVE
  Activeness Priority: 200
  Preemption: ENABLED
  Process Packet In Backup State: NO
  Control Plane State: READY
  System Integrity Check: N/A
  Failure Events: NONE
  Peer Information:
    Peer Id: 2
    Status : BACKUP
    Health Status: HEALTHY
    Failover Readiness: READY

On SRX-02 Device
user@srx-02> show chassis high-availability information
Node failure codes:
    HW  Hardware monitoring    LB  Loopback monitoring
    MB  Mbuf monitoring        SP  SPU monitoring
    CS  Cold Sync monitoring   SU  Software Upgrade
Node Status: ONLINE
  Local-id: 2
  Local-IP: 10.22.0.2
  HA Peer Information:
    Peer Id: 1   IP address: 10.22.0.1   Interface: ge-0/0/2.0
    Routing Instance: default
    Encrypted: YES
    Conn State: UP
    Cold Sync Status: COMPLETE
Services Redundancy Group: 0
  Current State: ONLINE
  Peer Information:
    Peer Id: 1
SRG failure event codes:
    BF  BFD monitoring         IP  IP monitoring
    IF  Interface monitoring   CP  Control Plane monitoring
Services Redundancy Group: 1
  Deployment Type: ROUTING
  Status: BACKUP
  Activeness Priority: 1
  Preemption: DISABLED
  Process Packet In Backup State: NO
  Control Plane State: READY
  System Integrity Check: COMPLETE
  Failure Events: NONE
  Peer Information:
    Peer Id: 1
    Status : ACTIVE
    Health Status: HEALTHY
    Failover Readiness: N/A

- Initiate the software upgrade process on the backup node (srx-02) and commit the configuration.
user@srx-02# set chassis high-availability software-upgrade
This command triggers a local failover for SRG0 and marks SRG1 (if present) as INELIGIBLE, allowing the peer node to take or retain the active role.
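After adding the statement, commit the configuration, for example:
user@srx-02# commit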
- Verify the status of Multinode High Availability. The output shows Node Status: OFFLINE [ SU ], which indicates that the node is ready for the software upgrade. You can also see that the status of SRG1 has changed to INELIGIBLE.
user@srx-02> show chassis high-availability information
Node failure codes:
    HW  Hardware monitoring    LB  Loopback monitoring
    MB  Mbuf monitoring        SP  SPU monitoring
    CS  Cold Sync monitoring   SU  Software Upgrade
Node Status: OFFLINE [ SU ]
  Local-id: 1
  Local-IP: 10.22.0.1
  HA Peer Information:
    Peer Id: 2   IP address: 10.22.0.2   Interface: ge-0/0/2.0
    Routing Instance: default
    Encrypted: YES
    Conn State: UP
    Cold Sync Status: COMPLETE
Services Redundancy Group: 0
  Current State: ONLINE
  Peer Information:
    Peer Id: 2
SRG failure event codes:
    BF  BFD monitoring         IP  IP monitoring
    IF  Interface monitoring   CP  Control Plane monitoring
Services Redundancy Group: 1
  Deployment Type: ROUTING
  Status: INELIGIBLE
  Activeness Priority: 200
  Preemption: ENABLED
  Process Packet In Backup State: NO
  Control Plane State: N/A
  System Integrity Check: IN PROGRESS
  Failure Events: NONE
  Peer Information:
    Peer Id: 2
    Status : ACTIVE
    Health Status: HEALTHY
    Failover Readiness: N/A

Confirm that the other device (srx-01) is in an active role and is functioning normally.
user@srx-01> show chassis high-availability information
Node failure codes:
    HW  Hardware monitoring    LB  Loopback monitoring
    MB  Mbuf monitoring        SP  SPU monitoring
    CS  Cold Sync monitoring   SU  Software Upgrade
Node Status: ONLINE
  Local-id: 2
  Local-IP: 10.22.0.2
  HA Peer Information:
    Peer Id: 1   IP address: 10.22.0.1   Interface: ge-0/0/2.0
    Routing Instance: default
    Encrypted: YES
    Conn State: UP
    Cold Sync Status: COMPLETE
Services Redundancy Group: 0
  Current State: ONLINE
  Peer Information:
    Peer Id: 1
SRG failure event codes:
    BF  BFD monitoring         IP  IP monitoring
    IF  Interface monitoring   CP  Control Plane monitoring
Services Redundancy Group: 1
  Deployment Type: ROUTING
  Status: ACTIVE
  Activeness Priority: 1
  Preemption: DISABLED
  Process Packet In Backup State: NO
  Control Plane State: READY
  System Integrity Check: N/A
  Failure Events: NONE
  Peer Information:
    Peer Id: 1
    Status : INELIGIBLE
    Health Status: UNHEALTHY
    Failover Readiness: NOT READY

The command output shows that the status of SRG1 is ACTIVE. Note that under the Peer Information section of SRG1, the status is INELIGIBLE, which indicates that the other node is in an ineligible state.

- Install the Junos OS software on the SRX-02 device.
user@srx-02> request system software add /var/tmp/junos-install-vsrx3-x86-64-22.3R1.3.tgz no-copy
- Reboot the device using the request system reboot command after successful installation.
- Check the Junos OS version after the reboot.
user@srx-02> show version
Hostname: srx-02
Model: vSRX
Junos: 22.3R1.3

The output confirms that the device is upgraded to the correct Junos OS version.
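If you want a side-by-side comparison, the same command on the peer (srx-01) should still report the pre-upgrade release at this stage:
user@srx-01> show version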
- Check the status of Multinode High Availability on the device.
user@srx-02> show chassis high-availability information
Node failure codes:
    HW  Hardware monitoring    LB  Loopback monitoring
    MB  Mbuf monitoring        SP  SPU monitoring
    CS  Cold Sync monitoring   SU  Software Upgrade
Node Status: OFFLINE [ SU ]
  Local-id: 1
  Local-IP: 10.22.0.1
  HA Peer Information:
    Peer Id: 2   IP address: 10.22.0.2   Interface: ge-0/0/2.0
    Routing Instance: default
    Encrypted: YES
    Conn State: UP
    Cold Sync Status: COMPLETE
Services Redundancy Group: 0
  Current State: ONLINE
  Peer Information:
    Peer Id: 2
SRG failure event codes:
    BF  BFD monitoring         IP  IP monitoring
    IF  Interface monitoring   CP  Control Plane monitoring
Services Redundancy Group: 1
  Deployment Type: ROUTING
  Status: INELIGIBLE
  Activeness Priority: 200
  Preemption: ENABLED
  Process Packet In Backup State: NO
  Control Plane State: N/A
  System Integrity Check: COMPLETE
  Failure Events: NONE
  Peer Information:
    Peer Id: 2
    Status : ACTIVE
    Health Status: HEALTHY
    Failover Readiness: N/A

The output continues to display the node status as OFFLINE [ SU ] and the SRG1 status as INELIGIBLE.

- Remove the software-upgrade statement and commit the configuration.
user@srx-02# delete chassis high-availability software-upgrade
When you remove the software-upgrade statement, the node failover state and any installed routes are cleared. Until this statement is removed, the node remains offline and all SRGs stay in the INELIGIBLE state. This effectively isolates the node from handling traffic during the upgrade, as long as the peer remains healthy.
- Check the Multinode High Availability status again to confirm that the device is online and the overall status is healthy and functioning.
user@srx-02> show chassis high-availability information
Node failure codes:
    HW  Hardware monitoring    LB  Loopback monitoring
    MB  Mbuf monitoring        SP  SPU monitoring
    CS  Cold Sync monitoring   SU  Software Upgrade
Node Status: ONLINE
  Local-id: 1
  Local-IP: 10.22.0.1
  HA Peer Information:
    Peer Id: 2   IP address: 10.22.0.2   Interface: ge-0/0/2.0
    Routing Instance: default
    Encrypted: YES
    Conn State: UP
    Cold Sync Status: COMPLETE
Services Redundancy Group: 0
  Current State: ONLINE
  Peer Information:
    Peer Id: 2
SRG failure event codes:
    BF  BFD monitoring         IP  IP monitoring
    IF  Interface monitoring   CP  Control Plane monitoring
Services Redundancy Group: 1
  Deployment Type: ROUTING
  Status: BACKUP
  Activeness Priority: 200
  Preemption: ENABLED
  Process Packet In Backup State: NO
  Control Plane State: READY
  System Integrity Check: IN PROGRESS
  Failure Events: NONE
  Peer Information:
    Peer Id: 2
    Status : ACTIVE
    Health Status: HEALTHY
    Failover Readiness: N/A

The output shows Node Status: ONLINE and the SRG1 status as BACKUP, which indicates that the node is back online and is functioning normally in the backup role.

- Check interfaces, routing protocols, advertised routes, and so on to confirm that your setup is operating normally (see the example commands below).
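Typical spot checks at this stage might include commands such as the following; the BGP commands apply only if you run BGP, and the neighbor address shown is hypothetical:
user@srx-02> show interfaces terse
user@srx-02> show route summary
user@srx-02> show bgp summary
user@srx-02> show route advertising-protocol bgp 10.1.1.2
user@srx-02> show security flow session summary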
Now you can proceed to upgrade the other device (SRX-01) using the same procedure.
(Optional) If you encounter issues and cannot complete the upgrade, you can roll back the software on the device and then reboot the system. Use the request system software rollback command to restore the previously installed software version.
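A minimal sketch of the rollback (hostname illustrative):
user@srx-02> request system software rollback
user@srx-02> request system reboot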
Upgrade Software using install-on-failure-route
For setups using only SRG0 (without A/B state support), we recommend configuring the install-on-failure-route. This route can be referenced in route policies to advertise less-preferred paths during software upgrade scenarios or node failures. With this method, you divert traffic by changing the route; traffic can still pass through the node, and the interface remains up.
- Create a dedicated custom virtual router for the route used for diverting traffic during the upgrade.
set routing-instances MNHA-signal-routes instance-type virtual-router
- Configure the install-on-failure-route statement for SRG0. Here, the route with IP address 10.39.1.3 is configured as the route to install when the node fails.
set routing-instances MNHA-signal-routes instance-type virtual-router
set chassis high-availability services-redundancy-group 0 install-on-failure-route 10.39.1.3 routing-instance MNHA-signal-routes
set chassis high-availability services-redundancy-group 1 active-signal-route 10.39.1.1 routing-instance MNHA-signal-routes
set chassis high-availability services-redundancy-group 1 backup-signal-route 10.39.1.2 routing-instance MNHA-signal-routes
The route specified in the statement is installed in the routing table when the node fails.
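To confirm which signal route is currently installed, you can inspect the dedicated routing instance, for example:
user@host> show route table MNHA-signal-routes.inet.0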
- Configure a matching routing policy and define a policy condition based on the existence of routes. Here you include the route 10.39.1.3 as the route-match condition for if-route-exists.
set policy-options condition active_route_exists if-route-exists address-family inet 10.39.1.1/32
set policy-options condition active_route_exists if-route-exists address-family inet table MNHA-signal-routes.inet.0
set policy-options condition backup_route_exists if-route-exists address-family inet 10.39.1.2/32
set policy-options condition backup_route_exists if-route-exists address-family inet table MNHA-signal-routes.inet.0
set policy-options condition failure_route_exists if-route-exists address-family inet 10.39.1.3/32
set policy-options condition failure_route_exists if-route-exists address-family inet table MNHA-signal-routes.inet.0
Create the policy statement that refers to the condition as one of the matching terms.
set policy-options policy-statement mnha-route-policy term 4 from protocol static
set policy-options policy-statement mnha-route-policy term 4 from protocol direct
set policy-options policy-statement mnha-route-policy term 4 from condition failure_route_exists
set policy-options policy-statement mnha-route-policy term 4 then metric 100
set policy-options policy-statement mnha-route-policy term 4 then accept
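The policy must then be applied to your routing protocol so that the less-preferred path is advertised when the failure route appears. A minimal sketch, assuming a BGP peering whose group name (to-peer) is hypothetical:
set protocols bgp group to-peer export mnha-route-policy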
- Initiate software upgrade as mentioned in previous steps (Software Upgrade).
Deprecated Method (shutdown-on-failure interface)
Starting in Junos OS Release 24.3R1, the shutdown-on-failure functionality is deprecated rather than immediately removed, to provide backward compatibility and an opportunity to bring your configuration into compliance with the new configuration. As part of this change, the [set chassis high-availability services-redundancy-group 0 shutdown-on-failure interface-name] configuration statement is deprecated.
Previously, traffic had to be diverted manually by shutting down interfaces. You can now use the software-upgrade command to keep the node offline and all SRGs in the INELIGIBLE state for the duration of the upgrade. This effectively isolates the node from handling traffic.
If you're using Junos OS 22.4 or earlier, we recommend using the legacy methods to divert traffic during the upgrade.