Understanding Nonstop Software Upgrade on a Virtual Chassis Fabric
With nonstop software upgrade (NSSU), you can upgrade the software running on all member switches in a Virtual Chassis Fabric (VCF) with minimal network traffic disruption during the upgrade. You can use NSSU with a VCF as follows:
NSSU is supported in a non-mixed or mixed mode QFX5100 VCF with up to 20 members.
For minimal traffic disruption, you must configure link aggregation groups (LAGs) such that the member links of each LAG reside on different VCF members. When the VCF member hosting one member link of a LAG is being upgraded and that link is down, the links on the other members stay up, and traffic continues to flow through the LAG.
Because NSSU upgrades the software on each VCF member one at a time, an upgrade using NSSU can take longer than an upgrade using the request system software add command.
You can reduce the amount of time an upgrade takes by configuring NSSU line-card upgrade groups. The members of a Virtual Chassis or VCF in an upgrade group are upgraded simultaneously. See Configuring Line-Card Upgrade Groups for Nonstop Software Upgrade.
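To see why NSSU takes longer than a standard upgrade and how upgrade groups shorten it, the following Python sketch counts the sequential reboot rounds in an NSSU. It follows the order described in How NSSU Works for a VCF: the backup, then each line-card role member (or each upgrade group, whose members restart together), then the original primary. The function name and the rule that every line-card member belongs to exactly one group are assumptions of this sketch, not Junos configuration or behavior.

```python
def nssu_reboot_rounds(line_card_members, upgrade_groups=None):
    """Count the sequential reboot rounds in an NSSU (hypothetical helper).

    Order modeled: backup first, then the line-card role members (one per
    round, or one upgrade group per round), then the original primary.
    """
    if upgrade_groups:
        # Sketch assumption: every line-card member is in exactly one group.
        flat = [member for group in upgrade_groups for member in group]
        if sorted(flat) != sorted(line_card_members):
            raise ValueError("each line-card member must be in exactly one group")
        line_card_rounds = len(upgrade_groups)
    else:
        # Without upgrade groups, line-card members reboot one at a time.
        line_card_rounds = len(line_card_members)
    return 1 + line_card_rounds + 1  # backup + line cards + original primary


# 20-member VCF: members 0 and 1 hold the Routing Engine role,
# members 2 through 19 are in the line-card role.
line_cards = list(range(2, 20))
print(nssu_reboot_rounds(line_cards))            # 20 rounds
groups = [line_cards[i:i + 6] for i in range(0, len(line_cards), 6)]
print(nssu_reboot_rounds(line_cards, groups))    # 5 rounds
```

The actual time saved depends on how long each member takes to reboot and rejoin, which this sketch does not model.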
Benefits of NSSU
No disruption to the control plane—NSSU uses graceful Routing Engine switchover (GRES) and nonstop active routing (NSR) to ensure no disruption occurs to the control plane. During the upgrade process, the VCF preserves interface, kernel, and routing protocol information.
Minimal disruption to network traffic—NSSU minimizes network traffic disruption by upgrading member switches one at a time, enabling the primary and backup members to keep their Routing Engine roles without disrupting traffic (although the primary role passes from the original primary to the backup), and permitting traffic to continue to flow through members in the line-card role that are not being upgraded.
Requirements for Performing an NSSU for a VCF
You must configure the following in the VCF before requesting an NSSU operation:
Graceful Routing Engine switchover (GRES).
Nonstop active routing (NSR) and nonstop bridging (NSB).
We recommend enabling NSB when you set up a VCF with any provisioning mode—preprovisioned, auto-provisioned, or non-provisioned—to avoid losing Layer 2 control protocol adjacency during a Routing Engine switchover.
Also, when you enable NSR and NSB, commit the configuration using the commit synchronize CLI command.
For minimal traffic disruption, define link aggregation groups (LAGs) such that the member LAG links reside on different VCF members.
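The LAG placement rule can be checked offline before an upgrade. This Python sketch flags any LAG whose member links all terminate on a single VCF member. It assumes that in a VCF the FPC number in an interface name equals the member ID (so xe-3/0/10 is on member 3); the function names and the LAG data are hypothetical.

```python
import re

# Assumption: Junos interface names have the form type-fpc/pic/port[:channel],
# and in a VCF the FPC number equals the member ID (xe-3/0/10 -> member 3).
IFACE_RE = re.compile(r"^[a-z]+-(\d+)/\d+/\d+(?::\d+)?$")


def member_of(interface):
    """Return the VCF member ID that hosts an interface (hypothetical helper)."""
    match = IFACE_RE.match(interface)
    if match is None:
        raise ValueError(f"unrecognized interface name: {interface}")
    return int(match.group(1))


def lags_at_risk(lags):
    """Return the names of LAGs whose links all sit on a single VCF member.

    During NSSU, rebooting that one member would take every link of such a
    LAG down at the same time.
    """
    at_risk = []
    for name, links in lags.items():
        members = {member_of(link) for link in links}
        if len(members) < 2:
            at_risk.append(name)
    return sorted(at_risk)


# Hypothetical LAG definitions: ae0 spans members 2 and 3, ae1 only member 4.
lags = {
    "ae0": ["xe-2/0/10", "xe-3/0/10"],
    "ae1": ["xe-4/0/0", "xe-4/0/1"],
}
print(lags_at_risk(lags))  # ['ae1']
```

If you use line-card upgrade groups, the same reasoning applies per group: a LAG whose links all sit on members of one upgrade group loses all of its links when that group restarts.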
The following conditions are also required for successful NSSU operation:
Interconnect the VCF members in a spine-and-leaf topology, with each leaf device connected to all of the configured spine devices. This topology prevents the VCF from splitting during an NSSU operation.
You must also configure no-split-detection in a two-member VCF so that the VCF doesn’t split when NSSU upgrades a member.
Use preprovisioning when you initially set up the VCF, so that you explicitly assign the Routing Engine role or line-card role to the member switches acting in each of those roles.
During an NSSU, the VCF members must maintain their roles: the primary and backup must keep their Routing Engine roles (although the primary role passes from the original primary to the backup), and the remaining switches must keep their line-card roles.
You can have only two members in the Routing Engine role in the preprovisioned configuration. The NSSU process checks the member configuration, displays a warning message if it detects that you configured more than two switches in the Routing Engine role, and stops the upgrade.
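The topology and role conditions above can be expressed as a quick offline check. In this Python sketch, the Member class, the link representation, and the function name are hypothetical, and the sketch does not read a real VCF configuration. It mirrors only the conditions stated in this section: each leaf connects to every spine, exactly two members hold the Routing Engine role, and a two-member VCF has no-split-detection configured.

```python
from dataclasses import dataclass


@dataclass
class Member:
    member_id: int
    role: str       # preprovisioned role: "routing-engine" or "line-card"
    is_spine: bool


def check_vcf_layout(members, links, no_split_detection=False):
    """Return problems that would block or endanger an NSSU (hypothetical).

    links is a set of frozenset({a, b}) pairs of interconnected member IDs.
    """
    problems = []
    re_count = sum(1 for m in members if m.role == "routing-engine")
    if re_count != 2:
        problems.append(f"expected 2 routing-engine members, found {re_count}")
    spines = [m.member_id for m in members if m.is_spine]
    for leaf in (m.member_id for m in members if not m.is_spine):
        missing = [s for s in spines if frozenset((leaf, s)) not in links]
        if missing:
            problems.append(f"leaf {leaf} is not connected to spines {missing}")
    if len(members) == 2 and not no_split_detection:
        problems.append("two-member VCF must configure no-split-detection")
    return problems


# Four-member VCF: two spines in the Routing Engine role, two leaves.
members = [
    Member(0, "routing-engine", True),
    Member(1, "routing-engine", True),
    Member(2, "line-card", False),
    Member(3, "line-card", False),
]
links = {frozenset(pair) for pair in [(2, 0), (2, 1), (3, 0)]}
print(check_vcf_layout(members, links))
# ['leaf 3 is not connected to spines [1]']
```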
How NSSU Works for a VCF
When you request an NSSU on a VCF:
- The VCF primary verifies that:
The backup is online.
You have enabled graceful Routing Engine switchover (GRES), nonstop active routing (NSR), and nonstop bridging (NSB).
The VCF has a preprovisioned configuration with only two members in the Routing Engine role.
- The primary transfers the new software image to the backup and remaining line-card role members in sequence.
Starting with Junos OS Release 14.1X53-D40, to reduce the time needed to complete an NSSU operation for a VCF, the primary uses parallel rcp sessions to copy the new software to multiple members at a time (rather than waiting for the copy to each member to complete before starting to copy the software image to the next member). A default algorithm based on the number of members in the VCF determines the number of parallel copy operations; on a QFX5100 VCF, you can instead configure a specific number using the rcp-count configuration statement. See rcp-count for details.
If copying the new software to any line-card role member fails, NSSU terminates the upgrade process for the entire VCF without rebooting any members, and logs the error condition. Starting with Junos OS Release 14.1X53-D40, after an NSSU copy of the new software image to a member fails, the primary performs an additional error recovery measure to remove the new software from the members to which it was already transferred.
- The primary restarts the backup with the new software, and the backup resynchronizes with the primary.
- The primary loads the new software on the member switches that are in the line-card role and reboots them one at a time. The primary waits for each member to come online and become active running the new software before rebooting the next member.
If you configured upgrade groups, the VCF member or members in the first upgrade group load the new image and restart. When the members in that upgrade group are online again, the members in the next upgrade group load the new image and restart.
Traffic continues to flow through the other members during this process.
- Rebooting continues until all active members have restarted with the new software.
If any member fails to reboot successfully (including initial reboot of the backup), NSSU terminates the upgrade process and logs the error condition. In this case, to avoid VCF instability, you should either back out the partial upgrade by restoring the old software and rebooting the members that were already rebooted with the new software, or try to manually reboot all members with the new software that was copied to them, so all members come online again running the same version of the software.
Starting with Junos OS Release 14.1X53-D40, NSSU automatically invokes recovery measures if the reboot fails on any line-card role member, stopping the sequential reboot process and bringing down and rebooting the entire VCF. This action cleanly brings up all members at the same time running the new software, which recovers stable VCF operation more quickly than having an unstable VCF running different versions of the software trying to converge.
- When all members that are in the line-card role have been upgraded, the primary performs a graceful Routing Engine switchover, and the upgraded backup becomes the primary.
- The software on the original primary is upgraded and the original primary is automatically rebooted. After the original primary has rejoined the VCF, you can optionally return control to it by requesting a graceful Routing Engine switchover.
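The sequence above can be modeled as an ordered plan. This Python sketch is a simplified model, not Junos code: the function names are hypothetical, and rcp_count stands in for the number of parallel copies (rcp_count=1 approximates the one-member-at-a-time copy used before Junos OS Release 14.1X53-D40). The default algorithm that Junos uses to choose the number of parallel copies is not modeled, and neither is the reboot-failure recovery. The second function models the copy-failure handling in 14.1X53-D40 and later, in which NSSU stops without rebooting any member and removes the image from members that already received it.

```python
def copy_batches(targets, rcp_count):
    """Split copy targets into batches of at most rcp_count parallel copies."""
    if rcp_count < 1:
        raise ValueError("rcp_count must be at least 1")
    return [targets[i:i + rcp_count] for i in range(0, len(targets), rcp_count)]


def nssu_plan(primary, backup, line_cards, rcp_count=1, upgrade_groups=None):
    """Return the NSSU steps as (action, member_ids) tuples (simplified model)."""
    steps = [("verify", [primary])]  # primary checks backup, GRES/NSR/NSB, roles
    for batch in copy_batches([backup] + list(line_cards), rcp_count):
        steps.append(("copy", batch))
    steps.append(("reboot", [backup]))  # backup restarts and resynchronizes
    # Line-card members reboot one at a time, or one upgrade group at a time.
    for group in upgrade_groups or [[m] for m in line_cards]:
        steps.append(("reboot", list(group)))
    steps.append(("switchover", [backup]))  # GRES: upgraded backup becomes primary
    steps.append(("reboot", [primary]))     # original primary upgraded last
    return steps


def copy_with_cleanup(targets, rcp_count, failed):
    """Model copy-failure handling (14.1X53-D40 and later).

    Stops at the first batch containing a failure, reboots nothing, and
    reports the members whose copy succeeded so the image can be removed.
    """
    copied = []
    for batch in copy_batches(targets, rcp_count):
        copied.extend(m for m in batch if m not in failed)
        bad = [m for m in batch if m in failed]
        if bad:
            return {"ok": False, "failed": bad, "remove_image_from": copied}
    return {"ok": True, "failed": [], "remove_image_from": []}


for step in nssu_plan(primary=0, backup=1, line_cards=[2, 3, 4], rcp_count=2):
    print(step)
print(copy_with_cleanup([1, 2, 3, 4], rcp_count=2, failed={3}))
```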
You cannot use an NSSU to downgrade the software—that is, to install an earlier version of the software than is currently running on the switch. To install an earlier software version, use the request system software add command.
NSSU does not provide a way to roll back to the previous software version after an upgrade. If you need to return to the previous software version, you can reboot from the alternate root partition, provided you have not already copied the new software version into the alternate root partition.
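Because NSSU only moves forward, a pre-upgrade script might refuse a target release that is not later than the running one. The Python sketch below assumes release strings of the form major.minorXtrain-Dbuild (for example, 14.1X53-D40) and compares them numerically field by field. That ordering is a simplification for illustration; the helper names are hypothetical, and only JTAC can confirm whether a specific from and to combination is supported for NSSU.

```python
import re

# Simplification: handles only "<major>.<minor>X<train>-D<build>" strings
# such as "14.1X53-D40" and orders them numerically field by field.
RELEASE_RE = re.compile(r"^(\d+)\.(\d+)X(\d+)-D(\d+)")


def release_key(release):
    """Turn a release string into a comparable tuple (hypothetical helper)."""
    match = RELEASE_RE.match(release)
    if match is None:
        raise ValueError(f"unsupported release format: {release}")
    return tuple(int(part) for part in match.groups())


def nssu_allowed_direction(running, target):
    """True only if target is later than running; NSSU cannot downgrade."""
    return release_key(target) > release_key(running)


print(nssu_allowed_direction("13.2X51-D20", "14.1X53-D40"))  # True
print(nssu_allowed_direction("14.1X53-D40", "13.2X51-D20"))  # False
```

A False result corresponds to the case where you would use request system software add instead of NSSU.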
NSSU and Junos OS Release Support
NSSU is supported on a QFX5100 VCF with up to 20 member switches in Junos OS Release 13.2X51-D20 or later.
You can’t use NSSU to upgrade a QFX5100 VCF from a Junos OS “-qfx-5-” image (see the package filename) to a “-qfx-5e-” image. You must first upgrade all of the QFX5100 switches to a “-qfx-5e-” image that supports NSSU. Then you can use NSSU to upgrade the VCF to a later “-qfx-5e-” Junos OS release, as long as the from and to releases are a supported combination. See Upgrading a QFX5100 Switch with a USB Device to Join a QFX5110 Virtual Chassis or Virtual Chassis Fabric.
A VCF must first be running a Junos OS release that supports NSSU before you can perform an NSSU to upgrade it to a later release.
If a VCF is running a software version that does not support NSSU, you can upgrade the VCF using the standard CLI command for updating software, request system software add. For a procedure that requires VCF downtime, see Upgrading Virtual Chassis Fabric Software Using Automatic or Standard Software Update Features; for a procedure that minimizes service disruption during the upgrade, see the network configuration example How to Upgrade a Four-Member QFX Series VCF.
NSSU works only on VCFs with particular from and to Junos OS releases. If you are considering upgrading your VCF using NSSU, contact the Juniper Networks Technical Assistance Center (JTAC) to confirm the supported from and to releases.
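The image-family and minimum-release rules can also be checked from package filenames before you plan an NSSU. In this Python sketch, the filenames are hypothetical examples, the function names are assumptions, and release comparison uses a simplified numeric ordering of major.minorXtrain-Dbuild strings. An empty result does not mean a from and to combination is supported; you still confirm that with JTAC.

```python
import re

MIN_NSSU_RELEASE = (13, 2, 51, 20)  # 13.2X51-D20, the minimum named in this topic
RELEASE_RE = re.compile(r"(\d+)\.(\d+)X(\d+)-D(\d+)")


def image_family(package):
    """Classify a package filename as "qfx-5e", "qfx-5", or None."""
    if "-qfx-5e-" in package:
        return "qfx-5e"
    if "-qfx-5-" in package:
        return "qfx-5"
    return None


def release_key(package):
    """Extract a comparable release tuple from a package filename."""
    match = RELEASE_RE.search(package)
    if match is None:
        raise ValueError(f"no release string found in: {package}")
    return tuple(int(part) for part in match.groups())


def nssu_precheck(running_package, target_package):
    """Return reasons NSSU cannot be used; an empty list is not a guarantee."""
    problems = []
    if image_family(running_package) == "qfx-5" and image_family(target_package) == "qfx-5e":
        problems.append("-qfx-5- to -qfx-5e- needs a non-NSSU upgrade first")
    if release_key(running_package) < MIN_NSSU_RELEASE:
        problems.append("running release does not support NSSU")
    return problems


# Hypothetical package filenames for illustration only.
running = "jinstall-qfx-5-14.1X53-D40.8-domestic-signed.tgz"
target = "jinstall-qfx-5e-flex-15.1X53-D60.4-secure-signed.tgz"
print(nssu_precheck(running, target))
```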
Overview of NSSU Configuration and Operation
You must ensure that the configuration meets the requirements described in Requirements for Performing an NSSU for a VCF. Running NSSU itself requires no additional configuration.
You initiate an NSSU by entering the request system software nonstop-upgrade CLI command. For detailed instructions on how to perform an NSSU, see the topics in Related Documentation.