Upgrading Two-Member QFX Series Virtual Chassis

About This Network Configuration Example

This network configuration example (NCE) shows how to upgrade a two-member QFX Series Virtual Chassis when the nonstop software upgrade (NSSU) process is either unavailable or undesirable. The process keeps service disruption, and therefore the impact on data center workloads, to a minimum. NSSU for the QFX Series is supported only between specific releases, which are listed in the QFX Series section of the Junos Release Notes.

Use Case Overview

The Virtual Chassis capabilities are important aspects of the QFX Series portfolio. A common Virtual Chassis use case in data centers is aggregating multiple top-of-rack switches into a single logical entity for simplicity in management and operations of high-availability pairs. In this use case, racks of servers are multihomed to two top-of-rack QFX Series switches. The switches are configured into a Virtual Chassis pair and provide resiliency to the network path if one of the QFX Series devices fails.

When these devices need software updates, you will generally use the NSSU capabilities of the Virtual Chassis to upgrade the devices. The NSSU upgrade selectively upgrades the Virtual Chassis member devices in an intelligent order to minimize service disruption to the connected servers.

However, there are certain upgrade scenarios where the “from” release and “to” release do not support the NSSU upgrade process. When upgrading in these scenarios, we can achieve a similar result through a series of manual operations. This use case covers the non-NSSU upgrade path between two releases.

Technical Overview

The process to manually upgrade a two-member Virtual Chassis closely mimics the steps taken by the automated NSSU process. The sequence leverages the high-availability design to systematically remove one device from service to perform the upgrade and reboot. When the server nodes are dual-homed to both devices, the network can withstand the removal of one of the Virtual Chassis members during the upgrade window. Overall network bandwidth is reduced during the process, but the network remains available.

The Virtual Chassis feature uses a primary/backup concept to keep the device state synchronized between the members of the Virtual Chassis. While one device handles the traffic, we take the other device offline and upgrade it. To upgrade both devices, we take the following steps:

  1. First, we shift all traffic to the primary device.

  2. Once the backup device is no longer handling server traffic, we break apart the Virtual Chassis.

  3. With the backup device completely isolated, we upgrade the software on the backup device and reboot it. The backup device will keep a copy of the original network configuration.

  4. After the upgraded backup comes online, we shift server traffic from the primary device to the backup device. Once the backup is handling the network load, we upgrade and reboot the primary device.

  5. After the primary device comes online, we shift traffic back to the primary device.

  6. Finally, we re-enable the Virtual Chassis links between the two devices to re-create the Virtual Chassis pair running the new software version.
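The six steps above can be modeled as a short sequence of state changes. The following Python sketch is illustrative only: the Member class, its fields, and the manual_upgrade function are hypothetical names chosen for this example, not part of Junos OS or any Juniper tool. It assumes each member either carries server traffic or does not, and it checks after every step that at least one member is still serving traffic.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """Hypothetical model of one Virtual Chassis member."""
    name: str
    version: str
    serving: bool = True   # server-facing ports enabled
    vcp_up: bool = True    # Virtual Chassis ports enabled


def manual_upgrade(primary: Member, backup: Member, target: str) -> list[str]:
    """Apply the six steps in order and return a log of completed stages."""
    log = []

    def record(stage: str) -> None:
        # Invariant: at least one member must carry server traffic at all times.
        assert primary.serving or backup.serving, f"outage after: {stage}"
        log.append(stage)

    backup.serving = False                          # 1. shift traffic to the primary
    record("traffic on primary only")
    primary.vcp_up = backup.vcp_up = False          # 2. break apart the Virtual Chassis
    record("virtual chassis split")
    backup.version = target                         # 3. upgrade and reboot the backup
    record("backup upgraded")
    backup.serving, primary.serving = True, False   # 4. shift traffic to the backup...
    record("traffic on backup only")
    primary.version = target                        #    ...then upgrade the primary
    record("primary upgraded")
    primary.serving, backup.serving = True, False   # 5. shift traffic back
    record("traffic back on primary")
    primary.vcp_up = backup.vcp_up = True           # 6. re-enable the Virtual Chassis links
    record("virtual chassis re-formed")
    return log
```

Calling manual_upgrade with two members on the old release leaves both on the target release with their Virtual Chassis links up. The model deliberately ignores reboot time and LACP convergence.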

Configuration Example

This configuration example shows how to upgrade a two-member Virtual Chassis from Junos OS Release 14.1X53-D49.1 to Junos OS Release 18.1R2.6. As it happens, this is not a supported combination for the NSSU feature, so we will use the manual process outlined below.

This example uses a basic Virtual Chassis configuration, but the process here is adaptable to a number of different use cases.

Requirements

Use this procedure to upgrade both members of a two-member Virtual Chassis consisting of QFX5100, QFX5110, QFX5220, or QFX5200 switches to the same Junos OS release. We strongly recommend that both members of the Virtual Chassis be the same platform, as in this example.

Before you begin:

  • If the Virtual Chassis is not preprovisioned, configure one member to be the primary Routing Engine and the other to be a backup Routing Engine

  • Make sure the Virtual Chassis consists of exactly two members

  • Configure the Virtual Chassis in Virtual Chassis mode (that is, not Virtual Chassis Fabric mode)

  • Make sure the Virtual Chassis is performing Layer 2 functions only (that is, no IRBs or routing protocols)
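The prerequisites above can be expressed as a simple pre-flight check. This is a minimal sketch under stated assumptions: the dictionary layout (members, role, mode, irb_interfaces, routing_protocols) and the check_prerequisites function are hypothetical and do not correspond to any Junos OS data format. In practice you would fill in these values by hand or from your own inventory data.

```python
def check_prerequisites(vc: dict) -> list[str]:
    """Return the unmet prerequisites; an empty list means ready to proceed."""
    problems = []
    members = vc.get("members", [])

    # The procedure covers two-member Virtual Chassis only.
    if len(members) != 2:
        problems.append("Virtual Chassis must have exactly two members")

    # One member must be the primary and the other the backup.
    roles = sorted(m.get("role", "") for m in members)
    if roles != ["backup", "primary"]:
        problems.append("one member must be primary and the other backup")

    # Virtual Chassis Fabric mode is out of scope.
    if vc.get("mode") != "virtual-chassis":
        problems.append("must run in Virtual Chassis mode, not Virtual Chassis Fabric mode")

    # Layer 2 only: no IRBs and no routing protocols.
    if vc.get("irb_interfaces") or vc.get("routing_protocols"):
        problems.append("Layer 2 only: no IRBs or routing protocols")

    return problems
```

An empty result means all four conditions hold; otherwise each string names one condition to fix before you begin.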

This example uses the following hardware and software components:

  • Two QFX5100-48S-6Q devices running Junos OS Release 14.1X53-D49.1

  • Junos OS Release 18.1R2.6

  • Test server running Ubuntu Linux 16.04

Overview

The upgrade between releases requires a specific sequence of steps coordinated among the network elements to ensure a minimum of downtime during the transition. As indicated in the diagram, the general procedure will leverage the high availability characteristics of modern servers with redundant connections to the Virtual Chassis during the transition.

At the start of the upgrade, we begin with a functional two-member Virtual Chassis. Our goal is to upgrade to a new Junos OS release with minimum traffic disruption. To achieve this, we will break apart the Virtual Chassis and upgrade the member devices as standalone units. After the devices have been upgraded, we will re-connect them and re-establish the Virtual Chassis.

Topology

Configuration

Procedure

Step-by-Step Procedure

To upgrade the devices:

  1. Verify the Virtual Chassis state. Check the parameters of the Virtual Chassis and verify you are working with a two-member Virtual Chassis that is operational.

  2. Upload the new software to the Virtual Chassis members. Copy the new software to /var/tmp on the Virtual Chassis primary and backup devices. This step stages software on both switches for the upgrade procedure. The copy operation will take some time to complete while it transfers the Junos OS images.

  3. Disable split detection. We recommend disabling split detection whenever you form a Virtual Chassis with only two members. If you do not, the primary device may take on a linecard role and stop the control and data planes when you disable the backup Routing Engine later in this example.

    Since you started this NCE with a fully configured Virtual Chassis, this option should already be configured. If it is not for any reason, configure it now.

  4. Disable server-facing ports on the backup Routing Engine to minimize disruption during switchover.

  5. Disable VCP ports toward the backup Routing Engine. This breaks up the Virtual Chassis.

  6. Upgrade the backup Routing Engine. When upgrading to Junos OS Release 18.2 or later, include the force-host option. This ensures that both the host OS and the Junos binaries are updated and remain matched.

  7. Swap the server-facing ports by disabling the server-facing ports on the primary device and re-enabling the server-facing ports on the backup simultaneously. Implement the same configuration on the backup and primary devices to modify any configuration left over from when the two devices were part of the Virtual Chassis.

    On the backup QFX, first disable the server-facing ports on the primary device. Do not commit the configuration:

    Then re-enable the server-facing ports on the backup by deleting the previous configuration. Commit the configuration:

    Repeat the configuration on the primary QFX:

  8. Upgrade the primary Routing Engine. When upgrading to Junos OS Release 18.2 or later, include the force-host option. This ensures that both the host OS and the Junos binaries are updated and remain matched.

  9. Note:

    Follow this step only if the Virtual Chassis is not preprovisioned. If the Virtual Chassis is preprovisioned, membership election is based on system uptime in the event that the primary Routing Engine is not preconfigured.

  10. Swap the server-facing ports back to the primary device. Re-enable the server-facing ports on the primary device to speed up LACP convergence when the Virtual Chassis comes back. Implement the same configuration on the backup and primary devices to modify any configuration left over from when the two devices were part of the Virtual Chassis.

    On the backup QFX, first re-enable the server-facing ports on the primary device by deleting the previous configuration. Do not commit the configuration:

    Then disable the server-facing ports on the backup and commit the configuration:

    Repeat the configuration on the primary QFX:

  11. Re-enable the VCP ports on both boxes to re-establish the Virtual Chassis.

  12. Verify you have re-established the Virtual Chassis.

  13. Enable access ports on both members. Now that the Virtual Chassis has been re-established, we need to re-establish the access ports so we can use the primary Routing Engine em0 address to communicate with the newly upgraded Virtual Chassis.

    On the primary QFX:

    Note:

    If you intend to add more devices to your two-member Virtual Chassis, re-enable split detection.

    You have upgraded your two-member Virtual Chassis.
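Steps 6 and 8 tie the force-host option to the target release. The following sketch shows one way to make that decision in a script. It assumes the release string begins with the major and minor numbers separated by a dot (as in 18.1R2.6). The needs_force_host function is a hypothetical helper, not a Junos OS command, and the release strings other than the two used in this example are sample inputs.

```python
import re


def needs_force_host(target_release: str) -> bool:
    """Return True when the target release is 18.2 or newer."""
    # Read the leading "major.minor" digits, for example "18.1" from "18.1R2.6".
    match = re.match(r"(\d+)\.(\d+)", target_release)
    if match is None:
        raise ValueError(f"unrecognized release string: {target_release!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison orders by major first, then minor.
    return (major, minor) >= (18, 2)
```

For the upgrade in this example, the target 18.1R2.6 is older than 18.2, so the function returns False.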

Conclusion

Virtual Chassis is an important architectural design for data center high availability. You now know how to manually upgrade a two-member QFX Series Virtual Chassis with minimal impact on your data center workloads. Use the procedure outlined in this document to upgrade any Virtual Chassis with a similar topology when NSSU is not available or not desirable.