Upgrading Two-Member QFX Series Virtual Chassis

About This Network Configuration Example

This network configuration example (NCE) shows how to upgrade a two-member QFX Series Virtual Chassis when the nonstop software upgrade (NSSU) process is either unavailable or undesirable. The manual process minimizes service disruption and has minimal impact on data center workloads. NSSU on the QFX Series is supported only between specific releases, which are listed in the QFX Series section of the Junos Release Notes.

Use Case Overview

The Virtual Chassis capabilities are important aspects of the QFX Series portfolio. A common Virtual Chassis use case in data centers is aggregating multiple top-of-rack switches into a single logical entity for simplicity in management and operations of high-availability pairs. In this use case, racks of servers are multihomed to two top-of-rack QFX Series switches. The switches are configured into a Virtual Chassis pair and provide resiliency to the network path if one of the QFX Series devices fails.

When these devices need software updates, you generally use the NSSU capabilities of the Virtual Chassis to upgrade them. NSSU upgrades the Virtual Chassis member devices in a controlled order to minimize service disruption to the connected servers.

However, there are certain upgrade scenarios where the “from” release and “to” release do not support NSSU. When upgrading in these scenarios, we can achieve a similar result through a series of manual operations. This use case covers the non-NSSU upgrade path between two releases.

Technical Overview

The process to manually upgrade a two-member Virtual Chassis closely mimics the steps taken by the automated NSSU process. The sequence leverages the high-availability design to systematically remove one device from service to perform the upgrade and reboot. When the server nodes are dual homed to each of the devices, the network can withstand the removal of one of the Virtual Chassis members during the upgrade window. There is a reduction of overall network bandwidth during the process, but the network remains available.

The Virtual Chassis feature uses a primary/backup concept to keep the device state synchronized between the members of the Virtual Chassis. While one device handles the traffic, we take the other device offline and upgrade it. To upgrade both devices, we take the following steps:

  1. First, we shift all traffic to the primary device.

  2. Once the backup device is no longer handling server traffic, we break apart the Virtual Chassis.

  3. With the backup device completely isolated, we upgrade the software on the backup device and reboot it. The backup device will keep a copy of the original network configuration.

  4. After the upgraded backup comes online, we shift server traffic from the primary device to the backup device. Once the backup is handling the network load, we upgrade and reboot the primary device.

  5. After the primary device comes online, we shift traffic back to the primary device.

  6. Finally, we re-enable the Virtual Chassis links between the two devices to re-create the Virtual Chassis pair running the new software version.
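
The six steps above can be sketched as a small Python model. This is only an illustration of the ordering, not a tool that talks to the switches: the `Member` class, its fields, and the `manual_upgrade` function are hypothetical names introduced here. The point of the sketch is the invariant checked after every step: at least one member is always carrying server traffic.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """One switch in the two-member Virtual Chassis (illustrative model only)."""
    name: str
    release: str
    serving_traffic: bool = True   # server-facing ports enabled
    vcp_enabled: bool = True       # Virtual Chassis ports enabled


def manual_upgrade(primary: Member, backup: Member, new_release: str) -> list[str]:
    """Walk through the six steps and return a log of the checkpoints reached."""
    log = []

    def checkpoint(step: str) -> None:
        # Invariant: servers must always be able to reach at least one member.
        assert primary.serving_traffic or backup.serving_traffic, step
        log.append(step)

    # 1. Shift all traffic to the primary device.
    backup.serving_traffic = False
    checkpoint("traffic on primary only")

    # 2. Break apart the Virtual Chassis.
    primary.vcp_enabled = backup.vcp_enabled = False
    checkpoint("Virtual Chassis split")

    # 3. Upgrade and reboot the isolated backup.
    backup.release = new_release
    checkpoint("backup upgraded")

    # 4. Shift traffic to the backup, then upgrade and reboot the primary.
    backup.serving_traffic = True
    primary.serving_traffic = False
    checkpoint("traffic on backup only")
    primary.release = new_release
    checkpoint("primary upgraded")

    # 5. Shift traffic back to the primary.
    primary.serving_traffic = True
    backup.serving_traffic = False
    checkpoint("traffic on primary only again")

    # 6. Re-enable the Virtual Chassis links.
    primary.vcp_enabled = backup.vcp_enabled = True
    checkpoint("Virtual Chassis re-formed")
    return log


primary = Member("ad-qfx5100-a", "14.1X53-D49.1")
backup = Member("ad-qfx5100-b", "14.1X53-D49.1")
upgrade_log = manual_upgrade(primary, backup, "18.1R2.6")
```

In this model the backup ends with its server-facing ports still disabled, which matches the six-step summary; the detailed procedure finishes by enabling the access ports on both members.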

Configuration Example

This configuration example shows how to upgrade a two-member Virtual Chassis from Junos OS Release 14.1X53-D49.1 to Junos OS Release 18.1R2.6. As it happens, this is not a supported combination for the NSSU feature, so we will use the manual process outlined below.

This example uses a basic Virtual Chassis configuration, but the process here is adaptable to a number of different use cases.

Requirements

Use this procedure to upgrade both members of a two-member Virtual Chassis consisting of QFX5100, QFX5110, QFX5120, or QFX5200 switches to the same Junos OS release. We strongly recommend that both members of the Virtual Chassis are the same platform, as in this example.

Before you begin:

  • If the Virtual Chassis is not preprovisioned, configure one member to be the primary Routing Engine and the other to be a backup Routing Engine

  • Make sure the Virtual Chassis consists of two members

  • Configure the Virtual Chassis in Virtual Chassis mode (that is, not Virtual Chassis Fabric mode)

  • Make sure the Virtual Chassis is performing Layer 2 functions only (that is, no IRBs or routing protocols)
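
The prerequisites above can be captured as a simple checklist function. This is a minimal sketch that assumes you have already collected the facts by hand from the switches; the `VirtualChassisFacts` structure, its field names, and the mode strings are hypothetical and do not correspond to any Junos API.

```python
from dataclasses import dataclass


@dataclass
class VirtualChassisFacts:
    """Hypothetical summary of facts gathered manually from the switches."""
    member_roles: list[str]        # for example ["primary", "backup"]
    mode: str                      # "virtual-chassis" or "virtual-chassis-fabric"
    has_irb_interfaces: bool
    has_routing_protocols: bool


def precheck(facts: VirtualChassisFacts) -> list[str]:
    """Return the unmet prerequisites; an empty list means ready to start."""
    problems = []
    if len(facts.member_roles) != 2:
        problems.append("Virtual Chassis must have exactly two members")
    if sorted(facts.member_roles) != ["backup", "primary"]:
        problems.append("one member must be primary and the other backup")
    if facts.mode != "virtual-chassis":
        problems.append("Virtual Chassis Fabric mode is not covered")
    if facts.has_irb_interfaces or facts.has_routing_protocols:
        problems.append("procedure assumes Layer 2 only (no IRBs or routing protocols)")
    return problems


ready = precheck(VirtualChassisFacts(["primary", "backup"], "virtual-chassis", False, False))
```

An empty result for `ready` means that all four prerequisites are met.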

This example uses the following hardware and software components:

  • Two QFX5100-48S-6Q devices running Junos OS Release 14.1X53-D49.1

  • Junos OS Release 18.1R2.6

  • Test server running Ubuntu Linux 16.04

Overview

The upgrade between releases requires a specific sequence of steps coordinated among the network elements to ensure a minimum of downtime during the transition. As indicated in the diagram, the general procedure will leverage the high availability characteristics of modern servers with redundant connections to the Virtual Chassis during the transition.

At the start of the upgrade, we begin with a functional two-member Virtual Chassis. Our goal is to upgrade to a new Junos OS release with minimum traffic disruption. To achieve this, we will break apart the Virtual Chassis and upgrade the member devices as standalone units. After the devices have been upgraded, we will re-connect them and re-establish the Virtual Chassis.

Topology

Figure: Demo network topology. The core device device1.example.com (VLAN10 IP 192.168.10.1) connects to switches S1 (ad-qfx5100-a) and S2 (ad-qfx5100-b), which act as master and backup, through aggregated Ethernet interfaces ae0 and ae1. The demo server device2.example.com (VLAN10 IP 192.168.10.100) connects through bond0.

Configuration

Procedure

Step-by-Step Procedure

To upgrade the devices:

  1. Verify the Virtual Chassis state. Check the parameters of the Virtual Chassis and verify you are working with a two-member Virtual Chassis that is operational.

  2. Upload the new software to the Virtual Chassis members. Copy the new software to /var/tmp on the Virtual Chassis primary and backup devices. This step stages software on both switches for the upgrade procedure. The copy operation will take some time to complete while it transfers the Junos OS images.

  3. We recommend disabling split detection whenever you form a Virtual Chassis with only two members. If you do not disable split detection, the primary device may take on a linecard role and stop the control and data planes when you disable the backup Routing Engine later in this example.

    Since you started this NCE with a fully configured Virtual Chassis, this option should already be configured. If it is not for any reason, configure it now.

  4. Disable server-facing ports on the backup Routing Engine to minimize disruption during switchover.

    Figure: Demo network with two QFX5100 switches (Master RE and Backup RE), aggregated Ethernet links ae0 and ae1, and the server node device2.example.com (management IP 10.92.71.11) attached through interfaces enp8s0f0 and enp8s0f1. Red X marks show the disabled links.

  5. Disable the Virtual Chassis ports (VCPs) toward the backup Routing Engine. This breaks up the Virtual Chassis.

    Figure: Demo network with S1 (Master RE) and S2 (Backup RE), aggregated Ethernet links ae0 and ae1, and the demo server node attached through bond0 (interfaces enp8s0f0 and enp8s0f1). Red X marks show the disabled links. Management IPs: 10.92.71.11 (server) and 10.92.71.93 (core device). VLAN10 IPs: 192.168.10.100 (server) and 192.168.10.1 (core device).

  6. Upgrade the backup Routing Engine. When upgrading to Junos OS Release 18.2 or later, you should include the force-host option. This ensures that both the host OS and the Junos binaries are updated and remain matched.

    Figure: Demo network with S1 (master Routing Engine, ad-qfx5100-a) and S2 (backup Routing Engine, ad-qfx5100-b). Red X marks show the disabled links. The demo server node connects through bond0 (interfaces enp8s0f0 and enp8s0f1).

  7. Swap the server-facing ports by disabling the server-facing ports on the primary device and re-enabling the server-facing ports on the backup simultaneously. Apply the same configuration on both the backup and primary devices, because each device still holds configuration left over from when the two devices were part of the Virtual Chassis.

    On the backup QFX, first disable the server-facing ports on the primary device. Do not commit the configuration:

    Then re-enable the server-facing ports on the backup by deleting the previous configuration. Commit the configuration:

    Repeat the configuration on the primary QFX:

    Figure: Demo network with QFX5100 switches S1 (Master RE) and S2 (Backup RE) and their VCP links. Red X marks show the disabled links. The server node (management IP 10.92.71.11) connects through bond0. VME0 on S1 has IP 10.92.71.241/23.

  8. Upgrade the primary Routing Engine. When upgrading to Junos OS Release 18.2 or later, you should include the force-host option. This ensures that both the host OS and the Junos binaries are updated and remain matched.

    Figure: Demo network with switches ad-qfx5100-a and ad-qfx5100-b and a server connected through bonded interfaces. Management IPs 10.92.71.93 and 10.92.71.11 are assigned to the core device and the server, respectively. Red X marks show the disabled links.

  9. Note:

    Follow this step only if the Virtual Chassis is not preprovisioned. If the Virtual Chassis is preprovisioned, membership election is based on system uptime in the event that the primary Routing Engine is not preconfigured.

  10. Swap the server-facing ports back to the primary device. Re-enable the server-facing ports on the primary device to speed up LACP convergence when the Virtual Chassis comes back. Apply the same configuration on both the backup and primary devices, because each device still holds configuration left over from when the two devices were part of the Virtual Chassis.

    On the backup QFX, first re-enable the server-facing ports on the primary device by deleting the previous configuration. Do not commit the configuration:

    Then disable the server-facing ports on the backup and commit the configuration:

    Repeat the configuration on the primary QFX:

    Figure: Demo network with switches S1 and S2, links ae0, ae1, and bond0, and the demo server node. Red X marks show the disabled links.

  11. Re-enable the VCPs on both devices to re-establish the Virtual Chassis.

    Figure: Demo network with S1 (Master RE) and S2 (Backup RE). S1 and S2 have interfaces ge-0/0/0, ge-0/0/1, ge-1/0/0, and ge-1/0/1; ge-1/0/1 on S2 is marked with a red X. VCPs vcp-0/0/48 link S1 and S2; the connection on S2 is marked with a red X. The server connects through bond0 (enp8s0f0 and enp8s0f1). Red X marks show the disabled links.

  12. Verify you have re-established the Virtual Chassis.

  13. Enable access ports on both members. Now that the Virtual Chassis has been re-established, we need to re-enable the access ports so we can use the primary Routing Engine em0 address to communicate with the newly upgraded Virtual Chassis.

    On the primary QFX:

    Figure: Final topology with the two QFX5100 switches re-formed as a Virtual Chassis, links ae0 and ae1, and the demo server node attached through bond0.

    Note:

    If you intend to add more devices to your two-member Virtual Chassis, re-enable split detection.

    You have upgraded your two-member Virtual Chassis.
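
The port swaps in steps 7 and 10 stage a disable and a re-enable in one candidate configuration and commit them together. The sketch below only generates the configuration statements as text; it does not connect to a device. The `swap_server_ports` helper is a hypothetical name, and the assignment of ge-0/0/0 and ge-0/0/1 to the primary and ge-1/0/0 and ge-1/0/1 to the backup as server-facing ports is an assumption made for illustration. Substitute the server-facing interfaces from your own configuration.

```python
def swap_server_ports(disable: list[str], enable: list[str]) -> list[str]:
    """Build one candidate configuration that moves server traffic.

    Both changes are staged together and committed once, so the ports on
    one device go down in the same commit that brings the others up.
    """
    commands = [f"set interfaces {name} disable" for name in disable]
    commands += [f"delete interfaces {name} disable" for name in enable]
    commands.append("commit")
    return commands


# Assumed port assignment (illustrative only).
PRIMARY_PORTS = ["ge-0/0/0", "ge-0/0/1"]
BACKUP_PORTS = ["ge-1/0/0", "ge-1/0/1"]

# Step 7: move traffic to the upgraded backup.
to_backup = swap_server_ports(disable=PRIMARY_PORTS, enable=BACKUP_PORTS)

# Step 10: move traffic back to the upgraded primary.
to_primary = swap_server_ports(disable=BACKUP_PORTS, enable=PRIMARY_PORTS)

for line in to_backup:
    print(line)
```

Because the Virtual Chassis is split at this point, the same list of statements would be entered on each device separately, as the procedure describes.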

Conclusion

Virtual Chassis is an important architecture design for data center high availability. You now know how to manually upgrade a two-member QFX Series Virtual Chassis with minimal impact on your data center workloads. Use the procedure outlined in this document to upgrade any Virtual Chassis with a similar topology when NSSU is not available or not desirable.