Understanding Nonstop Software Upgrade for QFabric Systems

The framework that underlies a nonstop software upgrade in a QFabric system enables you to upgrade the system step by step while minimizing the impact on its continuous operation. This topic explains how a nonstop software upgrade works in a QFabric system, the steps involved, and the procedures you need to follow to benefit from this type of upgrade.

Nonstop software upgrade enables some QFabric system components to continue operating while similar components in the system are being upgraded. In general, the QFabric system upgrades redundant components in stages, so that some components remain operational and continue forwarding traffic while their redundant counterparts are upgraded to a new software version.

Tip:

Before you perform a nonstop software upgrade, contact the Juniper Networks Technical Assistance Center (JTAC) to perform a pre-upgrade health check on the QFabric system.

Use the following guidelines to decide when to implement a nonstop software upgrade:

  • If you need to upgrade all components of the system in the shortest amount of time (approximately one hour) and you do not need to retain the forwarding resiliency of the data plane, issue the request system software add component all command to perform a standard software upgrade. All components of the QFabric system upgrade simultaneously and expediently, but this type of upgrade does not provide resiliency or switchover capabilities.

  • If you need to minimize service impact and preserve the forwarding operations of the data plane during the upgrade, and you can accept the extra time required for component switchovers (in many cases, several hours), issue the three nonstop software upgrade commands described in this topic (request system software nonstop-upgrade director-group, request system software nonstop-upgrade fabric, and request system software nonstop-upgrade node-group) in the correct order.
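The two guidelines above amount to a simple decision. The following Python sketch encodes it; the function `recommended_commands` and its `preserve_data_plane` parameter are hypothetical, and only the command strings come from this topic (any arguments the real commands may require, such as the software package, are left out).

```python
# Hypothetical helper: encodes the two upgrade guidelines as a function.
# The command strings come from this topic; arguments the real commands
# may take (such as the software package) are intentionally omitted.
from typing import List

STANDARD_UPGRADE = "request system software add component all"

NONSTOP_UPGRADE_SEQUENCE = (
    "request system software nonstop-upgrade director-group",
    "request system software nonstop-upgrade fabric",
    "request system software nonstop-upgrade node-group",
)


def recommended_commands(preserve_data_plane: bool) -> List[str]:
    """Return the upgrade command(s) to issue, in order.

    preserve_data_plane: True when data-plane forwarding must continue during
    the upgrade and the extra time (often several hours) is acceptable.
    """
    if preserve_data_plane:
        # Nonstop upgrade: three commands, always in this order.
        return list(NONSTOP_UPGRADE_SEQUENCE)
    # Standard upgrade: about one hour, all components at once,
    # with no resiliency or switchover.
    return [STANDARD_UPGRADE]


if __name__ == "__main__":
    for command in recommended_commands(preserve_data_plane=True):
        print(command)
```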

Note:
  • Before you begin a nonstop software upgrade, issue the request system software download command to copy the software to the QFabric system.

  • Treat the three nonstop software upgrade steps as parts of a single process. You must complete all three steps in the correct order to ensure the proper operation of the QFabric system.

  • Open two SSH sessions to the QFabric CLI. Use one session to monitor the upgrade itself and use a second session to verify that the QFabric system components respond to operational mode commands as expected. For more information on verification of the upgrade, see Verifying Nonstop Software Upgrade for QFabric Systems.

  • Issue the show fabric administration inventory command to verify that all upgraded components are operational at the end of a step before beginning the next step.

  • After you start the nonstop software upgrade process, we strongly recommend that you complete all three steps within 12 hours.
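As an illustration of the ordering rules in this note, here is a minimal Python sketch of a tracker that refuses out-of-order steps and reports whether the recommended 12-hour window has passed. The class `NonstopUpgradeTracker` and its methods are hypothetical; it does not connect to a QFabric system, so the caller reports each download, step, and inventory check.

```python
# Hypothetical tracker for the ordering rules in the note above. It does not
# talk to a QFabric system; the caller reports what happened.
from datetime import datetime, timedelta
from typing import List, Optional

STEPS = ("director-group", "fabric", "node-group")
RECOMMENDED_WINDOW = timedelta(hours=12)


class NonstopUpgradeTracker:
    def __init__(self) -> None:
        self.downloaded = False              # software copied to the system
        self.completed: List[str] = []       # finished steps, in order
        self.in_progress: Optional[str] = None
        self.started_at: Optional[datetime] = None

    def record_download(self) -> None:
        # Corresponds to issuing "request system software download".
        self.downloaded = True

    def next_step(self) -> Optional[str]:
        """Return the step that must run next, or None when all are done."""
        if len(self.completed) == len(STEPS):
            return None
        return STEPS[len(self.completed)]

    def start_step(self, step: str, now: datetime) -> None:
        if not self.downloaded:
            raise RuntimeError("copy the software to the QFabric system first")
        if self.in_progress is not None:
            raise RuntimeError(f"step {self.in_progress!r} is still in progress")
        if step != self.next_step():
            raise RuntimeError(f"expected {self.next_step()!r}, got {step!r}")
        if self.started_at is None:
            self.started_at = now  # the 12-hour window starts with step 1
        self.in_progress = step

    def finish_step(self, all_components_operational: bool) -> None:
        # all_components_operational stands in for checking the output of
        # "show fabric administration inventory" before moving on.
        if self.in_progress is None:
            raise RuntimeError("no step is in progress")
        if not all_components_operational:
            raise RuntimeError("not all upgraded components are operational")
        self.completed.append(self.in_progress)
        self.in_progress = None

    def within_recommended_window(self, now: datetime) -> bool:
        if self.started_at is None:
            return True
        return now - self.started_at <= RECOMMENDED_WINDOW
```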

The three steps to a successful nonstop software upgrade must be performed in the following order:

  • Director group—The first step upgrades the Director devices, the fabric manager Routing Engine, and the diagnostic Routing Engine. To perform the first step, issue the request system software nonstop-upgrade director-group command. The key actions that occur during a Director group upgrade are:

    1. You connect to the QFabric system through an SSH connection. This establishes a load-balanced CLI session on one of the Director devices in the Director group.

    2. The QFabric system downloads and installs the new software in both Director devices.

    3. The Director device hosting the CLI session becomes the primary for all QFabric system processes running on the Director group, such as the fabric manager and network Node group Routing Engines.

    4. The QFabric system installs the new software for the backup fabric manager Routing Engine on the backup Director device.

    5. The backup Director device reboots to activate the new software.

    6. The primary Director device begins a 15-minute sequence that includes a temporary suspension of QFabric services and a QFabric database transfer. You cannot issue operational mode commands in the QFabric CLI during this period.

    7. The QFabric system installs the new software for the fabric manager and diagnostic Routing Engines on the primary Director device.

    8. The QFabric system switches the primary role for all QFabric processes from the primary Director device to the backup Director device.

    9. The primary Director device reboots to activate the new software.

    10. The CLI session terminates, and logging back in to the QFabric system with a new SSH connection establishes the session on the new primary Director device (the original backup).

    11. The previous primary Director device resumes operation as the backup, and its associated processes (such as the fabric manager and network Node group Routing Engines) also become backups. The fabric control Routing Engine associated with this Director device returns to active status.
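The sequence above can be sketched as a small Python model that tracks only each Director device's role and activated software version. It is a simplification under stated assumptions: `Director`, `director_group_upgrade`, and the device names `dg0` and `dg1` are hypothetical, and the model folds the several installs on each device into a single version change.

```python
# Simplified, hypothetical model of the Director group step. It tracks only
# each Director device's role and activated software version; device names
# are placeholders.
from dataclasses import dataclass
from typing import List


@dataclass
class Director:
    name: str
    version: str
    role: str = "backup"


def director_group_upgrade(cli_host: Director, peer: Director, new_version: str) -> List[str]:
    log: List[str] = []
    # Steps 1-3: the Director hosting the CLI session becomes primary.
    cli_host.role, peer.role = "primary", "backup"
    log.append(f"{cli_host.name} is primary; {peer.name} is backup")
    # Steps 4-5: the backup activates the new software first.
    peer.version = new_version
    log.append(f"{peer.name} rebooted on {new_version}")
    # Steps 6-7: about 15 minutes with no operational mode commands.
    log.append("services suspended and database transferred")
    # Steps 8-9: primary role moves to the upgraded peer, then the old
    # primary reboots onto the new software.
    cli_host.role, peer.role = "backup", "primary"
    cli_host.version = new_version
    log.append(f"{cli_host.name} rebooted on {new_version}")
    # Steps 10-11: the CLI session drops; a new SSH login lands on the new primary.
    log.append(f"new CLI session on {peer.name}")
    return log


if __name__ == "__main__":
    for event in director_group_upgrade(Director("dg0", "old"), Director("dg1", "old"), "new"):
        print(event)
```

In this model, the backup is already running the new software before it takes over the primary role, which mirrors why the steps run in this order.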

    Note:

    After the Director group nonstop software upgrade completes, any Interconnect device or Node device that reboots automatically downloads the new software, installs it, and reboots again. As a result, avoid restarting any QFabric system devices until you complete the remaining nonstop software upgrade steps.

    Tip:
    • To enable BGP and OSPF to continue operating on the network Node group during a Director group nonstop software upgrade, we recommend that you configure graceful restart for these routing protocols. For more information on graceful restart, see Configuring Graceful Restart for QFabric Systems.

    • Wait 15 minutes after the second Director device returns to service and hosts Routing Engine processes before proceeding to step 2 (the fabric upgrade). You can verify the operational status of both Director devices by issuing the show fabric administration inventory director-group status command. Also, issue the show fabric administration inventory infrastructure command to verify when the Routing Engine processes become load balanced (typically, three or four Routing Engines run on each Director device).
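One way to express the checks in this tip is a small Python function. It is hypothetical: the dictionaries stand in for values you read from the two show commands, and treating the Routing Engines as load balanced when the per-Director counts differ by at most one is an assumption that matches the typical three or four per device, not a documented rule.

```python
# Hypothetical readiness check before step 2. The inputs stand in for what you
# read from "show fabric administration inventory director-group status" and
# "show fabric administration inventory infrastructure"; the balance rule
# (counts differ by at most one) is an assumption, not a documented threshold.
from datetime import datetime, timedelta
from typing import Dict

SETTLE_TIME = timedelta(minutes=15)


def ready_for_fabric_step(
    second_director_back_at: datetime,
    now: datetime,
    director_up: Dict[str, bool],
    routing_engines_per_director: Dict[str, int],
) -> bool:
    # Wait at least 15 minutes after the second Director device returns.
    if now - second_director_back_at < SETTLE_TIME:
        return False
    # Both Director devices must report as operational.
    if len(director_up) < 2 or not all(director_up.values()):
        return False
    counts = list(routing_engines_per_director.values())
    # Every Director hosts Routing Engines, and the load is roughly even.
    return len(counts) >= 2 and min(counts) > 0 and max(counts) - min(counts) <= 1
```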

  • Fabric—The second step upgrades the Interconnect devices and the fabric control Routing Engines. To perform the second step, issue the request system software nonstop-upgrade fabric command. The key actions that occur during a fabric upgrade are:

    1. The QFabric system downloads, validates, and installs the new software in all Interconnect devices and fabric control Routing Engines (FC-0 and FC-1).

    2. One fabric control Routing Engine reboots and comes back online.

    3. The other fabric control Routing Engine reboots and comes back online.

    4. The first Interconnect device reboots, comes back online, and resumes the forwarding of traffic.

    5. Subsequent Interconnect devices reboot one at a time, come back online, and return to service.

    Note:
    • If the software does not load properly on any one of the fabric components, all components revert to the original software version.

    • If one of the components in a fabric upgrade does not reboot successfully, issue the request system reboot fabric command to reattempt the rebooting process for this fabric component and activate the new software.
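The fabric step can be modeled as an all-or-nothing install followed by one-at-a-time reboots. In this hypothetical Python sketch, the Interconnect names are placeholders, `install_ok` stands in for the per-component install result, and rebooting FC-0 before FC-1 is an assumption (the topic says only that one fabric control Routing Engine reboots before the other).

```python
# Simplified, hypothetical model of the fabric step. "FC-0" and "FC-1" are the
# fabric control Routing Engines named above; Interconnect names are
# placeholders, and rebooting FC-0 before FC-1 is an assumption.
from typing import Dict, List


def fabric_upgrade(interconnects: List[str], install_ok: Dict[str, bool]) -> List[str]:
    components = ["FC-0", "FC-1"] + interconnects
    # Install phase: a failure on any one component reverts all of them.
    if not all(install_ok.get(c, False) for c in components):
        return ["revert all fabric components to the original software"]
    # Reboot phase: strictly one component at a time, so the other fabric
    # control Routing Engine and the remaining Interconnect devices keep running.
    return [f"reboot {c}" for c in components]
```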

  • Node group—The third and final step upgrades Node groups. You can choose to upgrade a network Node group, a redundant server Node group, or individual server Node groups. You can upgrade the Node groups one at a time or in groups (known as upgrade groups). However, you must upgrade all Node groups in your QFabric system before you can complete the nonstop software upgrade process. To perform the third step, issue the request system software nonstop-upgrade node-group command.

    The key actions that occur during a network Node group upgrade are:

    1. The QFabric system copies the new software to each Node device one at a time.

    2. The QFabric system validates and then installs the new software in all Node devices simultaneously.

    3. The QFabric system copies the software to the network Node group Routing Engines.

    4. The QFabric system validates and then installs the software in the network Node group Routing Engines one at a time: first the backup, then the primary.

    5. The backup network Node group Routing Engine reboots and comes back online.

    6. The supporting Node devices reboot and come back online one at a time.

      Note:

      To reduce the total upgrade duration, configure an upgrade group. All Node devices within the upgrade group reboot at the same time.

    7. The primary network Node group Routing Engine relinquishes primary role to the backup, reboots, and comes back online.
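Because each upgrade group reboots as one batch, configuring upgrade groups reduces the number of sequential reboots. The following hypothetical Python sketch computes the reboot batches for steps 5 through 7; the function name and device names are placeholders, and placing each upgrade group at the position of its first listed member is an assumption.

```python
# Hypothetical sketch of the reboot phase (steps 5-7) of a network Node group
# upgrade. Node device names are placeholders; scheduling each upgrade group
# at the position of its first listed member is an assumption.
from typing import Dict, List, Sequence


def reboot_batches(
    node_devices: Sequence[str],
    upgrade_groups: Sequence[Sequence[str]] = (),
) -> List[List[str]]:
    group_of: Dict[str, int] = {}
    for index, group in enumerate(upgrade_groups):
        for device in group:
            group_of[device] = index

    batches = [["backup network Node group Routing Engine"]]
    scheduled = set()
    for device in node_devices:
        if device not in group_of:
            batches.append([device])  # ungrouped devices reboot one at a time
            continue
        index = group_of[device]
        if index not in scheduled:
            scheduled.add(index)
            # Every member of an upgrade group reboots at the same time.
            batches.append([d for d in node_devices if group_of.get(d) == index])
    # The primary hands the primary role to the backup, then reboots.
    batches.append(["former primary network Node group Routing Engine"])
    return batches
```

With three Node devices and no upgrade groups, the sketch yields five sequential batches; placing n1 and n3 in one upgrade group reduces that to four.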

    The key actions that occur during a redundant server Node group upgrade are:

    1. The QFabric system copies the new software to the backup Node device, then the primary Node device.

    2. The QFabric system validates and then installs the new software on the backup Node device, then the primary Node device.

    3. The backup Node device reboots, comes back online, and becomes the primary Node device.

    4. The previous primary Node device reboots and comes back online as a backup Node device.

    Note:

    For redundant server Node groups, both Node devices must be online before the upgrade can proceed. If one of the devices is unavailable, remove that Node device from the Node group configuration before you issue the nonstop software upgrade command.
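The redundant server Node group sequence, including the requirement that both devices be online, can be sketched as follows. The function name and device names are hypothetical, and the `online` mapping stands in for the device state you would check before issuing the command.

```python
# Hypothetical sketch of a redundant server Node group upgrade. Device names
# are placeholders; "online" stands in for the devices' reported state.
from typing import Dict, List, Tuple


def redundant_server_node_group_upgrade(
    primary: str, backup: str, online: Dict[str, bool]
) -> Tuple[str, str, List[str]]:
    # Both Node devices must be online before the upgrade can proceed.
    if not (online.get(primary) and online.get(backup)):
        raise RuntimeError(
            "both Node devices must be online; remove an unavailable device "
            "from the Node group configuration first"
        )
    events = [
        f"copy software to {backup}",
        f"copy software to {primary}",
        f"install on {backup}",
        f"install on {primary}",
        f"reboot {backup}; it becomes the primary",
        f"reboot {primary}; it becomes the backup",
    ]
    # Returns (new primary, new backup, events): the roles are swapped.
    return backup, primary, events
```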

    The key actions that occur during an upgrade of a server Node group that contains a single Node device are:

    1. The Node device downloads the software package and validates the software.

    2. The Node device installs the software and reboots.

    Note:

    Because Node groups that contain a single Node device have no redundancy, traffic loss occurs when the device reboots during the upgrade.
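To track the third step across a whole system, the following hypothetical Python sketch reports which Node groups still need upgrading and which of those will lose traffic when their Node device reboots. The `NodeGroup` type, its `kind` labels, and the group names are placeholders for your own configuration.

```python
# Hypothetical status check for step 3. Node group names, the "kind" labels,
# and the member counts are placeholders you would fill in from your own
# configuration.
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Union


@dataclass(frozen=True)
class NodeGroup:
    name: str
    kind: str      # "network" or "server"
    members: int   # number of Node devices in the group


def node_group_step_status(
    groups: Iterable[NodeGroup], upgraded: Set[str]
) -> Dict[str, Union[bool, List[str]]]:
    pending = [g for g in groups if g.name not in upgraded]
    return {
        # The nonstop upgrade is complete only when no Node group is pending.
        "complete": not pending,
        "pending": [g.name for g in pending],
        # Single-member server Node groups have no redundancy, so they drop
        # traffic while their Node device reboots.
        "expect_traffic_loss": [
            g.name for g in pending if g.kind == "server" and g.members == 1
        ],
    }
```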