    Upgrading Both Devices in a Chassis Cluster Using a Low-Impact ISSU

    Upgrading Both Devices in a Chassis Cluster Using an ISSU

    The chassis cluster ISSU feature allows both devices in a cluster to be upgraded from supported Junos OS versions with a traffic impact similar to that of redundancy group failovers.

    Before you begin, note the following guidelines:

    • Back up the system software to the device’s hard disk by using the request system snapshot command on each Routing Engine.
    • If you are using Junos OS Release 11.4 or earlier, before starting an ISSU, fail over all redundancy groups so that they are all active on only one node (primary). See Initiating a Chassis Cluster Manual Redundancy Group Failover.

      If you are using Junos OS Release 12.1 or later, Junos OS automatically fails over all redundancy groups (RGs) to the RG0 primary.

    • We recommend that you enable graceful restart for routing protocols before you start an ISSU.
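
    The release-dependent guideline above can be expressed as a small check. This is a minimal sketch, not part of Junos OS: the helper names parse_release and needs_manual_failover are hypothetical, and the parser assumes that a release string starts with major.minor (for example, 11.4R2 or 12.1).

```python
import re
from typing import Tuple


def parse_release(release: str) -> Tuple[int, int]:
    """Return (major, minor) from a release string such as '11.4R2' or '12.1'."""
    match = re.match(r"(\d+)\.(\d+)", release.strip())
    if match is None:
        raise ValueError(f"unrecognized release string: {release!r}")
    return int(match.group(1)), int(match.group(2))


def needs_manual_failover(release: str) -> bool:
    """True when all redundancy groups must be failed over by hand before an ISSU.

    Per the guideline: Release 11.4 or earlier needs a manual failover;
    Release 12.1 or later fails over automatically.
    """
    return parse_release(release) <= (11, 4)
```

    For example, needs_manual_failover("11.4R2") is True and needs_manual_failover("12.1") is False.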

    Note: On all high-end SRX Series devices, the earliest release recommended as the starting point for a unified ISSU is Junos OS Release 10.4R4.

    Starting with Junos OS Release 15.1X49-D70, SRX1500 devices support ISSU.

    To perform an ISSU from the CLI:

    1. Download the software package from the Juniper Networks Support website.
    2. Copy the package to both nodes of the cluster. We recommend that you copy it to the /var/tmp directory, which is a large file system on the hard disk. Note that the node from which you initiate the ISSU must have the software image.

      user@host> file copy ftp://username:prompt@ftp.hostname.net/filename /var/tmp/filename

    3. Verify the current software version running on both nodes. On the primary node, issue the show version command.
    4. Start the ISSU from the node that is primary for all the redundancy groups by entering the following command:
      user@host> request system software in-service-upgrade image-name-with-full-path reboot

      Note: For SRX5400, SRX5600, and SRX5800 devices, you must include reboot in the command. If reboot is not included, the command fails.

      Wait for both nodes to complete the upgrade (you are logged out of the device).

    5. Wait a few minutes, and then log in to the device again. Verify that both devices in the cluster are running the new Junos OS build using the show version command.
    6. Verify that all policies, zones, redundancy groups, and other RTOs return to their correct states.
    7. Make node 0 the primary node again by issuing the request chassis cluster failover node node-number redundancy-group group-number command.
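
    Steps 2 and 4 use the same image path, so the two command lines can be derived from the download URL. The following sketch only formats strings; it does not connect to a device. The function name issu_commands and the STAGING_DIR constant are hypothetical, and the reboot keyword is included by default because the note in step 4 requires it on SRX5400, SRX5600, and SRX5800 devices.

```python
import posixpath
from urllib.parse import urlsplit

STAGING_DIR = "/var/tmp"  # directory recommended in step 2


def issu_commands(source_url: str, reboot: bool = True) -> list:
    """Return the file copy command (step 2) and the ISSU command (step 4)."""
    filename = posixpath.basename(urlsplit(source_url).path)
    if not filename:
        raise ValueError("source URL does not end in a file name")
    # The ISSU command needs the image name with its full path.
    image_path = posixpath.join(STAGING_DIR, filename)
    copy_cmd = f"file copy {source_url} {image_path}"
    issu_cmd = f"request system software in-service-upgrade {image_path}"
    if reboot:
        issu_cmd += " reboot"
    return [copy_cmd, issu_cmd]
```

    Calling issu_commands("ftp://username:prompt@ftp.hostname.net/filename") yields the two command lines shown in steps 2 and 4, with /var/tmp/filename as the image path. The copy still has to be repeated for the second node.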

    Note: If you want redundancy groups to automatically return to node 0 as the primary after the ISSU is complete, you must set the redundancy group priority such that node 0 is primary and enable the preempt option. Note that this method works for all redundancy groups except redundancy group 0. You must manually fail over redundancy group 0.

    To set the redundancy group priority and enable the preempt option, see Example: Configuring Chassis Cluster Redundancy Groups.

    To manually fail over a redundancy group, see Initiating a Chassis Cluster Manual Redundancy Group Failover.

    Note: During the upgrade, both devices might experience redundancy group failovers, but traffic is not disrupted. Each device validates the package and checks version compatibility before doing the upgrade. If the system finds that the new package is not version compatible with the currently installed version, the device refuses the upgrade or prompts you to take corrective action. Sometimes a single feature is not compatible, in which case the upgrade software prompts you to either abort the upgrade or turn off the feature before doing the upgrade.

    This feature is available through the command-line interface. See request system software in-service-upgrade (Maintenance).

    Rolling Back Devices in a Chassis Cluster After an ISSU

    If the ISSU fails to complete and only one device in the cluster has been upgraded, you can roll back to the previous software version on that device alone by using the following commands on the upgraded device:

    • request chassis cluster in-service-upgrade abort
    • request system software rollback node node-id
    • request system reboot
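
    The order of the three commands matters, and the rollback must name the node. The sketch below returns the command lines in that order for a given node. The function name rollback_commands is hypothetical, and the code only builds strings for review; it does not run anything on a device.

```python
def rollback_commands(node_id: int) -> list:
    """Return the rollback command sequence for the upgraded node, in order."""
    if node_id not in (0, 1):
        raise ValueError("a chassis cluster has only node 0 and node 1")
    return [
        "request chassis cluster in-service-upgrade abort",
        # Naming the node limits the rollback to the upgraded device.
        f"request system software rollback node {node_id}",
        "request system reboot",
    ]
```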

    Enabling an Automatic Chassis Cluster Node Failback After an ISSU

    If you want redundancy groups to automatically return to node 0 as the primary after the ISSU is complete, you must set the redundancy group priority such that node 0 is primary and enable the preempt option. Note that this method works for all redundancy groups except redundancy group 0. You must manually fail over redundancy group 0. To set the redundancy group priority and enable the preempt option, see Example: Configuring Chassis Cluster Redundancy Groups. To manually fail over a redundancy group, see Initiating a Chassis Cluster Manual Redundancy Group Failover.
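
    As an illustration of the two requirements (node 0 has the higher priority, and preempt is enabled), the following sketch generates candidate configuration statements for one redundancy group. The set statements reflect general Junos OS chassis cluster syntax rather than anything stated in this topic, so verify them against Example: Configuring Chassis Cluster Redundancy Groups. The function name failback_config and the priority values 200 and 100 are hypothetical.

```python
def failback_config(group: int, node0_priority: int = 200,
                    node1_priority: int = 100) -> list:
    """Return set statements that let a redundancy group fail back to node 0."""
    if group == 0:
        # This method does not apply to redundancy group 0.
        raise ValueError("redundancy group 0 must be failed over manually")
    if node0_priority <= node1_priority:
        raise ValueError("node 0 needs the higher priority to become primary")
    prefix = f"set chassis cluster redundancy-group {group}"
    return [
        f"{prefix} node 0 priority {node0_priority}",
        f"{prefix} node 1 priority {node1_priority}",
        f"{prefix} preempt",
    ]
```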

    Note: To upgrade node 0 and make it available in the chassis cluster, manually reboot node 0. Node 0 does not reboot automatically.

    Troubleshooting Chassis Cluster ISSU-Related Problems

    This topic includes the following sections:

    Viewing the ISSU Progress

    Problem

    Description: Rather than wait for an ISSU failure, you can display the progress of the ISSU as it occurs, noting any message where the ISSU was unsuccessful. Providing that message to TAC can help resolve the issue.

    Solution

    After starting an ISSU, issue the show chassis cluster information issu command. Output similar to the following is sent to the console to indicate the progress of the ISSU for all Services Processing Units (SPUs).

    Note: Any management session to secondary node will be disconnected.
    Shutdown NOW!
    [pid 2480]
    ISSU: Backup RE Prepare Done
    Waiting for node1 to reboot.
    Current time: Tue Apr 22 14:37:32 2014
    Max. time to complete: 15min 0sec.
    Note: For real time ISSU status, open a new management session and run 
    <show chassis cluster information issu> for detail information
    node1 booted up.
    Waiting for node1 to become secondary
    Current time: Tue Apr 22 14:40:32 2014
    Max. time to complete: 60min 0sec.
    Note: For real time ISSU status, open a new management session and run
    <show chassis cluster information issu> for detail information
    node1 became secondary.
    Waiting for node1 to be ready for failover
    ISSU: Preparing Daemons
    Current time: Tue Apr 22 14:41:27 2014
    Max. time to complete: 60min 0sec.
    Note: For real time ISSU status, open a new management session and run 
    <show chassis cluster information issu> for detail information 
    Secondary node1 ready for failover.
    Installing package '/var/tmp/junos-srx5000-12.1I20140421_srx_12q1_x47.0-643920-domestic.tgz' ...
    Verified SHA1 checksum of issu-indb.tgz
    Verified junos-boot-srx5000-12.1I20140421_srx_12q1_x47.0-643920.tgz signed by PackageDevelopment_12_1_0
    Verified junos-srx5000-12.1I20140421_srx_12q1_x47.0-643920-domestic signed by PackageDevelopment_12_1_0
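
    Each waiting stage in this output prints a start time and a maximum duration, so captured console text can be turned into per-stage deadlines. The following sketch assumes that the lines look exactly like the sample above; other releases might format them differently. The function name wait_deadlines is hypothetical.

```python
import re
from datetime import datetime, timedelta

WAIT_RE = re.compile(r"^Waiting for (node\d) to (.+?)\.?$")
TIME_RE = re.compile(r"^Current time: (.+)$")
MAX_RE = re.compile(r"^Max\. time to complete: (\d+)min (\d+)sec\.?$")


def wait_deadlines(console_text: str) -> list:
    """Return (node, stage, deadline) for each 'Waiting for ...' stage."""
    results = []
    node = stage = started = None
    for raw in console_text.splitlines():
        line = raw.strip()
        if (m := WAIT_RE.match(line)):
            # A new waiting stage begins; forget the previous start time.
            node, stage, started = m.group(1), m.group(2), None
        elif (m := TIME_RE.match(line)) and stage:
            started = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
        elif (m := MAX_RE.match(line)) and stage and started:
            limit = timedelta(minutes=int(m.group(1)), seconds=int(m.group(2)))
            results.append((node, stage, started + limit))
            stage = None
    return results
```

    Applied to the sample above, the first stage (node1, reboot) has a deadline of 14:52:32 on April 22, 2014, which is the printed start time plus 15 minutes.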
    

    Stopping ISSU Process When it Halts During an Upgrade

    Problem

    Description: The ISSU process halts in the middle of an upgrade.

    Solution

    If the ISSU fails to complete and only one device in the cluster has been upgraded, you can roll back to the previous OS on that device alone by using the following commands on the upgraded device:

    • Abort the ISSU on both nodes using the request chassis cluster in-service-upgrade abort command.
    • Roll back the image using the request system software rollback command with the node option.
    • Reboot the rolled-back node using the request system reboot command.

    Recovering the Node in Case of a Failed ISSU

    Problem

    Description: The ISSU procedure stops progressing.

    Solution

    Open a new session on the primary device and issue the request chassis cluster in-service-upgrade abort command.

    This step aborts an in-progress ISSU. This command must be issued from a session other than the one on which you issued the request system software in-service-upgrade command that launched the ISSU. If the node is being upgraded, this command cancels the upgrade. The command is also helpful in recovering the node in case of a failed ISSU.

    When an ISSU encounters an unexpected situation that necessitates an abort, the system message provides you with detailed information about when and why the upgrade stopped and recommendations for the next steps to take.

    For example, the following message is issued when a node fails to become RG-0 secondary when it boots up:

    Rebooting Secondary Node
    Shutdown NOW!
    [pid 2120]
    ISSU: Backup RE Prepare Done
    Waiting for node1 to reboot.
    node1 booted up.
    Waiting for node1 to become secondary
    error: wait for node1 to become secondary failed (error-code: 5.1)
    ISSU aborted. But, both nodes are in ISSU window.
    Please do the following:
    1. Log on to the upgraded node.
    2. Rollback the image using rollback command with node option
    Note: Not using the 'node' option might cause
    the images on both nodes to be rolled back
    3. Make sure that both nodes (will) have the same image
    4. Ensure the node with older image is primary for all RGs
    5. Abort ISSU on both nodes
    6. Reboot the rolled back node
    {primary:node0}
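
    When you collect this output for TAC, the error line carries the two most useful details: the failed step and the error code. The sketch below extracts both from captured console text. It assumes that the error line has the same shape as in the message above, and the function name find_issu_error is hypothetical.

```python
import re

ERROR_RE = re.compile(
    r"^error: (?P<reason>.+) \(error-code: (?P<code>[\d.]+)\)$"
)


def find_issu_error(console_text: str):
    """Return (reason, error_code) from the first error line, or None."""
    for raw in console_text.splitlines():
        match = ERROR_RE.match(raw.strip())
        if match:
            return match.group("reason"), match.group("code")
    return None
```

    For the message above, the result is ("wait for node1 to become secondary failed", "5.1").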

    Note: If you attempt to upgrade a device pair running a Junos OS image earlier than Release 9.6, the ISSU will fail without changing anything about either device in the cluster. Devices running Junos OS releases earlier than 9.6 must be upgraded separately using individual device upgrade procedures.

    If the secondary device experiences a power-off condition before it boots up using the new image specified when the ISSU is initiated, when power is restored the newly upgraded device will still be waiting to end the ISSU. To end the ISSU, issue the request chassis cluster in-service-upgrade abort command.

    Deciphering Mismatched Control Link Statistics During a Chassis Cluster ISSU

    When using dual control links (supported on the SRX5000 and SRX3000 lines only), mismatched control link statistics might be reported with the show chassis cluster statistics and show chassis cluster control-plane statistics commands while you run an ISSU with nodes on devices running different releases. (ISSUs are available in Junos OS Release 9.6 and later and dual control links are available in Junos OS Release 10.0 and later.) For example, assume that one node is running Junos OS Release 9.6 and the other node is running Junos OS Release 10.0. In this example, a mismatch might occur because the node running Release 10.0 is sending heartbeats on both control links, but the node running Release 9.6 is receiving heartbeats on only one control link.
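
    The condition described above reduces to a release comparison: a mismatch is possible when exactly one node runs a release with dual control link support. The following is a minimal sketch of that comparison. The helper names are hypothetical, and the parser assumes that a release string starts with major.minor.

```python
import re
from typing import Tuple

DUAL_CONTROL_MIN = (10, 0)  # dual control links: Release 10.0 and later


def parse_release(release: str) -> Tuple[int, int]:
    """Return (major, minor) from a release string such as '9.6' or '10.4R4'."""
    match = re.match(r"(\d+)\.(\d+)", release.strip())
    if match is None:
        raise ValueError(f"unrecognized release string: {release!r}")
    return int(match.group(1)), int(match.group(2))


def control_link_mismatch_possible(release_a: str, release_b: str) -> bool:
    """True when only one of the two nodes supports dual control links."""
    a_dual = parse_release(release_a) >= DUAL_CONTROL_MIN
    b_dual = parse_release(release_b) >= DUAL_CONTROL_MIN
    return a_dual != b_dual
```

    With the releases from the example, control_link_mismatch_possible("9.6", "10.0") is True. It is False when both nodes run Release 10.0 or later.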

    Modified: 2011-12-14