    Understanding Routing Engine Redundancy on Juniper Networks Routers

    Routing Engine Redundancy Overview

    Redundant Routing Engines are two Routing Engines that are installed in the same routing platform. One functions as the master, while the other stands by as a backup should the master Routing Engine fail. On routing platforms with dual Routing Engines, network reconvergence takes place more quickly than on routing platforms with a single Routing Engine.

    When a Routing Engine is configured as master, it has full functionality. It receives and transmits routing information, builds and maintains routing tables, communicates with interfaces and Packet Forwarding Engine components, and has full control over the chassis. When a Routing Engine is configured to be the backup, it does not communicate with the Packet Forwarding Engine or chassis components.

    Note: On devices running Junos OS Release 8.4 or later, you cannot configure both Routing Engines as master at the same time. Such a configuration causes the commit check to fail.

    A failover from the master Routing Engine to the backup Routing Engine occurs automatically when the master Routing Engine experiences a hardware failure or when you have configured the software to support a change in mastership based on specific conditions. You can also manually switch Routing Engine mastership by issuing one of the request chassis routing-engine commands. In this topic, the term failover refers to an automatic event, whereas switchover refers to either an automatic or a manual event.

    When a failover or a switchover occurs, the backup Routing Engine takes control of the system as the new master Routing Engine.

    • If graceful Routing Engine switchover is not configured, when the backup Routing Engine becomes master, it resets the switch plane and downloads its own version of the microkernel to the Packet Forwarding Engine components. Traffic is interrupted while the Packet Forwarding Engine is reinitialized. All kernel and forwarding processes are restarted.
    • If graceful Routing Engine switchover is configured, interface and kernel information is preserved. The switchover is faster because the Packet Forwarding Engines are not restarted. The new master Routing Engine restarts the routing protocol process (rpd). All hardware and interfaces are acquired by a process that is similar to a warm restart. For more information about graceful Routing Engine switchover, see Understanding Graceful Routing Engine Switchover.
    • If graceful Routing Engine switchover and nonstop active routing (NSR) are configured, traffic is not interrupted during the switchover. Interface, kernel, and routing protocol information is preserved. For more information about nonstop active routing, see Nonstop Active Routing Concepts.
    • If graceful Routing Engine switchover and graceful restart are configured, traffic is not interrupted during the switchover. Interface and kernel information is preserved. Graceful restart protocol extensions quickly collect and restore routing information from the neighboring routers. For more information about graceful restart, see Graceful Restart Concepts.
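
    The switchover cases above can be sketched as configuration. The fragment below is a minimal sketch rather than text from this topic. It assumes the graceful-switchover statement at the [edit chassis redundancy] hierarchy level, the nonstop-routing statement at the [edit routing-options] hierarchy level, and the commit synchronize statement at the [edit system] hierarchy level, which nonstop active routing typically requires. Verify each statement against the documentation for your platform and release.

```
chassis {
    redundancy {
        /* Assumed statement: preserve interface and kernel state across a switchover */
        graceful-switchover;
    }
}
routing-options {
    /* Assumed statement: preserve routing protocol state (nonstop active routing) */
    nonstop-routing;
}
system {
    /* Assumed statement: keep the configuration synchronized on both Routing Engines */
    commit synchronize;
}
```

    To pair graceful Routing Engine switchover with graceful restart instead, the graceful-restart statement at the [edit routing-options] hierarchy level would take the place of nonstop-routing. The two are generally treated as alternatives rather than configured together.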

    Conditions That Trigger a Routing Engine Failover

    The following events can result in an automatic change in Routing Engine mastership, depending on your configuration:

    • The routing platform experiences a hardware failure. A change in Routing Engine mastership occurs if either the Routing Engine or the associated host module or subsystem is abruptly powered off. You can also configure the backup Routing Engine to take mastership if it detects a hard disk error on the master Routing Engine. To enable this feature, include the failover on-disk-failure statement at the [edit chassis redundancy] hierarchy level.
    • The routing platform experiences a software failure, such as a kernel crash or a CPU lock. You must configure the backup Routing Engine to take mastership when it detects a loss of keepalive signal. To enable this failover method, include the failover on-loss-of-keepalives statement at the [edit chassis redundancy] hierarchy level.
    • The routing platform experiences an em0 interface failure on the master Routing Engine. You must configure the backup Routing Engine to take mastership when it detects the em0 interface failure. To enable this failover method, include the on-re-to-fpc-stale statement at the [edit chassis redundancy failover] hierarchy level.
    • A specific software process fails. You can configure the backup Routing Engine to take mastership when one or more specified processes fail at least four times within 30 seconds. Include the failover other-routing-engine statement at the [edit system processes process-name] hierarchy level.
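
    Taken together, the statements named in the list above might appear in a configuration as follows. This is a sketch under stated assumptions: the process name routing (the routing protocol process) is chosen only as an example of process-name, and you would normally enable only the failover conditions that apply to your deployment.

```
chassis {
    redundancy {
        failover {
            /* Fail over when a hard disk error is detected on the master */
            on-disk-failure;
            /* Fail over when the keepalive signal from the master is lost */
            on-loss-of-keepalives;
            /* Fail over when the em0 interface on the master fails */
            on-re-to-fpc-stale;
        }
    }
}
system {
    processes {
        /* "routing" is an example process name, not one prescribed by this topic */
        routing {
            failover other-routing-engine;
        }
    }
}
```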

    If any of these conditions is met, a message is logged and the backup Routing Engine attempts to take mastership. By default, an alarm is generated when the backup Routing Engine becomes active. After the backup Routing Engine takes mastership, it continues to function as master even after the originally configured master Routing Engine has successfully resumed operation. You must manually restore it to its previous backup status. (However, if at any time one of the Routing Engines is not present, the other Routing Engine becomes master automatically, regardless of how redundancy is configured.)

    Default Routing Engine Redundancy Behavior

    By default, Junos OS uses re0 as the master Routing Engine and re1 as the backup Routing Engine. Unless otherwise specified in the configuration, re0 always becomes master when the acting master Routing Engine is rebooted.

    Note: A single Routing Engine in the chassis always becomes the master Routing Engine even if it was previously the backup Routing Engine.
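
    To override the default roles, mastership can be stated explicitly in the configuration. The fragment below is a minimal sketch that assumes the routing-engine statement at the [edit chassis redundancy] hierarchy level, which this topic does not name. It reverses the defaults so that re1 is master and re0 is backup.

```
chassis {
    redundancy {
        /* Assumed statement: slot 0 (re0) takes the backup role */
        routing-engine 0 backup;
        /* Assumed statement: slot 1 (re1) takes the master role */
        routing-engine 1 master;
    }
}
```

    With roles assigned this way, the configuration rather than the default determines which Routing Engine becomes master after a reboot.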

    Perform the following steps to see how the default Routing Engine redundancy setting works:

    1. Ensure that re0 is the master Routing Engine.
    2. Manually switch the state of Routing Engine mastership by issuing the request chassis routing-engine master switch command from the master Routing Engine. re0 is now the backup Routing Engine and re1 is the master Routing Engine.

      Note: On the next reboot of the master Routing Engine, Junos OS returns the router to the default state because you have not configured the Routing Engines to maintain this state after a reboot.

    3. Reboot the master Routing Engine re1.

      The Routing Engine boots up and reads the configuration. Because the configuration does not specify which Routing Engine is the master, re1 comes up in its default role as the backup. Both re0 and re1 are now in a backup state. Junos OS detects this conflict and, to prevent a no-master state, falls back to the default behavior and directs re0 to become master.
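
    The steps above can be followed from the CLI roughly as shown here. This is a sketch, not captured output: the user@host prompt is hypothetical, lines beginning with # are annotations rather than CLI input, and the show chassis routing-engine and request system reboot commands are assumptions that this topic does not name. Only request chassis routing-engine master switch comes from the procedure itself.

```
# Step 1: on re0, check which Routing Engine is currently master
user@host> show chassis routing-engine

# Step 2: from the master (re0), switch mastership to re1
user@host> request chassis routing-engine master switch

# Step 3: on re1 (now master), reboot it
user@host> request system reboot

# Afterward: check that re0 has become master again
user@host> show chassis routing-engine
```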

    Routing Engine Redundancy on a TX Matrix Router

    In a routing matrix, all master Routing Engines in the TX Matrix router and connected T640 routers must run the same Junos OS release. Likewise, all backup Routing Engines in a routing matrix must run the same Junos OS release. When you run the same Junos OS release on all master and backup Routing Engines in a routing matrix, a change in mastership to any backup Routing Engine in the routing matrix does not cause a change in mastership in any other chassis in the routing matrix.

    Caution: (Routing matrix based on the TX Matrix or TX Matrix Plus routers only) Within the routing matrix, we recommend that all Routing Engines run the same Junos OS release. If you run different releases on the Routing Engines and a change in mastership occurs on any backup Routing Engine in the routing matrix based on a TX Matrix router or a TX Matrix Plus router, one or all routers might become logically disconnected from the TX Matrix router or the TX Matrix Plus router and cause data loss.
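
    Before relying on mastership changes in a routing matrix, it can help to compare the Junos OS release on each Routing Engine. The command below is a sketch only. It assumes that the invoke-on all-routing-engines option of show version is available on your platform and release, and the user@host prompt is hypothetical. Run it on each chassis and compare the reported releases.

```
user@host> show version invoke-on all-routing-engines
```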

    If the same Junos OS release is not running on all master and backup Routing Engines in the routing matrix, the following consequences occur when the failover on-loss-of-keepalives statement is included at the [edit chassis redundancy] hierarchy level:

    • When you or a host subsystem initiates a change in mastership to the backup Routing Engine in the TX Matrix router, the master Routing Engines in the T640 routers detect a software release mismatch with the new master Routing Engine in the TX Matrix router and switch mastership to their backup Routing Engines.
    • When you manually change mastership to a backup Routing Engine in a T640 router using the request chassis routing-engine master command, the new master Routing Engine in the T640 router detects a software release mismatch with the master Routing Engine in the TX Matrix router and relinquishes mastership to the original master Routing Engine. (Routing Engine mastership in the TX Matrix router does not switch in this case.)
    • When a host subsystem initiates a change in mastership to a backup Routing Engine in a T640 router because the master Routing Engine has failed, the T640 router is logically disconnected from the TX Matrix router. To reconnect the T640 router, initiate a change in mastership to the backup Routing Engine in the TX Matrix router, or replace the failed Routing Engine in the T640 router and switch mastership to it. The replacement Routing Engine must be running the same software release as the master Routing Engine in the TX Matrix router.

    If the same Junos OS release is not running on all master and backup Routing Engines in the routing matrix, the following consequences occur when the failover on-loss-of-keepalives statement is not included at the [edit chassis redundancy] hierarchy level:

    • If you initiate a change in mastership to the backup Routing Engine in the TX Matrix router, all T640 routers are logically disconnected from the TX Matrix router. To reconnect the T640 routers, switch mastership of all master Routing Engines in the T640 routers to their backup Routing Engines.
    • If you initiate a change in mastership to a backup Routing Engine in a T640 router, the T640 router is logically disconnected from the TX Matrix router. To reconnect the T640 router, switch mastership of the new master Routing Engine in the T640 router back to the original master Routing Engine.

    Routing Engine Redundancy on a TX Matrix Plus Router

    In a routing matrix, all master Routing Engines in the TX Matrix Plus router and the connected LCC must run the same Junos OS release. Likewise, all backup Routing Engines in a routing matrix must run the same Junos OS release. When you run the same Junos OS release on all master and backup Routing Engines in the routing matrix, a change in mastership to any backup Routing Engine in the routing matrix does not cause a change in mastership in any other chassis in the routing matrix.

    Caution: (Routing matrix based on the TX Matrix or TX Matrix Plus routers only) Within the routing matrix, we recommend that all Routing Engines run the same Junos OS release. If you run different releases on the Routing Engines and a change in mastership occurs on any backup Routing Engine in the routing matrix based on a TX Matrix router or a TX Matrix Plus router, one or all routers might become logically disconnected from the TX Matrix router or the TX Matrix Plus router and cause data loss.

    If the same Junos OS release is not running on all master and backup Routing Engines in the routing matrix, the following scenarios occur when the failover on-loss-of-keepalives statement is included at the [edit chassis redundancy] hierarchy level:

    • When you or a host subsystem initiates a change in mastership to the backup Routing Engine in the TX Matrix Plus router, the master Routing Engines in the connected LCCs detect a software release mismatch with the new master Routing Engine in the TX Matrix Plus router and switch mastership to their backup Routing Engines.
    • When you manually change mastership to a backup Routing Engine in a connected LCC by using the request chassis routing-engine master command, the new master Routing Engine in the connected LCC detects a software release mismatch with the master Routing Engine in the TX Matrix Plus router and relinquishes mastership to the original master Routing Engine. (Routing Engine mastership in the TX Matrix Plus router does not switch in this case.)
    • When a host subsystem initiates a change in mastership to a backup Routing Engine in a connected LCC because the master Routing Engine has failed, the connected LCC is logically disconnected from the TX Matrix Plus router. To reconnect the connected LCC, initiate a change in mastership to the backup Routing Engine in the TX Matrix Plus router, or replace the failed Routing Engine in the connected LCC and switch mastership to it. The replacement Routing Engine must be running the same software release as the master Routing Engine in the TX Matrix Plus router.

    If the same Junos OS release is not running on all master and backup Routing Engines in the routing matrix, the following scenarios occur when the failover on-loss-of-keepalives statement is not included at the [edit chassis redundancy] hierarchy level:

    • If you initiate a change in mastership to the backup Routing Engine in the TX Matrix Plus router, all connected LCCs are logically disconnected from the TX Matrix Plus router. To reconnect the connected LCCs, switch mastership of all master Routing Engines in the connected LCCs to their backup Routing Engines.
    • If you initiate a change in mastership to a backup Routing Engine in a connected LCC, the connected LCC is logically disconnected from the TX Matrix Plus router. To reconnect the connected LCC, switch mastership of the new master Routing Engine in the connected LCC back to the original master Routing Engine.

    Situations That Require You to Halt Routing Engines

    Before you shut the power off to a routing platform that has two Routing Engines or before you remove the master Routing Engine, you must first halt the backup Routing Engine and then halt the master Routing Engine. Otherwise, you might need to reinstall Junos OS. You can use the request system halt both-routing-engines command on the master Routing Engine, which first shuts down the master Routing Engine and then shuts down the backup Routing Engine. To shut down only the backup Routing Engine, issue the request system halt command on the backup Routing Engine.
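
    The two halt commands described above are issued from operational mode, for example as follows. The user@host prompt is hypothetical, and lines beginning with # are annotations rather than CLI input.

```
# On the master Routing Engine: halt both Routing Engines
user@host> request system halt both-routing-engines

# On the backup Routing Engine: halt only the backup
user@host> request system halt
```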

    If you halt the master Routing Engine and do not power it off or remove it, the backup Routing Engine remains inactive unless you have configured it to become the master when it detects a loss of keepalive signal from the master Routing Engine.

    Note: To restart the router, you must log in to the console port (rather than the Ethernet management port) of the Routing Engine. When you log in to the console port of the master Routing Engine, the system automatically reboots. After you log in to the console port of the backup Routing Engine, press Enter to reboot it.

    Note: If you have upgraded the backup Routing Engine, first reboot it and then reboot the master Routing Engine.

    Modified: 2016-06-13