Understanding Nonstop Active Routing
Nonstop active routing (NSR) enables the transparent switchover of the Routing Engines in the event that one of the Routing Engines goes down.
Nonstop Active Routing Concepts
Nonstop active routing (NSR) uses the same infrastructure as graceful Routing Engine switchover (GRES) to preserve interface and kernel information. However, NSR also saves routing protocol information by running the routing protocol process (rpd) on the backup Routing Engine. By saving this additional information, NSR is self-contained and does not rely on helper routers (or switches) to assist the routing platform in restoring routing protocol information. NSR is advantageous in networks in which neighbor routers (or switches) do not support graceful restart protocol extensions. As a result of this enhanced functionality, NSR is a natural replacement for graceful restart.
Starting with Junos OS Release 15.1R1, if you have NSR configured, it is never valid to issue the restart routing command in any form on the NSR primary Routing Engine. Doing so results in a loss of protocol adjacencies and neighbors and a drop in traffic.
Use Feature Explorer to confirm platform and release support for specific features.
Review the Platform-Specific NSR Behavior section for notes related to your platform.
To use NSR, you must first enable GRES on your routing (or switching) platform. For more information about GRES, see Understanding Graceful Routing Engine Switchover.
If NSR is enabled, certain system log (syslog) messages are sent from the backup Routing Engine if the configured syslog host is reachable through the fxp0 interface.
Figure 1 shows the system architecture of nonstop active routing and the process a routing (or switching) platform follows to prepare for a switchover.
The switchover preparation process for NSR comprises the following steps:
- The primary Routing Engine starts.
- The routing (or switching) platform processes on the primary Routing Engine (such as the chassis process [chassisd] and the routing protocol process [rpd]) start.
- The Packet Forwarding Engine starts and connects to the primary Routing Engine.
- All state information is updated in the system.
- The backup Routing Engine starts, including the chassis process (chassisd) and the routing protocol process (rpd).
- The system determines whether GRES and NSR have been enabled.
- The kernel synchronization process (ksyncd) synchronizes the backup Routing Engine with the primary Routing Engine.
- For supported protocols, state information is updated directly between the routing protocol processes on the primary and backup Routing Engines.
Figure 2 shows the effects of a switchover on the routing platform.
The switchover process comprises the following steps:
- When keepalives from the primary Routing Engine are lost, the system switches over gracefully to the backup Routing Engine.
- The Packet Forwarding Engine connects to the backup Routing Engine, which becomes the new primary. Because the routing protocol process (rpd) and chassis process (chassisd) are already running, these processes do not need to restart.
- State information learned from the point of the switchover is updated in the system. Forwarding and routing continue during the switchover, resulting in minimal packet loss.
- Peer routers (or switches) continue to interact with the routing platform as if no change had occurred. Routing adjacencies and session state relying on underlying routing information are preserved and not reset.
We recommend that you do not restart the routing protocol process (rpd) on the primary Routing Engine after enabling NSR, because doing so disrupts protocol adjacencies and peering sessions and results in traffic loss.
Understanding Nonstop Active Routing on EX Series Switches
You can configure nonstop active routing (NSR) on an EX Series switch with redundant Routing Engines or on an EX Series Virtual Chassis to enable the transparent switchover of the Routing Engines in the event that one of the Routing Engines goes down.
Nonstop active routing provides high availability for Routing Engines by enabling transparent switchover of the Routing Engines without requiring restart of supported routing protocols. Both Routing Engines are fully active in processing protocol sessions, and so each can take over for the other. The switchover is transparent to neighbor routing devices, which do not detect that a change has occurred.
Enable nonstop active routing when neighbor routing devices are not configured to support graceful restart of protocols or when you want to ensure graceful restart of protocols for which graceful restart is not supported—such as PIM.
You do not need to start the two Routing Engines simultaneously to synchronize them for nonstop active routing. If both Routing Engines are not present or not up when you issue a commit synchronize command, the candidate configuration is committed on the primary Routing Engine. When the backup Routing Engine is inserted or comes online, its configuration is automatically synchronized with that of the primary.
Nonstop active routing uses the same infrastructure as graceful Routing Engine switchover (GRES) to preserve interface and kernel information. However, nonstop active routing also saves routing protocol information by running the routing protocol process (rpd) on the backup Routing Engine. By saving this additional information, nonstop active routing does not rely on other routing devices to assist in restoring routing protocol information.
After a graceful Routing Engine switchover, we recommend that you issue the clear interfaces statistics (interface-name | all) command to reset the cumulative values for local statistics on the new primary Routing Engine.
If you suspect a problem with the synchronization of Routing Engines when nonstop active routing is enabled, you can gather troubleshooting information using trace options. For example, if certain protocols lose connectivity with neighbors after a graceful Routing Engine switchover with NSR enabled, you can use trace options to help isolate the problem. See Tracing Nonstop Active Routing Synchronization Events.
Graceful restart and nonstop active routing are mutually exclusive. You will receive an error message upon commit if both are configured.
Nonstop active routing provides a transparent switchover mechanism only for Layer 3 protocol sessions. Nonstop bridging (NSB) provides a similar mechanism for Layer 2 protocol sessions. See Understanding Nonstop Bridging on EX Series Switches.
Nonstop Active Routing System Requirements
This section contains the following topics:
- Nonstop Active Routing Protocol and Feature Support
- Nonstop Active Routing BFD Support
- Nonstop Active Routing BGP Support
- Nonstop Active Routing Layer 2 Circuit and VPLS Support
- Nonstop Active Routing PIM Support
- Nonstop Active Routing MSDP Support
- Nonstop Active Routing Support for RSVP-TE LSPs
Nonstop Active Routing Protocol and Feature Support
The following protocols are supported by nonstop active routing:
- Aggregated Ethernet interfaces with Link Aggregation Control Protocol (LACP)
- Bidirectional Forwarding Detection (BFD)
  For more information, see Nonstop Active Routing BFD Support.
- BGP
  For more information, see Nonstop Active Routing BGP Support.
- EVPN
  - EVPN with ingress replication for BUM traffic
  - EVPN-ETREE
  - EVPN-VPWS
  - EVPN-VXLAN
  - PBB-EVPN
  - EVPN with P2MP mLDP replication for BUM traffic starting in Junos OS Release 18.2R1
  For more information, see NSR and Unified ISSU Support for EVPN.
- Labeled BGP (PTX Series Packet Transport Routers only)
- IS-IS
- LDP
- LDP-based virtual private LAN service (VPLS)
- LDP OAM (operation, administration, and management) features
- LDP (PTX Series Packet Transport Routers only)
  Nonstop active routing support for LDP includes:
  - LDP unicast transit LSPs
  - LDP egress LSPs for labeled internal BGP (IBGP) and external BGP (EBGP)
  - LDP over RSVP transit LSPs
  - LDP transit LSPs with indexed next hops
  - LDP transit LSPs with unequal cost load balancing
  - LDP Point-to-Multipoint LSPs
  - LDP ingress LSPs
- Layer 2 circuits
- Layer 2 VPNs
- Layer 2 VPNs (PTX Series Packet Transport Routers only)
  Note: Nonstop active routing is not supported for Layer 2 interworking (Layer 2 stitching).
- Layer 3 VPNs (does not include dynamic GRE tunnels, multicast VPNs, or BGP flow routes)
  Nonstop active routing support for Layer 3 VPNs includes:
  - IPv4 labeled-unicast (ingress or egress)
  - IPv4-vpn unicast (ingress or egress)
  - IPv6 labeled-unicast (ingress or egress)
  - IPv6-vpn unicast (ingress or egress)
- Logical System support (Nonstop active routing support for logical systems to preserve interface and kernel information).
- Multicast Source Discovery Protocol (MSDP)
  For more information, see Nonstop Active Routing MSDP Support.
- OSPF/OSPFv3
  Note: OSPFv3 neighbors enabled with IPsec authentication are not supported with NSR.
- Protocol Independent Multicast (PIM)
  For more information, see Nonstop Active Routing PIM Support.
- RIP and RIP next generation (RIPng)
- RSVP (PTX Series Packet Transport Routers only)
  Nonstop active routing support for RSVP includes:
  - Point-to-Multipoint LSPs
    - RSVP Point-to-Multipoint ingress, transit, and egress LSPs using existing non-chained next hops.
    - RSVP Point-to-Multipoint transit LSPs using composite next hops for Point-to-Multipoint label routes.
  - Point-to-Point LSPs
    - RSVP Point-to-Point ingress, transit, and egress LSPs using non-chained next hops.
    - RSVP Point-to-Point transit LSPs using chained composite next hops.
- RSVP-TE LSP
  For more information, see Nonstop Active Routing Support for RSVP-TE LSPs.
- VPLS
- VRRP
If you configure a protocol that is not supported by nonstop active routing, the protocol operates as usual. When a switchover occurs, the state information for the unsupported protocol is not preserved and must be refreshed using the normal recovery mechanisms inherent in the protocol.
On routers that have logical systems configured on them, NSR is only supported in the main instance.
In a Virtual Chassis environment configured with OSPF and NSR, any failure or restart of the backup device can lead to longer global convergence times compared to environments where NSR is not configured.
Nonstop Active Routing BFD Support
Nonstop active routing supports the Bidirectional Forwarding Detection (BFD) protocol, which uses the topology discovered by routing protocols to monitor neighbors. The BFD protocol is a simple hello mechanism that detects failures in a network. Because BFD is streamlined to be efficient at fast liveness detection, when it is used in conjunction with routing protocols, routing recovery times are improved. With nonstop active routing enabled, BFD session states are not restarted when a Routing Engine switchover occurs.
BFD session states are saved only for clients using aggregate or static routes or for BGP, IS-IS, OSPF/OSPFv3, PIM, or RSVP.
When a BFD session is distributed to the Packet Forwarding Engine, BFD packets continue to be sent during a Routing Engine switchover. If nondistributed BFD sessions are to be kept alive during a switchover, you must ensure that the session failure detection time is greater than the Routing Engine switchover time. The following BFD sessions are not distributed to the Packet Forwarding Engine: multihop sessions, tunnel-encapsulated sessions, and sessions over integrated routing and bridging (IRB) interfaces.
BFD is an intensive protocol that consumes system resources. Specifying a minimum interval for BFD less than 100 ms for Routing Engine-based sessions and 10 ms for distributed BFD sessions can cause undesired BFD flapping. The minimum-interval configuration statement is a BFD liveness detection parameter.
Depending on your network environment, these additional recommendations might apply:
- For large-scale network deployments with a large number of BFD sessions, specify a minimum interval of 300 ms for Routing Engine-based sessions, and 100 ms for distributed BFD sessions.
- For very large-scale network deployments with a large number of BFD sessions, contact Juniper Networks customer support for more information.
- For BFD sessions to remain up during a Routing Engine switchover event when nonstop active routing is configured, specify a minimum interval of 2.5 seconds for Routing Engine-based sessions. For distributed BFD sessions with nonstop active routing configured, the minimum interval recommendations are unchanged and depend only on your network deployment.
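The switchover recommendation above can be sketched as a single configuration line. This is an illustration only: the OSPF area and the interface name ge-0/0/0.0 are hypothetical, and the lines beginning with # are annotations for the reader, not statements to enter. The minimum-interval value is given in milliseconds, so 2.5 seconds is written as 2500.

```
# Hypothetical Routing Engine-based BFD session for an OSPF neighbor.
# Area 0.0.0.0 and interface ge-0/0/0.0 are assumptions for illustration.
# 2500 ms = 2.5 seconds, the interval recommended when NSR is configured.
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 bfd-liveness-detection minimum-interval 2500
```

For distributed BFD sessions, the value would instead follow the deployment-size recommendations listed above.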
Nonstop Active Routing BGP Support
Nonstop active routing BGP support is subject to the following conditions:
- You must include the path-selection external-router-id statement at the [edit protocols bgp] hierarchy level to ensure consistent path selection between the primary and backup Routing Engines during and after the nonstop active routing switchover.
- You must include the advertise-from-main-vpn-tables statement at the [edit protocols bgp] hierarchy level to prevent BGP sessions from going down when route reflector (RR) or autonomous system border router (ASBR) functionality is enabled or disabled on a routing device that has VPN address families configured.
- BGP session uptime and downtime statistics are not synchronized between the primary and backup Routing Engines during nonstop active routing and ISSU. The backup Routing Engine maintains its own session uptime based on the time when the backup first becomes aware of the established sessions. For example, if the backup Routing Engine is rebooted (or if you run restart routing on the backup Routing Engine), the backup's uptime is a short duration, because the backup has just learned about the established sessions. If the backup is operating when the BGP sessions first come up on the primary, the uptime on the primary and the uptime on the backup are almost the same duration. After a Routing Engine switchover, the new primary continues from the time left on the backup Routing Engine.
- If the BGP peer in the primary Routing Engine has negotiated address-family capabilities that are not supported for nonstop active routing, then the corresponding BGP neighbor state on the backup Routing Engine shows as idle. On switchover, the BGP session is reestablished from the new primary Routing Engine.
  Only the following address families are supported for nonstop active routing:
  - evpn-signaling
  - inet labeled-unicast
  - inet-mdt
  - inet multicast
  - inet-mvpn
  - inet unicast
  - inet-vpn unicast
  - inet6 labeled-unicast
  - inet6 multicast
  - inet6-mvpn
  - inet6 unicast
  - inet6-vpn unicast
  - iso-vpn
  - l2vpn signaling
  - route-target
  Note: Address families are supported only on the main instance of BGP. Only unicast is supported on VRF instances.
- BGP route dampening does not work on the backup Routing Engine when nonstop active routing is enabled.
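The two required BGP statements from the conditions above can be written in set-command form as follows. This is a minimal sketch of only those two statements; the rest of a working BGP configuration (groups, neighbors, address families) is omitted, and the # lines are annotations rather than statements to enter.

```
# Keeps path selection consistent between the primary and backup Routing Engines.
set protocols bgp path-selection external-router-id
# Prevents session resets when RR or ASBR functionality is toggled with VPN families configured.
set protocols bgp advertise-from-main-vpn-tables
```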
Nonstop Active Routing Layer 2 Circuit and VPLS Support
Nonstop active routing supports Layer 2 circuit and VPLS on both LDP-based and RSVP-TE-based networks. Nonstop active routing support enables the backup Routing Engine to track the label advertised by Layer 2 circuit and VPLS on the primary Routing Engine, and to use the same label after the Routing Engine switchover.
Nonstop active routing supports Layer 2 circuit and LDP-based VPLS pseudowire redundant configurations.
Nonstop Active Routing PIM Support
Nonstop active routing supports Protocol Independent Multicast (PIM) with stateful replication on backup Routing Engines. State information replicated on the backup Routing Engine includes information about neighbor relationships, join and prune events, rendezvous point (RP) sets, synchronization between routes and next hops, multicast session states, and the forwarding state between the two Routing Engines.
Nonstop active routing for PIM is supported for IPv4 and IPv6. Junos OS also supports nonstop active routing for PIM on devices that have both IPv4 and IPv6 configured on them.
To configure nonstop active routing for PIM, include the same statements in the configuration as for other protocols: the nonstop-routing statement at the [edit routing-options] hierarchy level and the graceful-switchover statement at the [edit chassis redundancy] hierarchy level. To trace PIM nonstop active routing events, include the flag nsr-synchronization statement at the [edit protocols pim traceoptions] hierarchy level.
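In set-command form, the statements named in the preceding paragraph might look like the sketch below. The commit synchronize statement at the [edit system] hierarchy level is not named in this section and is included here as an assumption, on the basis that nonstop active routing configurations are committed on both Routing Engines. The # lines are annotations for the reader, not statements to enter.

```
# GRES is a prerequisite for nonstop active routing.
set chassis redundancy graceful-switchover
# Enables nonstop active routing for supported protocols, including PIM.
set routing-options nonstop-routing
# Assumption: synchronize every commit to the backup Routing Engine.
set system commit synchronize
# Traces PIM nonstop active routing synchronization events.
set protocols pim traceoptions flag nsr-synchronization
```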
The clear pim join, clear pim register, and clear pim statistics operational mode commands are not supported on the backup Routing Engine when nonstop active routing is enabled.
Nonstop active routing support varies for different PIM features. The features fall into the following three categories: supported features, unsupported features, and incompatible features.
Supported features:
- Auto-RP
  Note: Nonstop active routing PIM support on IPv6 does not support auto-RP because IPv6 does not support auto-RP.
- Bootstrap router (BSR)
- Static RPs
- Embedded RP on non-RP IPv6 routers
- Local RP
  Note: RP set information synchronization is supported for local RP and BSR (on IPv4 and IPv6), auto-RP (on IPv4), and embedded RP (on IPv6).
- BFD
- Dense mode
- Sparse mode
- Source-specific multicast (SSM)
- Draft Rosen multicast VPNs (MVPNs)
- Anycast RP (anycast RP set information synchronization and anycast RP register state synchronization on IPv4 and IPv6 configurations)
- Flow maps
- Unified ISSU
- Policy features such as neighbor policy, bootstrap router export and import policies, scope policy, flow maps, and reverse path forwarding (RPF) check policies
- Upstream assert synchronization
- PIM join load balancing
Junos OS supports nonstop active routing PIM for draft Rosen MVPNs. This support enables nonstop active routing-enabled devices to preserve draft Rosen MVPN-related information, such as default and data multicast distribution tree (MDT) states, across switchovers.
The backup Routing Engine sets up the default MDT based on the configuration and the information it receives from the primary Routing Engine, and keeps updating the default MDT state information.
However, for data MDTs, the backup Routing Engine relies on the primary Routing Engine to provide updates when data MDTs are created, updated, or deleted. The backup Routing Engine neither monitors data MDT flow rates nor triggers a data MDT switchover based on variations in flow rates. Similarly, the backup Routing Engine does not maintain the data MDT delay timer or timeout timer. It does not send MDT join TLV packets for the data MDTs until it takes over as the primary Routing Engine. After the switchover, the new primary Routing Engine starts sending MDT join TLV packets for each data MDT, and also resets the data MDT timers. Note that the expiration time for the timers might vary from the original values on the previous primary Routing Engine.
Junos OS supports Protocol Independent Multicast (PIM) nonstop active routing on IGMP-only interfaces. Multicast joins on IGMP-only interfaces are mapped to PIM states, and these states are replicated on the backup Routing Engine. If the corresponding PIM states are available on the backup, the multicast routes are marked as forwarding on the backup Routing Engine. This enables uninterrupted traffic flow after a switchover. This support covers IGMPv2, IGMPv3, MLDv1, and MLDv2 reports and leaves.
Unsupported features: You can configure the following PIM features on a router along with nonstop active routing, but they function as if nonstop active routing is not enabled. In other words, during Routing Engine switchover and other outages, their state information is not preserved, and traffic loss is to be expected.
- Internet Group Management Protocol (IGMP) exclude mode
- IGMP snooping
Nonstop active routing is not supported for next-generation MVPNs with PIM provider tunnels. The commit operation fails if the configuration includes both nonstop active routing and next-generation MVPNs with PIM provider tunnels.
Junos OS provides a configuration statement that disables nonstop active routing for PIM only, so that you can activate incompatible PIM features and continue to use nonstop active routing for the other protocols on the router. Before activating an incompatible PIM feature, include the nonstop-routing disable statement at the [edit protocols pim] hierarchy level. Note that in this case, nonstop active routing is disabled for all PIM features, not just incompatible features.
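As a sketch, the PIM-only disable statement described above reads as follows in set-command form. It assumes nonstop active routing is already enabled at the [edit routing-options] hierarchy level; the # line is an annotation, not a statement to enter.

```
# Disables nonstop active routing for all PIM features; other protocols are unaffected.
set protocols pim nonstop-routing disable
```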
Nonstop Active Routing MSDP Support
Junos OS supports nonstop active routing for the Multicast Source Discovery Protocol (MSDP).
Nonstop active routing support for MSDP preserves the following MSDP-related information across the switchover:
- MSDP configuration and peer information
- MSDP peer socket information
- Source-active and related information
However, note that the following restrictions or limitations apply to nonstop active routing MSDP support:
- Because the backup Routing Engine learns the active source information by processing the source-active messages from the network, synchronizing of source-active information between the primary and backup Routing Engines might take up to 60 seconds. So, no planned switchover is allowed within 60 seconds of the initial replication of the sockets.
- Similarly, Junos OS does not support two planned switchovers within 240 seconds of each other.
Junos OS enables you to trace MSDP nonstop active routing events by including the flag nsr-synchronization statement at the [edit protocols msdp traceoptions] hierarchy level.
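A minimal sketch of the MSDP trace configuration in set-command form is shown below. The trace file name msdp-nsr is a hypothetical choice for illustration, and the # lines are annotations rather than statements to enter.

```
# Hypothetical trace file name; choose any name that suits your conventions.
set protocols msdp traceoptions file msdp-nsr
# Traces MSDP nonstop active routing synchronization events.
set protocols msdp traceoptions flag nsr-synchronization
```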
Nonstop Active Routing Support for RSVP-TE LSPs
Junos OS supports nonstop active routing for label-switching routers (LSRs) and Layer 2 Circuits that are part of an RSVP-TE LSP. Nonstop active routing support on LSRs ensures that the primary to backup Routing Engine switchover on an LSR remains transparent to the network neighbors and that the LSP information remains unaltered during and after the switchover.
You can use the show rsvp version command to view the nonstop active routing mode and state on an LSR. Similarly, you can use the show mpls lsp and show rsvp session commands on the backup Routing Engine to view the state recreated on the backup Routing Engine.
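For reference, the commands named above are entered in operational mode as sketched below. The user@host prompt is a placeholder, and no output is shown because it varies by platform and release; the # lines are annotations, not commands to enter.

```
# On the LSR: view the nonstop active routing mode and state.
user@host> show rsvp version
# On the backup Routing Engine: view the state recreated there.
user@host> show mpls lsp
user@host> show rsvp session
```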
The Junos OS nonstop active routing feature is also supported on RSVP point-to-multipoint LSPs. During the switchover, the LSP comes up on the backup Routing Engine that shares and synchronizes the state information with the primary Routing Engine before and after the switchover. Nonstop active routing support for point-to-multipoint transit and egress LSPs ensures that the switchover remains transparent to the network neighbors, and preserves the LSP information across the switchover.
Junos OS supports nonstop active routing for next-generation multicast VPNs (MVPNs).
The show rsvp session detail command enables you to check the point-to-multipoint LSP remerge state information (P2MP LSP re-merge; possible values are head, member, and none).
Junos OS supports nonstop active routing for point-to-multipoint LSPs used by VPLS and MVPN.
However, Junos OS does not support nonstop active routing for the following features:
- Generalized Multiprotocol Label Switching (GMPLS) and LSP hierarchy
- Interdomain or loose-hop expansion LSPs
- BFD liveness detection
- Setup protection
Nonstop active routing support for RSVP-TE LSPs is subject to the following limitations and restrictions:
- Detour LSPs are not maintained across a switchover, so they might fail to come back online after the switchover.
- Control plane statistics corresponding to the show rsvp statistics and show rsvp interface detail | extensive commands are not maintained across Routing Engine switchovers.
- Statistics from the backup Routing Engine are not reported for the show mpls lsp statistics and monitor mpls label-switched-path commands. However, if a switchover occurs, the backup Routing Engine, after taking over as the primary, starts reporting statistics. Note that the clear statistics command issued on the old primary Routing Engine does not have any effect on the new primary Routing Engine, which reports statistics, including any uncleared statistics.
- State timeouts might take additional time during nonstop active routing switchover. For example, if a switchover occurs after a neighbor has missed sending two hello messages to the primary, the new primary Routing Engine waits for another three hello periods before timing out the neighbor.
- On the RSVP ingress router, if you configure auto-bandwidth functionality, the bandwidth adjustment timers are set in the new primary after the switchover. This causes a one-time increase in the length of time required for the bandwidth adjustment after the switchover occurs.
- Backup LSPs, which are established between the point of local repair (PLR) and the merge point after a node or link failure, are not preserved during a Routing Engine switchover.
- When nonstop active routing is enabled, graceful restart is not supported. However, graceful restart helper mode is supported.
Platform-Specific NSR Behavior
Use the following table to review platform-specific behaviors for your platforms.
| Platform | Difference |
|---|---|
| EX Series | On EX9214 switches, the VRRP primary state might change during graceful Routing Engine switchover, even when nonstop active routing is enabled. |
| MX Series | NSR is not supported during the Routing Engine reboot process on MX Series devices with the Next-Generation Routing Engine (NG-RE) installed. NSR will still work during the Routing Engine switchover process. |
| PTX Series | Nonstop active routing (NSR) switchover on PTX Series is supported only for the following MPLS and VPN protocols and applications using chained composite next hops: |
Change History Table
Feature support is determined by the platform and release you are using. Use Feature Explorer to determine if a feature is supported on your platform.
Starting with Junos OS Release 15.1R1, if you have NSR configured, it is never valid to issue the restart routing command in any form on the NSR primary Routing Engine.