Overview
Juniper Networks platforms support three main types of graceful restart:
- Graceful restart for routing protocols and summaries—Provides protection for aggregate routes, Border Gateway Protocol (BGP), Intermediate System to Intermediate System (IS-IS), Open Shortest Path First (OSPF), Routing Information Protocol (RIP), next-generation RIP (RIPng), and static routes.
- Graceful restart for MPLS-related protocols—Provides protection for Label Distribution Protocol (LDP), Resource Reservation Protocol (RSVP), circuit cross-connect (CCC), and translational cross-connect (TCC).
- Graceful restart for Virtual Private Networks (VPNs)—Provides protection for Layer 2 and Layer 3 VPNs.
Graceful restart works similarly for routing protocols and MPLS protocols and combines components of these protocol types to enable graceful restart in VPNs. The main benefits of graceful restart are uninterrupted packet forwarding and temporary suppression of all routing protocol updates. Graceful restart thus allows a router to pass through intermediate convergence states that are hidden from the rest of the network.
Most graceful restart implementations define two types of routers—the restarting router and the helper router. The restarting router requires rapid restoration of forwarding state information so it can resume the forwarding of network traffic. The helper router assists the restarting router in this process. Graceful restart configuration statements typically affect either the restarting router or the helper router. A brief description of graceful restart for each supported protocol follows:
- BGP—When a router enabled for BGP graceful restart restarts, it retains BGP peer routes in its forwarding table and marks them as stale. However, it continues to forward traffic to other peers (or receiving peers) during the restart. To re-establish sessions, the restarting router sets the "restart state" bit in the BGP OPEN message and sends it to all participating peers. The receiving peers reply to the restarting router with messages containing end-of-routing-table markers. When the restarting router receives all replies from the receiving peers, the restarting router performs route selection, the forwarding table is updated, and the routes previously marked as stale are discarded. At this point, all BGP sessions are re-established and the restarting peer can receive and process BGP messages as usual.
While the restarting router does its processing, the receiving peers also temporarily retain routing information. Once a receiving peer detects a TCP transport reset, it retains the routes received and marks the routes as stale. After the session is re-established with the restarting router, the stale routes are replaced with updated route information.
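The restarting-router sequence described above can be sketched as a small state model. This is only an illustration under simplifying assumptions, not Junos code: the class `RestartingRouter` and its method names are hypothetical, routes are reduced to prefix-to-next-hop pairs, and "route selection" simply installs whatever the peers re-advertised.

```python
class RestartingRouter:
    """Toy model of the restarting side of BGP graceful restart (hypothetical names)."""

    def __init__(self, peers, retained_routes):
        self.peers = set(peers)
        # Routes retained across the restart keep forwarding traffic...
        self.forwarding = dict(retained_routes)
        # ...but every one of them is marked stale.
        self.stale = set(retained_routes)
        self.pending = {}      # routes re-learned from peers, not yet installed
        self.eor_from = set()  # peers that have sent an end-of-routing-table marker

    def on_update(self, prefix, next_hop):
        # Updates are collected; the forwarding table is untouched for now.
        self.pending[prefix] = next_hop

    def on_end_of_rib(self, peer):
        """Record a marker; return True once the restart is complete."""
        self.eor_from.add(peer)
        if self.eor_from >= self.peers:
            # All peers have replied: select routes, update forwarding,
            # and discard anything that was only present as a stale route.
            self.forwarding = dict(self.pending)
            self.stale.clear()
            return True
        return False
```

In this sketch a prefix that no peer re-advertises disappears from the forwarding table only after the last marker arrives, which mirrors the "retain, then discard stale routes" behavior described above.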
- IS-IS—Normally, IS-IS routers move neighbor adjacencies to the down state when changes occur. However, a router enabled for IS-IS graceful restart sends out Hello messages with the Restart Request (RR) bit set in a restart type length value (TLV) message. This indicates to neighboring routers that a graceful restart is in progress and that the IS-IS adjacency should be left intact. For this to work, the neighboring routers must understand and implement restart signaling themselves. Besides maintaining the adjacency, the neighbors send complete sequence number PDUs (CSNPs) to the restarting router and flood their entire database.
The restarting router never floods any of its own link-state PDUs (LSPs), including pseudonode LSPs, to IS-IS neighbors while undergoing graceful restart. This allows neighbors to re-establish their adjacencies without transitioning to the down state and allows the restarting router to re-initiate a smooth database synchronization.
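The neighbor's side of the IS-IS exchange can be illustrated with a short sketch. The `Neighbor` class, its fields, and the message labels are hypothetical and heavily simplified; in particular, a neighbor without restart support is modeled as simply dropping the adjacency.

```python
from dataclasses import dataclass, field


@dataclass
class Neighbor:
    """Toy model of an IS-IS neighbor of a restarting router (hypothetical names)."""
    supports_restart: bool
    adjacency_up: bool = True
    sent: list = field(default_factory=list)

    def on_hello(self, rr_bit_set: bool) -> None:
        if not rr_bit_set:
            return  # ordinary Hello, nothing special to do in this sketch
        if self.supports_restart:
            # Keep the adjacency intact and help the restarting router
            # resynchronize: send CSNPs and flood the database.
            self.sent.append("CSNP")
            self.sent.append("LSP-flood")
        else:
            # Assumption: without restart signaling support the neighbor
            # falls back to normal behavior and the adjacency goes down.
            self.adjacency_up = False
```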
- OSPF—When a router enabled for OSPF graceful restart restarts, it retains routes learned prior to the restart in its forwarding table. The router does not allow new OSPF link-state advertisements (LSAs) to update the routing table. This router continues to forward traffic to other OSPF neighbors (or helper routers), and sends only a limited number of LSAs during the restart period. To re-establish OSPF adjacencies with neighbors, the restarting router must send a grace LSA to all neighbors. In response, the helper routers enter helper mode and send an acknowledgement back to the restarting router. Also, if there are no topology changes, the helper routers continue to advertise LSAs as if the restarting router had remained in continuous OSPF operation.
When the restarting router receives all replies from the helper routers, the restarting router performs route selection, the forwarding table is updated, and the routes previously retained are discarded. At this point, full OSPF adjacencies are re-established and the restarting router can receive and process OSPF LSAs as usual. When the helper routers no longer receive grace LSAs from the restarting router or the topology of the network changes, the helper routers also resume normal operation.
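The conditions under which a helper router leaves helper mode can be sketched as follows. This is a minimal model, not an implementation: `OspfHelper` and its methods are hypothetical, and "no longer receiving grace LSAs" is approximated here by a grace period deadline, which is an assumption of this example.

```python
class OspfHelper:
    """Toy model of OSPF helper mode (hypothetical names)."""

    def __init__(self):
        self.helping = False
        self.grace_deadline = None

    def on_grace_lsa(self, now, grace_period):
        # Enter (or stay in) helper mode and acknowledge the grace LSA.
        self.helping = True
        self.grace_deadline = now + grace_period
        return "ack"

    def tick(self, now, topology_changed=False):
        """Return True while the router is still acting as a helper."""
        if self.helping and (topology_changed or now >= self.grace_deadline):
            # Either exit condition ends helper mode; normal operation resumes.
            self.helping = False
        return self.helping
```

For example, a helper that accepts a grace LSA at time 0 with a 120-second grace period keeps helping at time 60, but stops immediately if `tick` is called with `topology_changed=True`.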
- RSVP—This protocol uses a field called the restart capabilities object. This object is sent in RSVP Hello messages to peers and is used to advertise a router's RSVP restart capabilities. When an RSVP-enabled router restarts, a Hello message is sent to neighbors (or helper routers) to indicate a restart is in progress. Helper routers reply to the restarting router with an RSVP PATH message that contains a recovery label. The recovery label contains the information from a previous label that was advertised by the restarting node before it restarted. After receiving the recovery label, the restarting router can restore its previous forwarding state and resume operation.
- CCC and TCC—These two protocols rely on RSVP for all graceful restart functionality. As a result, there are no configuration statements unique to CCC and TCC. However, you can use the show connections command to verify that CCC and TCC graceful restart is operating. CCC and TCC graceful restart is supported on label-switched path (LSP) switch and remote interface switch connections.
- LDP—This protocol uses the Fault Tolerant (FT) Session TLV as an optional parameter in the LDP Initialization message. Routers exchange this TLV during session initialization to advertise their capability to perform graceful restart or act as a helper router. When an LDP router restarts, it sends an initialization message to neighbors. The message advertises the length of time the helper routers are requested to assist the restarting router (recovery time). During this time, both routers maintain MPLS forwarding states. The restarting router marks all routes as stale and discards the routes when full neighborship is re-established and the restart is complete.
The neighbor (or helper router) of the restarting router marks all label bindings it received from the restarting router as stale and waits for them to be refreshed or to expire at the end of the recovery time. On the helper router, a local timer governs the maximum amount of time the helper router is willing to maintain forwarding states. LDP graceful restart can be configured in a master instance or in a routing instance and supports a carrier-of-carriers scenario.
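The helper-side handling of stale LDP label bindings can be sketched like this. The names are hypothetical, and the example assumes that the effective hold time is the smaller of the recovery time advertised by the restarting router and the helper's local maximum; the text above states only that a local timer caps how long the helper keeps forwarding state.

```python
class LdpHelper:
    """Toy model of an LDP helper router (hypothetical names)."""

    def __init__(self, bindings, local_max_recovery_time):
        self.bindings = dict(bindings)  # FEC -> label learned from the neighbor
        self.stale = set()
        self.local_max = local_max_recovery_time
        self.hold_time = None

    def on_neighbor_restart(self, advertised_recovery_time):
        # Mark every binding from the restarting neighbor as stale.
        self.stale = set(self.bindings)
        # Assumption: the local timer caps the advertised recovery time.
        self.hold_time = min(advertised_recovery_time, self.local_max)
        return self.hold_time

    def on_label_mapping(self, fec, label):
        # A refreshed binding is no longer stale.
        self.bindings[fec] = label
        self.stale.discard(fec)

    def on_hold_timer_expiry(self):
        # Bindings that were never refreshed expire with the timer.
        for fec in self.stale:
            del self.bindings[fec]
        self.stale.clear()
```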
VPN graceful restart depends on the following three types of restart functionality:
- BGP graceful restart functionality is used on all provider edge (PE) to PE BGP sessions. This affects sessions carrying any service signaling data for network layer reachability information (NLRI), for example, an IPv4 VPN or Layer 2 VPN NLRI.
- OSPF, IS-IS, LDP, or RSVP graceful restart functionality is used in all core routers. Routes added by these protocols are used to resolve Layer 2 and Layer 3 VPN NLRI.
- Protocol restart functionality is used for any Layer 3 protocol (RIP, OSPF, LDP, and so on) used between the PE and customer edge (CE) routers. This does not apply to Layer 2 VPNs because Layer 2 protocols used between the CE and PE routers do not have graceful restart capabilities.
Before VPN graceful restart can work properly, all the above components should restart gracefully. In other words, the routers should preserve their forwarding states and request neighbors to continue forwarding to the router in case of a restart. If all the above conditions are satisfied, VPN graceful restart imposes the following rules on a restarting router:
- The router must wait to receive all BGP NLRI information from other PE routers before advertising routes to the CE routers.
- The router must wait for all protocols in all routing instances to converge (or complete the restart process) before it sends CE router information to other PE routers. In other words, the router must wait for all instance information (whether derived from local configuration or advertisements received from a remote peer) to be processed before it sends this information to other PE routers.
- The router must preserve all forwarding state in the instance.mpls.0 tables until the new labels and transit routes are allocated and announced to other PE routers (and CE routers in a carrier-of-carriers scenario).
If any condition is not met, VPN graceful restart will not succeed in providing uninterrupted forwarding between CE routers across the VPN infrastructure.
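The three rules for a restarting PE router amount to ordering constraints, which the following sketch expresses as simple gate checks. The class `VpnRestartState` and its field names are hypothetical and exist only to make the ordering explicit; they do not correspond to any Junos data structure.

```python
from dataclasses import dataclass


@dataclass
class VpnRestartState:
    """Toy model of the gates a restarting PE router must pass (hypothetical names)."""
    pe_peers_pending_nlri: set   # PE peers that have not yet sent all BGP NLRI
    instances_converging: set    # routing instances still completing restart
    new_labels_announced: bool = False

    def may_advertise_to_ce(self) -> bool:
        # Rule 1: wait for all BGP NLRI from the other PE routers.
        return not self.pe_peers_pending_nlri

    def may_advertise_to_pe(self) -> bool:
        # Rule 2: wait for every protocol in every instance to converge.
        return not self.instances_converging

    def may_flush_mpls_state(self) -> bool:
        # Rule 3: keep old forwarding state until new labels and transit
        # routes have been allocated and announced.
        return self.new_labels_announced
```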