
Configure Graceful Restart and Long-Lived Graceful Restart

SUMMARY This topic describes graceful restart and long-lived graceful restart (LLGR) in Juniper® Cloud-Native Contrail Networking (CN2) Release 23.2 and later, in a Kubernetes-orchestrated environment.

Overview

In CN2, whenever a peer session is detected as down, the control node deletes all routes learned from that peer and immediately withdraws those routes from the peers it advertised them to. This causes an immediate disruption of traffic among the worker nodes. To prevent this, you can configure graceful restart and long-lived graceful restart (LLGR). With these features enabled, learned routes are not immediately deleted and withdrawn from the advertised peers. Instead, the routes are kept and marked as stale. Consequently, if the session comes back up and the routes are relearned, the overall impact on the network is minimized.
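The difference between the default behavior and graceful restart can be sketched in a few lines of Python. This is a minimal illustration only: the dictionary-based RIB, the LearnedRoute type, and the handle_peer_down and handle_relearn functions are hypothetical and are not CN2 data structures or APIs.

```python
from dataclasses import dataclass


@dataclass
class LearnedRoute:
    # Hypothetical record for a route the control node learned from a peer.
    prefix: str
    peer: str
    stale: bool = False


def handle_peer_down(rib: dict, peer: str, graceful: bool) -> list:
    """Handle a peer session going down and return the prefixes withdrawn.

    rib maps prefix -> LearnedRoute (an illustrative structure).
    """
    withdrawn = []
    for prefix, route in list(rib.items()):
        if route.peer != peer:
            continue
        if graceful:
            route.stale = True  # keep the route, but mark it stale
        else:
            del rib[prefix]  # default behavior: delete the route...
            withdrawn.append(prefix)  # ...and withdraw it from advertised peers
    return withdrawn


def handle_relearn(rib: dict, prefix: str, peer: str) -> None:
    # A route relearned after the session recovers replaces the stale copy.
    rib[prefix] = LearnedRoute(prefix, peer)


rib = {"10.0.0.0/24": LearnedRoute("10.0.0.0/24", "worker-1")}
print(handle_peer_down(rib, "worker-1", graceful=True))  # []
print(rib["10.0.0.0/24"].stale)  # True
handle_relearn(rib, "10.0.0.0/24", "worker-1")
print(rib["10.0.0.0/24"].stale)  # False
```

With graceful set to False, the same call removes the route and returns its prefix, which corresponds to the traffic disruption described above.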

Graceful restart allows a routing device undergoing a restart to inform its adjacent neighbors and peers of its condition. LLGR is a mechanism used to preserve routing details for a longer period of time in the event of a failed peer. LLGR retains stale routes for a much longer time than that allowed by graceful restart alone. With LLGR, route preference is retained and best paths are recomputed.

Note:
  • Graceful restart and LLGR support IPv4, IPv6, and EVPN Type 2 routes.

  • Neither graceful restart nor LLGR supports multicast traffic.

BGP Helper Mode

You can use BGP helper mode to minimize routing churn whenever a BGP session flaps or a BGP peer restarts. This is especially helpful if the SDN gateway router goes down gracefully, as in an rpd crash or restart on the device. In that case, CN2 acts as a helper to the gateway by retaining the routes learned from the gateway and advertising them to the rest of the network, as applicable. For this to work, the restarting router (the SDN gateway in this case) must support and be configured with graceful restart for all of the address families used.
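The requirement in the last sentence, that the restarting router have graceful restart configured for every address family in use, amounts to a set-containment check. The following sketch uses hypothetical address-family labels ("inet", "inet6", "evpn") and hypothetical function names; it illustrates the rule and is not how CN2 implements it internally.

```python
def missing_gr_families(families_in_use: set, peer_gr_families: set) -> list:
    """Address families in use for which the restarting peer has no graceful restart."""
    return sorted(set(families_in_use) - set(peer_gr_families))


def helper_mode_applies(families_in_use: set, peer_gr_families: set) -> bool:
    # Helper mode works only when graceful restart covers every family in use.
    return not missing_gr_families(families_in_use, peer_gr_families)


in_use = {"inet", "inet6", "evpn"}
print(helper_mode_applies(in_use, {"inet", "inet6", "evpn"}))  # True
print(missing_gr_families(in_use, {"inet", "inet6"}))  # ['evpn']
```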

BGP helper mode is also supported for BGP-as-a-Service (BGPaaS) clients. When configured, CN2 provides BGP helper mode to a restarting BGPaaS client.

XMPP Helper Mode

Contrail vRouter datapath agent supports route retention with its controller peer when LLGR with XMPP helper mode is enabled. This route retention allows the datapath agent to retain the last route path from the Contrail controller when an XMPP-based connection is lost. The route paths are held by the agent until a new XMPP-based connection is established to one of the Contrail controllers. Once the XMPP connection is up and is stable for a predefined duration, the route paths from the old XMPP connection are flushed. This route retention allows a controller to go down gracefully but with some forwarding interruption when connectivity to a controller is restored.
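The agent-side route retention described above can be modeled by tagging each route with the XMPP connection that delivered it. In this minimal sketch, stable_duration stands in for the unspecified "predefined duration", and the class and method names are hypothetical; the vRouter agent does not expose this interface.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentRouteRetention:
    """Hypothetical model of vRouter agent route retention with XMPP helper mode."""
    stable_duration: float  # assumed value for the "predefined duration", in seconds
    routes: dict = field(default_factory=dict)  # prefix -> connection id
    current_conn: Optional[int] = None
    conn_up_since: Optional[float] = None

    def connection_lost(self) -> None:
        # Keep every route; only record that no controller is connected.
        self.current_conn = None
        self.conn_up_since = None

    def connection_up(self, conn_id: int, now: float) -> None:
        self.current_conn = conn_id
        self.conn_up_since = now

    def route_update(self, prefix: str) -> None:
        # A route received on the current connection is tagged with it.
        self.routes[prefix] = self.current_conn

    def tick(self, now: float) -> list:
        """Flush routes from older connections once the current one is stable."""
        if self.current_conn is None or now - self.conn_up_since < self.stable_duration:
            return []
        stale = [p for p, c in self.routes.items() if c != self.current_conn]
        for prefix in stale:
            del self.routes[prefix]
        return stale


agent = AgentRouteRetention(stable_duration=30.0)
agent.connection_up(1, now=0.0)
agent.route_update("10.1.1.0/24")
agent.route_update("10.1.2.0/24")
agent.connection_lost()  # both routes are still retained
agent.connection_up(2, now=100.0)
agent.route_update("10.1.1.0/24")  # relearned on the new connection
print(agent.tick(now=110.0))  # [] because the new connection is not yet stable
print(agent.tick(now=131.0))  # ['10.1.2.0/24']
```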

When enabling graceful restart and LLGR with XMPP helper mode, consider the following:

  • You can enable graceful restart and LLGR with XMPP helper mode without enabling BGP helper mode, and vice versa.

  • LLGR and XMPP subsecond timers for fast convergence should not be used simultaneously.

Configure Graceful Restart and LLGR

By default, graceful restart and LLGR are not enabled. To enable these features, edit the global-system-config spec using the following command:
Note:

Whenever you enable or disable gracefulRestartParameters, the GlobalSystemConfigSpec is updated and pushed down to all control nodes, and the configuration is subsequently pushed down to the worker nodes as well. Because the peers exchange the new configuration, the BGP and XMPP sessions flap (restart). As a result, you might observe a small drop in traffic while the peers restart.

The following is an example of how to enable graceful restart and LLGR and assign timer values.

In the above example, enable is set to true to enable graceful restart and LLGR. The bgpHelperEnable and xmppHelperEnable modes are set to true to enable helper modes for BGP and XMPP peers.
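Before applying a change, you might want to sanity-check the gracefulRestartParameters values. The following Python sketch assumes the parameters have been loaded into a plain dictionary (for example, parsed from YAML) using the field names mentioned in this topic, and that omitted timers fall back to the defaults listed in Table 1. The check_gr_params function and the integer-seconds rule it applies are hypothetical and not part of CN2.

```python
from __future__ import annotations

# Timer defaults in seconds, as listed in Table 1.
TIMER_DEFAULTS = {"restartTime": 60, "longLivedRestartTime": 1800, "endOfRibTimeout": 90}
FLAGS = ("enable", "bgpHelperEnable", "xmppHelperEnable")


def check_gr_params(params: dict) -> tuple[dict, list[str]]:
    """Return (effective timers, warnings) for a gracefulRestartParameters mapping.

    Assumes omitted timers fall back to the Table 1 defaults.
    """
    warnings = []
    for flag in FLAGS:
        if flag in params and not isinstance(params[flag], bool):
            warnings.append(f"{flag} should be true or false")
    timers = {name: params.get(name, default) for name, default in TIMER_DEFAULTS.items()}
    for name, value in timers.items():
        # Assumption for this sketch: timers are non-negative integer seconds.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            warnings.append(f"{name} should be a non-negative integer number of seconds")
    if params.get("enable") is True and timers["restartTime"] == 0:
        warnings.append("restartTime is 0; a nonzero value is recommended")
    return timers, warnings


timers, warnings = check_gr_params(
    {"enable": True, "bgpHelperEnable": True, "xmppHelperEnable": True, "restartTime": 0}
)
print(timers)  # {'restartTime': 0, 'longLivedRestartTime': 1800, 'endOfRibTimeout': 90}
print(warnings)  # ['restartTime is 0; a nonzero value is recommended']
```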

See Table 1 for descriptions of the timers and their associated behaviors.

Table 1: Graceful Restart Timers
Timer Description
Restart time

The restartTime indicates how long the pod waits for a graceful restart capable neighbor to re-establish BGP peering. The default is 60 seconds.

We recommend that you specify a nonzero value. A nonzero restart time is needed for the graceful restart and long-lived graceful restart capabilities to be advertised to peers.

  • bgpHelperEnable: Routes advertised by BGP peers are kept for the duration of the restart time.

  • xmppHelperEnable: Routes advertised by XMPP peers are kept for the duration of the restart time.

LLGR restart time

The longLivedRestartTime indicates the amount of time LLGR retains stale routes. The default is 1800 seconds.

  • bgpHelperEnable: Routes advertised by BGP peers are kept for the duration of the LLGR timer.

  • xmppHelperEnable: Routes advertised by XMPP peers are kept for the duration of the LLGR timer.

When graceful restart and LLGR are both configured, the effective LLGR retention period is the sum of both timers (restartTime plus longLivedRestartTime).

End of RIB timer

The endOfRibTimeout timer specifies the amount of time a control node waits to remove stale routes from a vRouter agent’s RIB. The default is 90 seconds.

The EOR timer starts when the vRouter agent receives the End of Config message. When the EOR timer expires, the vRouter agent sends an EOR message to the control node. After receiving this EOR message, the control node removes from its RIB the stale routes that the vRouter agent previously advertised.
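The timers in Table 1 can be illustrated with a small Python model. With the default values, stale routes are retained for 60 + 1800 = 1860 seconds (31 minutes), assuming the graceful restart period runs first and the LLGR period follows it. The second part of the sketch walks through the End of RIB exchange, under the assumption that routes the agent advertised before reconnecting become stale and that those it does not re-advertise are removed when its EOR message arrives. All class and function names are hypothetical illustrations, not CN2 internals.

```python
from dataclasses import dataclass, field

# Defaults from Table 1, in seconds.
RESTART_TIME = 60
LONG_LIVED_RESTART_TIME = 1800
END_OF_RIB_TIMEOUT = 90


def retention_phase(seconds_since_down: float,
                    restart_time: float = RESTART_TIME,
                    long_lived_restart_time: float = LONG_LIVED_RESTART_TIME) -> str:
    """Classify how a failed peer's routes are treated at a point in time.

    Assumes the graceful restart period runs first and the LLGR period follows,
    so the total retention is restart_time + long_lived_restart_time.
    """
    if seconds_since_down < restart_time:
        return "graceful restart"
    if seconds_since_down < restart_time + long_lived_restart_time:
        return "long-lived graceful restart"
    return "stale routes removed"


@dataclass
class AgentRoutesAtControlNode:
    """Hypothetical view of the routes a control node learned from one vRouter agent."""
    current: set = field(default_factory=set)
    stale: set = field(default_factory=set)

    def mark_all_stale(self) -> None:
        # Assumption: routes learned before the agent reconnects become stale.
        self.stale |= self.current
        self.current = set()

    def advertise(self, prefix: str) -> None:
        # A route the agent advertises again is no longer stale.
        self.stale.discard(prefix)
        self.current.add(prefix)

    def receive_eor(self) -> set:
        # On the agent's EOR message, remove routes that were not re-advertised.
        removed, self.stale = self.stale, set()
        return removed


def eor_sent_at(end_of_config_received_at: float,
                end_of_rib_timeout: float = END_OF_RIB_TIMEOUT) -> float:
    # The EOR timer starts on End of Config; the agent sends EOR when it expires.
    return end_of_config_received_at + end_of_rib_timeout


print(retention_phase(30))  # graceful restart
print(retention_phase(600))  # long-lived graceful restart
print(retention_phase(1860))  # stale routes removed

routes = AgentRoutesAtControlNode()
routes.advertise("10.2.0.0/24")
routes.advertise("10.3.0.0/24")
routes.mark_all_stale()
routes.advertise("10.2.0.0/24")
print(eor_sent_at(5.0))  # 95.0
print(routes.receive_eor())  # {'10.3.0.0/24'}
```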

Verify Your Configuration

To verify your configuration, run the following command:

The following output is an example of the GlobalSystemConfig.yaml file with graceful restart and LLGR configured: