Configuring Graceful Restart and Long-lived Graceful Restart
Graceful restart and long-lived graceful restart BGP helper modes are supported for the Contrail control node, as is XMPP helper mode.
Application of Graceful Restart and Long-lived Graceful Restart
Whenever a BGP peer session is detected as down, all routes learned from the peer are deleted and immediately withdrawn from advertised peers. This causes an immediate disruption to end-to-end traffic, even when the routes kept in the vRouter kernel in the data plane remain intact.
Graceful restart and long-lived graceful restart features can be used to alleviate the traffic disruption caused by session downs.
When configured, graceful restart features keep existing network traffic unaffected if Contrail controller processes go down. The Contrail implementation ensures that if a Contrail control module restarts, it can use the graceful restart functionality provided by its BGP peers. Conversely, when the BGP peers restart, Contrail provides a graceful restart helper mode to minimize the impact on the network. The graceful restart features can therefore ensure that traffic is not affected by temporary process outages.
Graceful restart is not enabled by default.
With graceful restart features enabled, learned routes are not deleted when sessions go down, and the routes are not withdrawn from the advertised peers. Instead, the routes are kept and marked as 'stale'. Consequently, if sessions come back up and routes are relearned, the overall impact to the network is minimized.
After a certain duration, if a downed session does not come back up, all remaining stale routes are deleted and withdrawn from advertised peers.
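The route lifecycle described above (mark stale on session down, keep advertising, clear the stale mark on relearn, delete and withdraw whatever is still stale when the timer fires) can be modeled as a minimal sketch. The `Route` and `PeerRib` classes and their method names are hypothetical illustrations, not part of the Contrail code base.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    prefix: str
    stale: bool = False


@dataclass
class PeerRib:
    """Hypothetical model of routes learned from one peer under graceful restart."""
    routes: dict = field(default_factory=dict)    # prefix -> Route
    advertised: set = field(default_factory=set)  # prefixes advertised to other peers

    def learn(self, prefix):
        # A (re)learned route is fresh: any stale mark is cleared.
        self.routes[prefix] = Route(prefix)
        self.advertised.add(prefix)

    def session_down(self, graceful_restart):
        if not graceful_restart:
            # Without graceful restart: delete and withdraw everything at once.
            self.advertised -= set(self.routes)
            self.routes.clear()
            return
        # With graceful restart: keep the routes and keep advertising them,
        # but mark every one of them as stale.
        for route in self.routes.values():
            route.stale = True

    def restart_timer_expired(self):
        # Routes that were not relearned in time are deleted and withdrawn.
        for prefix in [p for p, r in self.routes.items() if r.stale]:
            del self.routes[prefix]
            self.advertised.discard(prefix)
```

For example, if two routes are learned, the session goes down with graceful restart enabled, and only one route is relearned before the timer expires, only the relearned route survives.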
BGP Graceful Restart Helper Mode
The BGP helper mode can be used to minimize routing churn whenever a BGP session flaps. This is especially helpful if the SDN gateway router goes down gracefully, as in an rpd crash or restart on an MX Series Junos device. In that case, the contrail-control can act as a graceful restart helper to the gateway, by retaining the routes learned from the gateway and advertising them to the rest of the network as applicable. In order for this to work, the restarting router (the SDN gateway in this case) must support and be configured with graceful restart for all of the address families used.
The graceful restart helper mode is also supported for BGP-as-a-Service (BGPaaS) clients. When configured, contrail-control can provide a graceful restart or long-lived graceful restart helper mode to a restarting BGPaaS client.
Feature Highlights
The following are highlights of the graceful restart and long-lived graceful restart features.
Configuring a non-zero restart time enables advertisement of the graceful restart and long-lived graceful restart capabilities in BGP.
Configuring helper mode enables the graceful restart and long-lived graceful restart helper modes, which retain routes even after sessions go down.
With graceful restart configured, whenever a session down event is detected and a closing process is triggered, all routes, across all address families, are marked stale. The stale routes are eligible for best-path election for the configured graceful restart time duration.
When long-lived graceful restart is in effect, stale routes can be retained for much longer than graceful restart alone allows. With long-lived graceful restart, route preference is retained and best paths are recomputed. Stale paths are tagged with the LLGR_STALE community and re-advertised. However, if no long-lived graceful restart community is associated with a received stale route, that route is not kept; instead, it is deleted.
After a certain time, if a session comes back up, any remaining stale routes are deleted. If the session does not come back up, all retained stale routes are permanently deleted and withdrawn from the advertised peer.
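The phases a stale route passes through can be sketched as a function of the time elapsed since the session went down. This is a minimal sketch, assuming (as the timer table later on this page states) that when both timers are configured the total retention is the restart time plus the LLGR time; the function name and the phase labels are hypothetical.

```python
def stale_route_phase(elapsed, restart_time, llgr_time):
    """Classify a stale route `elapsed` seconds after its session went down.

    Hypothetical helper: assumes the LLGR period starts when the graceful
    restart time ends, so total retention is restart_time + llgr_time.
    """
    if elapsed < 0:
        raise ValueError("elapsed must be non-negative")
    if elapsed < restart_time:
        # Stale, but still eligible for best-path election.
        return "gr-stale"
    if elapsed < restart_time + llgr_time:
        # Still kept; tagged with LLGR_STALE and re-advertised.
        return "llgr-stale"
    # Timers exhausted: deleted and withdrawn from advertised peers.
    return "deleted"
```

With the values used in the provisioning example on this page (restart time 60, LLGR time 300), a route is in the graceful restart phase for the first 60 seconds, in the LLGR phase until 360 seconds, and deleted after that.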
XMPP Helper Mode
Contrail Networking updated support for long-lived graceful restart (LLGR) with XMPP helper mode in Contrail Networking Release 2011.L2. Starting in Release 2011.L2, the Contrail vRouter datapath agent supports route retention with its controller peer when LLGR with XMPP helper mode is enabled. This route retention allows the datapath agent to retain the last Route Path from the Contrail controller when an XMPP-based connection is lost. The Route Paths are held by the agent until a new XMPP-based connection is established to one of the Contrail controllers. Once the XMPP connection is up and is stable for a predefined duration, the Route Paths from the old XMPP connection are flushed. This support for route retention allows a controller to go down gracefully but with some forwarding interruption when connectivity to a controller is restored.
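The agent-side route retention just described can be sketched as a small state model. This is a minimal sketch: the class, its methods, and the `stable_duration` parameter (standing in for the unspecified "predefined duration") are all hypothetical, and next hops are plain strings for illustration.

```python
class AgentRouteRetention:
    """Hypothetical model of vRouter agent route retention (LLGR with XMPP helper)."""

    def __init__(self, stable_duration):
        # Assumed stand-in for the "predefined duration" the new XMPP
        # connection must stay up before old route paths are flushed.
        self.stable_duration = stable_duration
        self.active = {}      # prefix -> next hop from the current connection
        self.retained = {}    # prefix -> next hop kept from the lost connection
        self.connected_since = None

    def connection_lost(self):
        # Keep the last route paths instead of removing them.
        self.retained.update(self.active)
        self.active = {}
        self.connected_since = None

    def connection_up(self, now):
        self.connected_since = now

    def route_received(self, prefix, nexthop):
        self.active[prefix] = nexthop

    def tick(self, now):
        # Flush the old paths only once the new connection has been stable long enough.
        if (self.connected_since is not None
                and now - self.connected_since >= self.stable_duration):
            self.retained.clear()

    def lookup(self, prefix):
        # Paths from the current connection win; retained paths fill the gaps.
        return self.active.get(prefix, self.retained.get(prefix))
```

A retained path keeps forwarding traffic while the agent is disconnected and for the stability window after reconnecting; if the new controller has not re-sent that route by then, the path disappears when the old paths are flushed.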
The following notable behaviors are present when LLGR is used with XMPP helper mode:
When a local vRouter is isolated from a Contrail controller, the Intra-VN EVPN routes on the remote vRouter are removed.
During a Contrail vRouter datapath agent restart, forwarding states are not always preserved.
Contrail Networking has limited support for graceful restart and long-lived graceful restart (LLGR) with XMPP helper mode in all Contrail Release 4, 5, and 19 software as well as all Contrail Release 20 software through Contrail Networking Release 2011.L1. Graceful restart and LLGR with XMPP should not be used in most environments and should only be used by expert users in specialized circumstances when running these Contrail Networking releases for reasons described later in this section.
Graceful restart and LLGR can be enabled with XMPP helper mode using Contrail Command, the Contrail Web UI, or the provision_control script. The helper modes can also be enabled via the schema, and can be disabled selectively in a contrail-control node for BGP or XMPP sessions by configuring gr_helper_bgp_disable or gr_helper_xmpp_disable in the /etc/contrail/contrail-control.conf configuration file.
You should be aware of the following dependencies when enabling graceful restart and LLGR with XMPP helper mode:
You can enable graceful restart and LLGR with XMPP helper mode without enabling the BGP helper. You still have to enable graceful restart, XMPP, and all appropriate timers when graceful restart and LLGR are enabled with XMPP helper mode without the BGP helper.
LLGR and sub-second XMPP timers for fast convergence should not be used simultaneously.
If a control node fails when LLGR with XMPP helper mode is enabled, vRouters hold routes for the length of the GR and LLGR timeout values and continue to pass traffic. Routes are removed from the vRouter when the timeout interval elapses, and traffic is no longer forwarded at that point.
If the control node returns to the up state before the timeout interval elapses, a small amount of traffic will be lost during the reconnection.
Graceful restart and LLGR with XMPP should only be used by expert users in specialized circumstances when running Contrail Networking Release 4, 5, and 19 software as well as all Contrail Release 20 software through Contrail Networking Release 2011.L1 due to the following issues:
Graceful restart is not yet fully supported for the contrail-vrouter-agent.
Because graceful restart is not yet supported for the contrail-vrouter-agent, the graceful_restart_xmpp_helper_enable parameter should not be set. If the vRouter agent restarts, the data plane is reset and the routes and flows are reprogrammed anew. This reprogramming typically results in traffic loss for several seconds for new and existing flows and can result in even longer traffic loss periods. When graceful restart is enabled, a vRouter agent restart can also cause stale routes to be added to the routing table used by the contrail-vrouter-agent.
This issue occurs after a contrail-vrouter-agent reset. After the reset, previous XMPP control nodes continue to send stale routes to other control nodes. The stale routes sent by the previous XMPP control nodes can eventually get passed to the contrail-vrouter-agent and installed into its routing table as NH1/drop routes, leading to traffic drops. The stale routes are removed from the routing table only after graceful restart is enabled globally or when the timer—which is user configurable but can be set to long intervals—expires.
Configuration Parameters
Graceful restart parameters are configured in the global-system-config of the schema. They can be configured by means of a provisioning script or by using the Contrail Web UI.
Configure a non-zero restart time to advertise graceful restart and long-lived graceful restart capabilities to peers.
Configure helper mode for graceful restart and long-lived graceful restart to retain routes even after sessions go down.
Configuration parameters include:
- enable or disable for all graceful restart parameters:
  - restart-time
  - long-lived-restart-time
  - end-of-rib-timeout
- bgp-helper-enable to enable graceful restart helper mode for BGP peers in contrail-control
- xmpp-helper-enable to enable graceful restart helper mode for XMPP peers (agents) in contrail-control
The following example shows configuration using the provisioning script:
/opt/contrail/utils/provision_control.py --api_server_ip 10.xx.xx.20 --api_server_port 8082 \
  --router_asn 64512 --admin_user admin --admin_password <password> --admin_tenant_name admin \
  --set_graceful_restart_parameters --graceful_restart_time 60 \
  --long_lived_graceful_restart_time 300 --end_of_rib_timeout 30 \
  --graceful_restart_enable --graceful_restart_bgp_helper_enable
The following are sample parameters:
--set_graceful_restart_parameters --graceful_restart_time 300 --long_lived_graceful_restart_time 60000 --end_of_rib_timeout 30 --graceful_restart_enable --graceful_restart_bgp_helper_enable
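When the same parameters are applied from automation, the command line can be assembled as an argument list instead of a hand-edited string. This is a minimal sketch: the flag names are copied from the provisioning example above, while the `provision_gr_args` helper, its defaults, and the `<api-server-ip>` placeholder are hypothetical.

```python
import shlex


def provision_gr_args(api_server_ip, router_asn, restart_time, llgr_time,
                      eor_timeout, bgp_helper=True, admin_password="<password>"):
    """Build a provision_control.py argument list for graceful restart.

    Hypothetical helper; only the flag names come from the documented example.
    """
    args = [
        "/opt/contrail/utils/provision_control.py",
        "--api_server_ip", api_server_ip,
        "--api_server_port", "8082",
        "--router_asn", str(router_asn),
        "--admin_user", "admin",
        "--admin_password", admin_password,
        "--admin_tenant_name", "admin",
        "--set_graceful_restart_parameters",
        "--graceful_restart_time", str(restart_time),
        "--long_lived_graceful_restart_time", str(llgr_time),
        "--end_of_rib_timeout", str(eor_timeout),
        "--graceful_restart_enable",
    ]
    if bgp_helper:
        # Enable graceful restart helper mode for BGP peers.
        args.append("--graceful_restart_bgp_helper_enable")
    return args


if __name__ == "__main__":
    # Print the command for review rather than running it.
    print(shlex.join(provision_gr_args("<api-server-ip>", 64512, 300, 60000, 30)))
```

Keeping the arguments as a list means the result can be passed directly to `subprocess.run(args, check=True)` without shell quoting issues, once the placeholders are replaced with real values.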
When BGP peering with Juniper Networks devices, Junos must also be explicitly configured for graceful restart/long-lived graceful restart, as shown in the following example:
set routing-options graceful-restart
set protocols bgp group <a1234> type internal
set protocols bgp group <a1234> local-address 10.xx.xxx.181
set protocols bgp group <a1234> keep all
set protocols bgp group <a1234> family inet-vpn unicast graceful-restart long-lived restarter stale-time 20
set protocols bgp group <a1234> family route-target graceful-restart long-lived restarter stale-time 20
set protocols bgp group <a1234> graceful-restart restart-time 600
set protocols bgp group <a1234> neighbor 10.xx.xx.20 peer-as 64512
The graceful restart helper modes can be enabled in the schema. They can be disabled selectively for BGP or XMPP sessions by setting gr_helper_bgp_disable or gr_helper_xmpp_disable in the /etc/contrail/contrail-control.conf file and then restarting contrail-control.
The following are examples:
/usr/bin/openstack-config --set /etc/contrail/contrail-control.conf DEFAULT gr_helper_bgp_disable 1
/usr/bin/openstack-config --set /etc/contrail/contrail-control.conf DEFAULT gr_helper_xmpp_disable 1
service contrail-control restart
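To check which helper modes a control node has disabled, the DEFAULT section of contrail-control.conf can be read with Python's standard configparser module. This is a minimal sketch: the `disabled_helpers` function is hypothetical, and treating values other than 1 (such as true or yes) as "disabled" is an assumption, not documented behavior.

```python
import configparser


def disabled_helpers(conf_text):
    """Report which graceful restart helper modes contrail-control.conf disables.

    Assumes the keys live in [DEFAULT], as in the openstack-config examples
    above. Accepting "true"/"yes" in addition to "1" is an assumption.
    """
    # interpolation=None so that '%' characters elsewhere in the file are harmless.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    defaults = parser.defaults()
    result = {}
    for mode in ("bgp", "xmpp"):
        value = defaults.get(f"gr_helper_{mode}_disable", "0")
        result[mode] = value.strip().lower() in ("1", "true", "yes")
    return result
```

On a node, the text can come from reading /etc/contrail/contrail-control.conf; a file with `gr_helper_bgp_disable = 1` under [DEFAULT] reports BGP helper mode as disabled and XMPP helper mode as still enabled.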
For more details about graceful restart configuration, see https://github.com/Juniper/contrail-controller/wiki/Graceful-Restart .
Cautions for Graceful Restart
Be aware of the following caveats when configuring and using graceful restart.
Using the graceful restart/long-lived graceful restart feature with a peer is effective either to all negotiated address families or to none. If a peer signals support for graceful restart/long-lived graceful restart for only a subset of the negotiated address families, the graceful restart helper mode does not come into effect for any family in the set of negotiated address families.
Because graceful restart is not yet supported for the contrail-vrouter-agent, the graceful_restart_xmpp_helper_enable parameter should not be set. If the vRouter agent restarts, the data plane is reset and the routes and flows are reprogrammed anew. This reprogramming typically results in traffic loss for several seconds for new and existing flows and can result in even longer traffic loss periods. Additionally, previous XMPP control nodes might continue to send stale routes to other control nodes, and these stale routes can be passed to the contrail-vrouter-agent. The contrail-vrouter-agent can install these stale routes into its routing table as NH1/drop routes, causing traffic loss. The stale routes are removed only after graceful restart is enabled globally or when the timer, which is user configurable but can be set to multiple days, expires.
Graceful restart/long-lived graceful restart is not supported for multicast routes.
Graceful restart/long-lived graceful restart helper mode may not work correctly for EVPN routes if the restarting node does not preserve forwarding state for EVPN routes.
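The all-or-none address-family rule in the first caveat above can be sketched as a single predicate. This is a minimal sketch; the function name is hypothetical and the family names in the example are illustrative labels, not Contrail configuration values.

```python
def gr_helper_in_effect(negotiated_families, peer_gr_families):
    """Return True only if the peer signals GR/LLGR for every negotiated family.

    Hypothetical helper: if GR/LLGR covers only a subset of the negotiated
    address families, helper mode applies to none of them.
    """
    negotiated = set(negotiated_families)
    if not negotiated:
        return False
    # Subset test: every negotiated family must be covered.
    return negotiated <= set(peer_gr_families)
```

For example, a peer that negotiates inet-vpn, route-target, and EVPN but signals graceful restart only for inet-vpn and route-target gets no helper-mode protection for any of the three families.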
Configuring Graceful Restart
We recommend configuring Graceful Restart using Contrail Command. You can, however, also configure Graceful Restart using the Contrail User Interface in environments not using Contrail Command.
- Configuring Graceful Restart using Contrail Command
- Configuring Graceful Restart with the Contrail User Interface
- Understanding the Graceful Restart Timers
Configuring Graceful Restart using Contrail Command
To configure graceful restart in Contrail Command, navigate to Infrastructure > Cluster > Advanced Options and select the Edit icon near the top right corner of the screen.
The Edit System Configuration window opens. Click the box for Graceful Restart to enable graceful restart, and enter a non-zero number to define the Restart Time in seconds. You can also specify the times for the long-lived graceful restart (LLGR) and the end of RIB timers from this window.
Configuring Graceful Restart with the Contrail User Interface
To configure graceful restart in the Contrail UI, go to Configure > Infrastructure > Global Config, then select the BGP Options tab. The Edit BGP Options window opens. Click the box for Graceful Restart to enable graceful restart, and enter a non-zero value for the Restart Time. Click the helper boxes as needed for BGP Helper and XMPP Helper. You can also enter values for the long-lived graceful restart time in seconds, and for the end of RIB in seconds. See Figure 3.
Understanding the Graceful Restart Timers
Table 1 provides a summary of the graceful restart timers and their associated behaviors.
| Timer | Description |
|---|---|
| Restart Time | BGP helper mode: Routes advertised by the BGP peer are kept for the duration of the restart time. XMPP helper mode: Routes advertised by the XMPP peer are kept for the duration of the restart time. |
| LLGR Time | BGP helper mode: When BGP helper mode is enabled, routes advertised by BGP peers are kept for the duration of the LLGR timer. XMPP helper mode: When XMPP helper mode is enabled, routes advertised by XMPP peers are kept for the duration of the LLGR timer. When Graceful Restart (GR) and Long-lived Graceful Restart (LLGR) are both configured, the effective retention period is the sum of both timers. |
| End of RIB timer | The End of RIB (EOR) timer specifies how long a control node waits before removing the stale routes of a vRouter agent. When the connection between a vRouter agent and a control node is restored, the vRouter agent downloads its configuration from the control node. When this configuration procedure is complete, the control node sends an End of Config message to the vRouter agent. The EOR timer starts when the vRouter agent receives this End of Config message. When the EOR timer expires, the vRouter agent sends an EOR message to the control node. On receiving this EOR message, the control node removes the stale routes previously advertised by the vRouter agent from its RIB. |
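The End of RIB exchange in the last row of the table is a fixed sequence of events, which can be laid out as a timeline. This is a minimal sketch: the `eor_sequence` function is hypothetical, times are seconds since the agent-to-control-node connection was restored, and `end_of_config_at` (when the End of Config message arrives) is an assumed input rather than a configurable value.

```python
def eor_sequence(end_of_config_at, eor_timeout):
    """Return (time, event) pairs for the End of RIB exchange after a reconnect.

    Hypothetical timeline; event wording follows the timer table above.
    """
    # The EOR timer starts when the agent receives End of Config.
    eor_sent_at = end_of_config_at + eor_timeout
    return [
        (0, "connection restored; agent downloads configuration"),
        (end_of_config_at, "control node sends End of Config; agent starts EOR timer"),
        (eor_sent_at, "EOR timer expires; agent sends EOR to control node"),
        (eor_sent_at, "control node removes the agent's stale routes from its RIB"),
    ]
```

With an end-of-rib-timeout of 30 seconds and End of Config arriving 5 seconds after reconnection, the agent's stale routes are removed at about 35 seconds; a larger timeout gives the agent more time to re-advertise its routes before the stale copies go away.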
Graceful Restart or Long-Lived Graceful Restart Support for EVPN Type 2 Routes
In releases earlier than Contrail Networking Release 21.4, when a BGP/XMPP peer session restarts or goes down, the learned EVPN Type 2 routes are not marked as stale, even if you have configured graceful restart or long-lived graceful restart timers in the Contrail Web UI. Instead, the control node explicitly deletes these EVPN routes from the route database, which results in traffic loss for the EVPN family of routes.
Starting in Contrail Networking Release 21.4, the graceful restart and long-lived graceful restart features support EVPN Type 2 routes, with the following benefits:
When a session fails, the learned EVPN Type 2 routes are not deleted or removed.
The learned EVPN Type 2 routes are retained and marked as stale until the configured graceful restart or long-lived graceful restart timers expire.
If the session resumes and the routes are relearned, the overall network impact is reduced.
If a downed session remains down after the graceful restart or long-lived graceful restart timer has expired, the stale routes are deleted and removed from the advertised peers.
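The release-dependent handling of EVPN Type 2 routes on session down can be summarized in a small decision function. This is a minimal sketch under stated assumptions: the function name and family labels are hypothetical, the `evpn_type2_gr_supported` flag stands for running Contrail Networking Release 21.4 or later, and treating multicast routes as deleted is an assumption based on the caveat that graceful restart/long-lived graceful restart is not supported for multicast routes.

```python
def on_session_down(route_family, evpn_type2_gr_supported):
    """Return what happens to a learned route when its peer session goes down
    while graceful restart or LLGR timers are configured.

    Hypothetical model: `evpn_type2_gr_supported` stands for Release 21.4 or
    later; deleting multicast routes is an assumption (GR/LLGR unsupported).
    """
    if route_family == "multicast":
        return "deleted"
    if route_family == "evpn-type2" and not evpn_type2_gr_supported:
        # Older releases: the control node explicitly deletes EVPN Type 2 routes.
        return "deleted"
    # Retained and marked stale until the GR/LLGR timers expire.
    return "marked-stale"
```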