Secure OAM Network Redundancy Overview
Contrail Service Orchestration (CSO) supports secure Operation, Administration, and Maintenance (OAM) network redundancy for provider hub devices in an SD-WAN deployment. You can configure two provider hub devices to act as the primary and secondary OAM hub devices and protect the site against device and link failures (failures of the WAN link between the CPE and the provider hub). If a fault or an outage occurs in the OpCo’s OAM network beyond the primary OAM hub, OAM connectivity is automatically restored through the secondary OAM hub without any user intervention.
The following sections explain the topology and benefits of secure OAM network redundancy in an SD-WAN deployment.
Figure 1 shows the topology for secure OAM network redundancy.
The CPE device at the on-premise spoke site is connected to two provider hub devices that are configured as OAM hubs. The OAM hub devices are in turn connected to the OAM gateway router. During Zero Touch Provisioning (ZTP), two separate IPsec tunnels are established from the CPE device, one to the primary OAM hub device and one to the secondary OAM hub device. The CPE device has a static route (loopback lo0.1) to both OAM hubs through the IPsec tunnels.
When the provider hub device is onboarded, the BGP sessions are established. During the BGP sessions, the OAM hub device advertises the CSO subnet to the CPE device and the CPE device advertises the OAM subnet to the OAM hub device.
BGP supports primary and backup OAM hubs by using local preference (the hub-primary-select option) on the CPE device at the on-premise spoke site. The CPE device decides whether an OAM hub is primary or secondary based on the hub-primary-select option. If the primary OAM hub fails or loses the CSO routes from the OAM gateway, the secondary OAM hub is used. The CPE device advertises the OAM subnet to the OAM hubs. The OAM hubs, in turn, advertise the OAM subnet to the OAM gateway router.
If the SINGLE_SSH feature is enabled in the device template, only one IP address (the loopback IP address) is advertised. If the SINGLE_SSH feature is disabled in the device template, the OAM subnet is advertised.
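The SINGLE_SSH rule can be sketched in a few lines of Python. This is only an illustration of the decision, not CSO code: the function name, its parameters, and the representation of the single loopback address as a /32 host prefix are assumptions made for the example.

```python
import ipaddress


def advertised_prefix(single_ssh_enabled: bool, loopback_ip: str, oam_subnet: str):
    """Return the prefix a CPE would advertise to the OAM hubs (illustrative only)."""
    if single_ssh_enabled:
        # Only the loopback address is advertised. Modelling it as a /32
        # host prefix is an assumption of this sketch.
        return ipaddress.ip_network(f"{loopback_ip}/32")
    # SINGLE_SSH disabled: the whole OAM subnet is advertised.
    return ipaddress.ip_network(oam_subnet)


# Documentation-range addresses, chosen only for the example.
print(advertised_prefix(True, "192.0.2.10", "192.0.2.0/24"))
print(advertised_prefix(False, "192.0.2.10", "192.0.2.0/24"))
```

With SINGLE_SSH enabled, the first call yields the host prefix for the loopback address. With it disabled, the second call yields the full OAM subnet.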
The details of the BGP session that is established during ZTP are as follows:
An external BGP (eBGP) session is established between the OAM hub device and the OAM gateway router. During the eBGP session, the OAM gateway router advertises the CSO route reachability (CSO prefix and VRR prefixes) to both the primary and secondary OAM hubs.
An internal BGP (iBGP) session is established between the CPE device at the on-premise spoke site and the OAM hub device. During this session, the OAM hub device advertises the learned CSO route to the CPE device at the on-premise spoke site. The CPE device learns routes from both the primary and secondary OAM hub devices, and configures the primary OAM hub device with a higher preference and the backup OAM hub device with a lower preference.
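The way local preference selects between the two OAM hubs can be illustrated with a minimal Python sketch. The Route class, the hub names, the CSO prefix, and the preference values 200 and 100 are hypothetical; the sketch shows only the general BGP rule that, for the same prefix, the route with the highest local preference is preferred.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    prefix: str       # destination prefix, for example the CSO prefix
    learned_from: str  # OAM hub that advertised the route over iBGP
    local_pref: int   # higher value is preferred


def best_route(routes: List[Route], prefix: str) -> Optional[Route]:
    """Pick the route with the highest local preference for a prefix."""
    candidates = [r for r in routes if r.prefix == prefix]
    return max(candidates, key=lambda r: r.local_pref, default=None)


# Hypothetical CSO prefix and preference values, for illustration only.
CSO_PREFIX = "198.51.100.0/24"
learned = [
    Route(CSO_PREFIX, "primary-oam-hub", 200),
    Route(CSO_PREFIX, "secondary-oam-hub", 100),
]

print(best_route(learned, CSO_PREFIX).learned_from)
```

While both routes are present, the route learned from the primary OAM hub is selected. If that route is withdrawn, the same function returns the route learned from the secondary OAM hub.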
Adding and Configuring Provider Hub Devices
The workflow to add and configure provider hub devices to support a redundant secure OAM network is similar to the workflow for adding a single provider hub device. For more information about adding and configuring a provider hub device, see Adding Provider Hub Sites for SD-WAN Deployment.
While adding the first provider hub device in any deployment, ensure that the capability of the device is set to DATA and OAM.
Adding and Configuring an On-Premise Spoke Site
The workflow to configure an on-premise spoke site to support a redundant secure OAM network is similar to the workflow for adding a single on-premise spoke site. For more information about adding and configuring an on-premise spoke site, see Adding an On-Premise Spoke Site with SD-WAN Capability.
In real-time-optimized deployments, you must enable the Connect to Hubs feature to establish secure OAM IPsec tunnels.
In bandwidth-optimized deployments, you must enable the Use for OAM Traffic option on at least one WAN link to establish secure OAM IPsec tunnels.
On NFX250 devices, you must enable the traffic type as OAM_AND_DATA for at least one WAN link.
Failure Detection and Recovery
If a network failure occurs in the OpCo’s OAM network behind the primary OAM hub, the primary OAM hub loses the CSO route and can no longer advertise it to the spoke. The spoke then uses the CSO route learned from the secondary OAM hub, and the OAM traffic moves from the primary OAM hub to the secondary OAM hub.
When the primary OAM hub becomes active again, the BGP session is established, and the primary OAM hub receives the route and propagates it to the spoke. Because the primary OAM hub is configured with a higher preference on the spoke device, the OAM traffic switches back to the primary OAM hub when the spoke receives the route from it.
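The failure and recovery sequence can be walked through with a small Python simulation. This is a sketch under stated assumptions, not a model of the actual CSO or device implementation: the hub labels, the preference values, and the event list are hypothetical. The spoke is assumed to use the highest-preference hub that currently advertises the CSO route.

```python
from typing import Optional, Set

# Hypothetical preference values configured on the spoke; higher wins.
PREFERENCE = {"primary": 200, "secondary": 100}


def active_oam_hub(hubs_with_cso_route: Set[str]) -> Optional[str]:
    """Return the hub the spoke uses for OAM traffic, or None if no hub
    currently advertises the CSO route."""
    available = [hub for hub in PREFERENCE if hub in hubs_with_cso_route]
    return max(available, key=PREFERENCE.get, default=None)


# Each event lists the hubs that advertise the CSO route at that point.
events = [
    ("steady state", {"primary", "secondary"}),
    ("failure behind primary hub", {"secondary"}),
    ("primary hub restored", {"primary", "secondary"}),
]

timeline = [(label, active_oam_hub(hubs)) for label, hubs in events]
for label, hub in timeline:
    print(f"{label}: OAM traffic uses the {hub} OAM hub")
```

The timeline moves from the primary hub to the secondary hub when the primary hub stops advertising the CSO route, and back to the primary hub once the route is received again.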
Benefits of Secure OAM Network Redundancy
Hub device redundancy: In case of multihoming at the spoke sites, each CPE device at the site is connected to two provider hub devices, which function as the primary and secondary provider hub devices. Two separate IPsec tunnels are established from the SD-WAN site, one to the primary provider hub device and one to the secondary provider hub device. This hub device redundancy ensures that the OAM traffic is not lost even if a hub fails.