
Multinode High Availability

The Multinode High Availability solution introduces a redundancy model suited to certain Layer 3 deployments, with simplicity and reliability as its design goals.

Overview

Multinode High Availability provides inter-chassis redundancy for SRX devices in a Layer 3 network. It supports two SRX devices that present themselves as independent nodes to the rest of the network. In this mode, the nodes can either be co-located or geographically separated. The nodes are connected with an interchassis link (ICL) that synchronizes configuration, control plane, and data plane states across the SRX nodes to handle device failover scenarios. This mode provides redundancy at the service level, which minimizes the control plane state that needs to be synchronized across the nodes. Control plane state is required only for services, such as IPsec, that maintain active control plane protocol state.

Figure 1 shows a simplified view of a deployment where two SRX Series devices are reachable over an ICL.

Figure 1: Deployment of SRX Nodes with Interchassis Link

In this deployment, both nodes are connected to each other over an interchassis link and are also connected to adjacent routers on the trust and untrust networks. The nodes in the High Availability (HA) pair act as standalone devices because the Routing Engine (RE) on each node is connected only to its local Packet Forwarding Engines (PFEs). The RE on each node runs its own routing protocols (static, OSPF, BGP, IS-IS, anycast, VRRP) and state machines. The nodes agree on the primary and backup roles by communicating with each other over the interchassis link or the IP link.

The interchassis link is based on a pair of IP addresses that are reachable from each other over an IP network. The link is generally configured in a separate routing instance to keep it isolated from regular traffic.
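As an illustration only (the instance name ha-icl and the interface are placeholders, not part of the feature configuration), the interface carrying the ICL addresses can be placed in a dedicated routing instance as follows:

    user@host# set routing-instances ha-icl instance-type virtual-router
    user@host# set routing-instances ha-icl interface ge-0/0/1.0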

The traffic between the HA nodes is protected using IPsec. Traffic is steered towards the node where the service is active by the upstream and downstream routers.

The following statements are added to the Multinode High Availability configuration for interchassis link encryption:
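A minimal sketch of these statements is shown here, using the peer ID and VPN profile name from the link encryption example later in this topic (adapt both to your configuration):

    user@host# set chassis high-availability peer-id 2 vpn-profile L3HA_IPSEC_VPN
    user@host# set security ipsec vpn L3HA_IPSEC_VPN ha-link-encryption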

Benefits

  • Reduced cost of deployments

  • Improved geo-redundancy deployability

Multinode High Availability Modes

Interchassis link related changes are applicable to both active/active and active/backup modes.

  • Active/backup: In this mode, one node actively forwards traffic while the second node is ready to start processing traffic instantly.

  • Active/active: In this mode, each node can forward traffic for services not terminated on a node in an SRG.

Active/Active Data Plane

This mode works for stateless services that do not have control plane state, for example, firewall, ALG, and NAT.

In this mode, there are only active and warm states in the data plane. Activeness is maintained at the per-session level, and the transition from warm to active is packet driven. When a packet is received in the warm state, the session transitions from warm to active. The node that is receiving traffic marks the flow session active, and the node that is not receiving traffic marks the session as warm.

After failover, the node that now receives the traffic notifies the remote node about the session movement. The remote node then transitions the session from active to warm.

Active/Backup Control and Data Plane

In the active/backup mode, services require both control and data plane states to be synchronized across the nodes. Whenever an active SRX node fails for a service function, both the control and data planes need to fail over to the backup node at the same time.

Figure 2: Deployment of Multinode High Availability in Active/Backup Mode

The active/backup state in HA mode involves:

Activeness Determination

In this mode, the SRX devices are configured with a floating IP address on the loopback interface lo0. You need to configure the lo0 interfaces and the floating IPs as required.
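For example (a sketch only; the address matches the loopback address 11.0.0.1 used in the VPN example later in this topic), the floating IP can be configured on the loopback interface as follows:

    user@host# set interfaces lo0 unit 0 family inet address 11.0.0.1/32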

Both SRX devices initially advertise the route for the floating IP address to routers R1 and R2 with equal preference, but the routers can have their own preference for one of the paths based on the IP address. If you want a particular node to take over activeness at boot up, configure a higher default route preference in the routing policy towards the probe router on that node instead of equal preference.

After reboot, both SRX devices start in the hold state and continuously probe a configured HA activeness-determination probe IP address on R1 or R2, using that address as the destination and the floating IP as the source. The router hosting the probe destination IP address replies only to the SRX node that lies on its preferred routing path. The node that receives the probe reply promotes itself to the active role and communicates this to the other node. The other node stops probing and takes the backup role.

Activeness Enforcement

After the activeness is determined, the active node starts advertising the active (higher) preference path for all the remote and local routes (including the floating IP address) to both routers, to ensure that all traffic is drawn towards the active node. The backup node advertises the default (lower) preference, ensuring that no packet is forwarded to the backup node by either R1 or R2.

Activeness is communicated to the peer node through the ICL. The active node advertises the higher-preference path to adjacent neighbors. The service processes (IKED, AUTHD, PKID) take the active role on the node in the active state and the backup role on the other node.

When a failover happens and the old backup node transitions to the active role, the route preferences are swapped to drive all the traffic to the new active node.

The switch in the route preference advertisement is driven by routing policies configured on SRX1 and SRX2 with the if-route-exists condition. You configure the required active-signal-route-ip route to be used with this condition; the HA module adds this route to the routing table when the node moves to the active role, which activates the policy that starts advertising the higher preference.

You can configure the backup-signal-route-ip for the backup node to advertise a medium priority. When the HA link is down, the active node removes the active-signal-route. The old backup node detects this through its probes, and the default routing preference towards the old active node is then overridden by the medium priority.

If you want a particular node to take over activeness at boot up, configure the routing policy based on the sample routing policy below:
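The exact policy depends on the routing protocol used towards R1 and R2. The following is a minimal sketch only, assuming BGP; the policy name, the condition names, and the /32 prefixes (which stand in for the configured active and backup signal routes) are placeholders:

    set policy-options condition active-signal if-route-exists 10.1.1.1/32
    set policy-options condition active-signal if-route-exists table inet.0
    set policy-options condition backup-signal if-route-exists 10.1.1.2/32
    set policy-options condition backup-signal if-route-exists table inet.0
    set policy-options policy-statement ha-advertise term active from condition active-signal
    set policy-options policy-statement ha-advertise term active then local-preference 200
    set policy-options policy-statement ha-advertise term active then accept
    set policy-options policy-statement ha-advertise term backup from condition backup-signal
    set policy-options policy-statement ha-advertise term backup then local-preference 150
    set policy-options policy-statement ha-advertise term backup then accept
    set policy-options policy-statement ha-advertise term default then local-preference 100
    set policy-options policy-statement ha-advertise term default then accept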

In the configuration, the routing policies must have three priorities: low, medium, and high. Low is the default, medium is used by the backup node, and high is used by the active node.

Failure Scenarios

In case of an external path failure, a link failure, or BFD going down to the adjacent routers, routes are re-advertised to swap the preference, and packets start taking the path to the peer node, which becomes active.

In case of node failure where the peer node is active, the processes (flowd, IKED, IKEMD, PKID, AUTHD, RE kernel) are restarted.

  • A flowd restart impacts the sessions on the respective SPU and also brings down the ICL.

  • An infrastructure (REPD) restart impacts new sessions and some tunnels in the rekey window.

  • Packets are dropped if the node is in the ineligible state. The node transitions to ineligible when an external or internal path fails or any node failure occurs.

Services Redundancy/Failover Group

A services redundancy group (SRG) allows you to define the following two types of services, based on which different HA modes are supported:

  • SRG0: This mode is applicable to all services that do not have control plane state, for example, firewall, NAT, and ALG.

  • SRG1: This mode is applicable to services that have control plane state, for example, IPsec.

You can configure services-redundancy-group to define the stateless HA services (SRG0) and the IPsec VPN service (SRG1).

To enable the active/backup mode:

  1. Enable the default HA mode.
    user@host# set chassis high-availability services-redundancy-group 0
  2. Enable the VPN service in active/backup mode.
    user@host# set chassis high-availability services-redundancy-group 1
  3. Specify the designated floating IP address for the SRG. The IP address is used as the source IP address to determine the activeness.
    user@host# set chassis high-availability services-redundancy-group 1 floating-ip ip-address

Monitoring Failures

In case of any failures, the local node takes action and the remote node is notified about the action, if the interchassis link is up.

If the interchassis link is down, the remote node learns of the state change through the activeness probing that it is already performing because of the earlier ICL down event.

The following actions are taken:

  • SRG0: If the failure is related to SRG0, SRG0 goes to the isolated state and shuts down the user-configured shut-on-failure interfaces.

  • SRG1: The node goes to the ineligible state and removes the active/backup signal route.

On failure recovery:

  • SRG0: Waits for the subsequent cold sync to complete, transitions to the online state, and brings up the interfaces that it shut down earlier.

  • SRG1: Transitions out of the ineligible state; the new state is determined based on the remote node's HA state and the ICL status at that time.

When the active node goes down abruptly for any reason, the interchassis link is detected as down on the remote SRX node, and the activeness determination probing is restarted to prevent a split-brain scenario. In this case, because the old active node is down and the router hosting the probe destination IP address has therefore lost its higher-preference path, the probe succeeds and the remote SRX node transitions to the active state.

In spite of the split-brain prevention mechanism, the nodes can theoretically still get into an active-active state when the interchassis link is down and there are network issues on the probe router at the same time, causing it to reply to probe requests from both SRX nodes. In this case, once the situation improves and the interchassis link comes up, the node with the lower local ID backs off and takes the backup role. You can override this behavior with the optional activeness-priority configuration.

Support for VPN on HA Nodes in Multinode High Availability Solution

The VPN service is automatically enabled when you enable the active/backup mode using the set chassis high-availability services-redundancy-group 1 command. The Multinode High Availability solution allows you to synchronize IKE negotiations from the active node to the backup node. The interchassis link (ICL) connects the active and backup nodes for the exchange of the synchronization data. See Multinode High Availability.

The IPsec feature is supported in multinode HA. IPsec runs actively on one node (the active node) and can fail over to the secondary node (the backup node). IKE negotiations occur from the active node, and the states are synchronized with the backup node. After synchronization, the backup node is ready to take over the active role and continues without bringing down the tunnels after a switchover. You can run the show commands on both the active and backup nodes to display the status of IKE and IPsec security associations. You can delete the IKE and IPsec security associations only on the active node.
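For example, you can check the security association status with the following operational commands on either node:

    user@host> show security ike security-associations
    user@host> show security ipsec security-associations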

When you enable the Multinode High Availability feature, the dynamic CA profiles are loaded only on the node that performs the IKE negotiation. If a failover occurs, the new active node performs a new IKE negotiation and loads the dynamic CA certificates as part of that negotiation. When PKID restarts, the dynamic CA certificates are deleted only from the node where PKID was restarted.

Now let's discuss how to enable VPN in active/backup mode in the Multinode High Availability solution. First, you must configure SRG1 to enable the active/backup mode in Multinode High Availability.

Ensure that the tunnels are anchored on the lo0 interface. To do this, configure the IKE tunnel endpoint IP address on the local lo0.x interface (where x represents the interface subunit) on both the active and backup devices; this IP is called the floating IP. The lo0.x interface is configured as the external interface for the IKE gateway, and the floating IP is configured as the local address. The route for this floating IP address on the adjacent routers points to the active device. This ensures that, at any given point, the IKE negotiation initiates from the active device.

Figure 3 shows both active and backup SRX Series devices with floating IP address.

Figure 3: Active/backup Nodes With Floating IP Address

The following steps configure VPN and assign the same floating IP address to the active and backup nodes. Note that in this example, the loopback interface (lo0.0) is used as the external interface and the loopback address (11.0.0.1) is assigned as the local address.
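A minimal sketch of the relevant IKE gateway statements is shown here; the gateway name MNHA_GW is a placeholder, while the interface and address follow the example values above:

    user@host# set security ike gateway MNHA_GW external-interface lo0.0
    user@host# set security ike gateway MNHA_GW local-address 11.0.0.1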

This example shows how to encrypt and protect the High Availability (HA) traffic that traverses between the SRX5000 Series devices (HA nodes) when the Multinode High Availability feature is enabled.

Requirements

This example uses the following hardware and software components:

  • Two devices from the SRX5000 line

  • MX or vSRX devices as adjacent routers

  • Junos OS Release 20.4R1 or later

Overview

In Multinode High Availability mode, nodes deployed at different locations are connected by public IP addresses to synchronize all the HA related information between them. Because the HA traffic traverses the internet, which can be a security concern, these packets need to be encrypted.

To protect the HA traffic between the active and backup nodes, an IPsec VPN tunnel is established between the nodes as soon as they come up. It is assumed that, without HA link encryption, these devices can communicate with their local IP addresses as if they were on the same subnet. When HA link encryption is enabled, all the local IP based HA traffic is tunneled through IPsec. For better protection, the IPsec SAs are negotiated by IKE between the nodes instead of using manual VPNs. In Multinode HA mode, the tunnel is installed in the PFE and encrypts both the control traffic and the Real Time Objects (RTOs) exchanged between the nodes.

Perform the following configuration to protect the HA traffic between the HA nodes:

  • Configure a VPN profile for the HA traffic using the vpn-profile profile-name option at the [edit chassis high-availability peer-id peer-id] hierarchy level.

  • Encrypt the HA traffic for the specific VPN profile using the ha-link-encryption option at the [edit security ipsec vpn vpn-name] hierarchy level.

This configuration creates an ICL tunnel where only IKEv2 is supported for secure HA traffic flow. ICL tunnels support only site-to-site IPsec VPN tunnels.

View HA tunnel related information using the show security ike security-associations, show security ike active-peer, show security ipsec security-associations, and show security ipsec statistics commands.

Clear HA tunnel related information using the clear security ike security-associations and clear security ipsec security-associations commands.

Topology

In this example, two SRX5000 Series devices at different geographical locations act as high availability nodes. Because the control and data traffic passes through the internet, the traffic needs to be secured. To secure the HA traffic, the HA tunnel is encrypted using IPsec protocols.

Figure 4 shows the topology in which SRX5000 Series devices support encrypting the traffic that traverses the HA tunnel between the Layer 3 high availability nodes.

Figure 4: High Availability Tunnel Link Encryption

Configuration

CLI Quick Configuration

To quickly configure this example, copy the following commands, paste them into a text file, remove any line breaks, change any details necessary to match your network configuration, copy and paste the commands into the CLI at the [edit] hierarchy level, and then enter commit from configuration mode.

To configure HA tunnel to protect the traffic between the two HA nodes (22.0.0.1 and 22.0.0.2) using the IPsec protocols, configure the following:
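A partial sketch of the security portion of such a configuration is shown here for orientation only; the proposal, policy, gateway, and interface names and the pre-shared key are placeholders, the VPN name L3HA_IPSEC_VPN, peer ID 2, and zone name halink come from the step-by-step procedure below, and the chassis high-availability local and peer definitions and the SRG configuration (Steps 1 through 3) are omitted:

    set security ike proposal IKE_PROP authentication-method pre-shared-keys
    set security ike proposal IKE_PROP dh-group group14
    set security ike proposal IKE_PROP authentication-algorithm sha-256
    set security ike proposal IKE_PROP encryption-algorithm aes-256-cbc
    set security ike policy IKE_POL proposals IKE_PROP
    set security ike policy IKE_POL pre-shared-key ascii-text "placeholder-key"
    set security ike gateway HA_GW ike-policy IKE_POL
    set security ike gateway HA_GW address 22.0.0.2
    set security ike gateway HA_GW local-address 22.0.0.1
    set security ike gateway HA_GW external-interface ge-0/0/1
    set security ike gateway HA_GW version v2-only
    set security ipsec proposal IPSEC_PROP protocol esp
    set security ipsec proposal IPSEC_PROP encryption-algorithm aes-256-gcm
    set security ipsec policy IPSEC_POL proposals IPSEC_PROP
    set security ipsec vpn L3HA_IPSEC_VPN ha-link-encryption
    set security ipsec vpn L3HA_IPSEC_VPN ike gateway HA_GW
    set security ipsec vpn L3HA_IPSEC_VPN ike ipsec-policy IPSEC_POL
    set chassis high-availability peer-id 2 vpn-profile L3HA_IPSEC_VPN
    set security zones security-zone halink host-inbound-traffic system-services ike
    set security zones security-zone halink interfaces ge-0/0/1.0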

Configuring Link Encryption on High Availability Nodes

Step-by-Step Procedure

The following example requires you to navigate various levels in the configuration hierarchy. For instructions on how to do that, see Using the CLI Editor in Configuration Mode in the CLI User Guide.

  1. Configure both local and peer information for high availability.
  2. Configure multinode High Availability mode and associate the peer node id 2 to the Service Redundancy Group (SRG).
  3. Configure services redundancy group 1.
  4. Define an IKE proposal and the IKE proposal authentication method. Also define the Diffie-Hellman group, authentication algorithm, and encryption algorithm for the IKE proposal.
  5. Configure an IKE policy and associate the policy with the IKE proposal. Also define the authentication method.
  6. Define the gateway policy reference and gateway version. For the high availability feature, you must configure the IKE version as v2-only.
  7. Specify the IPsec proposal protocol and encryption algorithm.
  8. Create the multinode high availability IPsec policy.
  9. Enable multinode high availability feature for the IPsec VPN.

    The same VPN name L3HA_IPSEC_VPN must be specified for vpn-profile in the chassis high-availability configuration. See Step 12.

  10. Specify the IKE gateway.
  11. Specify the L3HA IPsec policies.
  12. Attach the IPsec VPN profile L3HA_IPSEC_VPN to the chassis high-availability configuration to establish a secure interchassis link or tunnel between the HA nodes 22.0.0.1 and 22.0.0.2.
  13. Specify allowed system services for the halink security zone.
  14. Assign an interface to the halink security zone.
  15. Configure policy options.

Results

From configuration mode, confirm your configuration by entering the show security ike, show security ipsec, show policy-options, and show chassis high-availability commands. If the output does not display the intended configuration, repeat the configuration instructions in this example to correct it.

If you are done configuring the device, enter commit from configuration mode.

Verification

Confirm that the configuration is working properly.

Purpose

To verify only the interchassis link active peers, not the regular IKE active peers.

Action

user@host> show security ike active-peer ha-link-encryption

Meaning

The output displays only the active peer of the interchassis link tunnel, with details such as the addresses and ports that the active peer is using.

Verify Security Associations Created for Interchassis Link Tunnel

Purpose

To verify the multiple SAs created for the interchassis link encryption tunnel.

Action

user@host> show security ipsec security-associations ha-link-encryption

Meaning

The output from the show security ipsec security-associations ha-link-encryption command lists the following information:

  • The remote gateway has an IP address of 22.0.0.2.

  • The SPIs, lifetime (in seconds), and usage limits (or lifesize in KB) are shown for both directions. The 1208/ value indicates that the Phase 2 lifetime expires in 1208 seconds, and that no lifesize has been specified, which indicates that it is unlimited. The Phase 2 lifetime can differ from the Phase 1 lifetime, because Phase 2 is not dependent on Phase 1 after the VPN is up.

Purpose

To verify the interchassis link tunnel mode.

Action

user@host> show security ipsec sa detail ha-link-encryption

Meaning

The output from the show security ipsec sa detail ha-link-encryption command lists the following information:

  • The local identity and remote identity make up the proxy ID for the SA.

  • The IPsec SA pair for each thread in the PIC is displayed.

  • The following line in the IPsec SA output indicates the HA link encryption tunnel mode.

    HA Link Encryption Mode: Multi-Node
  • Authentication and encryption algorithms used.

Purpose

To verify link encryption tunnel statistics on both active and backup nodes.

Action

user@host> show security ipsec statistics ha-link-encryption

You can also use the show security ipsec statistics ha-link-encryption command to review statistics and errors for all SAs.

To clear all IPsec statistics, use the clear security ipsec statistics ha-link-encryption command.

Meaning

If you see packet loss issues across a VPN, you can run the show security ipsec statistics ha-link-encryption command several times to confirm that the encrypted and decrypted packet counters are incrementing. You should also check whether the other error counters are incrementing.