Chassis Cluster on NFX350 Devices

A chassis cluster, in which two devices operate as a single device, provides high availability (HA) on NFX350 devices. Chassis clustering synchronizes the configuration files and the dynamic runtime session states between the two devices in the cluster.

NFX350 Chassis Cluster Overview

You can configure NFX350 devices to operate in cluster mode by connecting and configuring a pair of devices to operate as a single device, providing redundancy at the device, interface, and service levels.

When two devices are configured to operate as a chassis cluster, each device becomes a node of that cluster. The two nodes back up each other, with one node acting as the primary device and the other node acting as the secondary device, ensuring stateful failover of processes and services when the system or hardware fails. If the primary device fails, the secondary device takes over the processing of traffic.

The nodes of a cluster are connected through two links: the control link and the fabric link. The devices in a chassis cluster synchronize the configuration, kernel, and Packet Forwarding Engine (PFE) session states across the cluster to facilitate high availability, failover of stateful services, and load balancing.

Control link—Synchronizes the configuration between the nodes. When you submit configuration statements to the cluster, the configuration is automatically synchronized over the control interface.

To create a control link in a chassis cluster, connect the ge-0/0/0 interface on one node to the ge-0/0/0 interface on the second node.

Note:
You can use only the ge-0/0/0 interface to create a control link.

Fabric link (data link)—Forwards traffic between the nodes. Traffic that arrives on one node but must be processed on the other node is forwarded over the fabric link. Similarly, traffic that is processed on one node but must exit through an interface on the other node is forwarded over the fabric link.

You can use any interface except ge-0/0/0 to create a fabric link.

Chassis Cluster Modes

The chassis cluster can be configured in active/passive or active/active mode.

Active/passive mode—In active/passive mode, the transit traffic passes through the primary node while the backup node is used only in the event of a failure. When a failure occurs, the backup device becomes the primary and takes over all forwarding tasks.
Active/active mode—In active/active mode, the transit traffic passes through both nodes all the time.

Chassis Cluster Interfaces

The chassis cluster interfaces include:

Redundant Ethernet (reth) interface—A pseudo-interface that includes a physical interface from each node of a cluster. The reth interface of the active node is responsible for passing the traffic in a chassis cluster setup.

A reth interface must contain, at minimum, a pair of Fast Ethernet interfaces or a pair of Gigabit Ethernet interfaces that are referred to as child interfaces of the redundant Ethernet interface (the redundant parent). If two or more child interfaces from each node are assigned to the redundant Ethernet interface, a redundant Ethernet interface link aggregation group can be formed.

Note:
You can configure a maximum of 128 reth interfaces on NFX350 devices.

Control interface—An interface that provides the control link between the two nodes in the cluster. This interface is used for routing updates and for control plane signal traffic, such as heartbeat and threshold information that triggers node failover.

Note:
By default, the ge-0/0/0 interface is configured as the dedicated control interface on NFX350 devices. Therefore, you cannot apply any configuration to ge-0/0/0 in HA mode.

Fabric interface—An interface that provides the physical connection between two nodes of a cluster. A fabric interface is formed by connecting a pair of Ethernet interfaces back-to-back (one from each node). The Packet Forwarding Engines of the cluster use this interface to transmit transit traffic and to synchronize the runtime state of the data plane software. You must specify the physical interfaces to be used for the fabric interface in the configuration.

Chassis Cluster Limitation

A reth interface with more than one child interface per node is called a redundant LAG (RLAG). RLAG of reth member interfaces on the same node is not supported, so each reth interface can have only one child interface from each node.

Example: Configuring a Chassis Cluster on NFX350 Devices

This example shows how to set up chassis clustering on NFX350 devices.

Requirements

Before you begin:

Physically connect the two devices and ensure that they are the same NFX350 model.
Ensure that both devices are running the same Junos OS version.
Remove all interface mapping for the control port ge-0/0/0 on both nodes.
Connect the dedicated control port ge-0/0/0 on node 0 to the ge-0/0/0 port on node 1.
Connect the fabric port on node 0 to the fabric port on node 1.

Overview

Figure 1 shows the topology used in this example. This example shows how to set up basic active/passive chassis clustering. One device actively maintains control of the chassis cluster. The other device passively maintains its state for cluster failover capabilities in case the active device becomes inactive.

Note:

This example does not describe in detail miscellaneous configurations such as how to configure security features. They are essentially the same as they would be for standalone configurations.

Figure 1: NFX350 Chassis Cluster

Configuration

Configuring a Chassis Cluster

Step-by-Step Procedure

Configure the cluster ID and node ID on both nodes and reboot the devices. A reboot is required to enter cluster mode after the cluster ID and node ID are set.

Note:
You must issue the commands in operational mode on both devices.
The cluster ID must be the same on both devices, but the node IDs must differ because one device is node 0 and the other device is node 1. The cluster ID range is 0 through 255; setting it to 0 is equivalent to disabling cluster mode.
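As a rough sketch of this step, the following operational-mode commands set the cluster ID and node ID and reboot each device. The cluster ID value 1 is an illustrative assumption; any nonzero value in the range above should serve, provided it is the same on both devices.

```
# Operational mode on the device that will become node 0
# (cluster ID 1 is an assumed example value)
set chassis cluster cluster-id 1 node 0 reboot

# Operational mode on the device that will become node 1
set chassis cluster cluster-id 1 node 1 reboot
```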

Verify that the chassis cluster is configured successfully:
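A minimal check, assuming both devices have finished rebooting, is the standard status command in operational mode. No sample output is reproduced here because it depends on the cluster ID, the configured priorities, and the software release.

```
# Operational mode on either node
show chassis cluster status
```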

After the chassis cluster is set up, you can enter configuration mode and perform all configuration on the primary node, node 0.

Configure the host names and the out-of-band management IP addresses for nodes 0 and 1:
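Per-node settings are typically placed in the node0 and node1 configuration groups so that each node applies only its own values. The sketch below assumes that fxp0 is the out-of-band management interface; the host names and the 192.0.2.0/24 addresses are placeholders, not values taken from this example.

```
# Configuration mode on the primary node
# Host names and addresses are placeholders (assumptions)
set groups node0 system host-name nfx350-node0
set groups node0 interfaces fxp0 unit 0 family inet address 192.0.2.11/24
set groups node1 system host-name nfx350-node1
set groups node1 interfaces fxp0 unit 0 family inet address 192.0.2.12/24

# Apply the group that matches the local node
set apply-groups "${node}"
```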

If you are accessing the device from a subnet other than the one configured for out-of-band management, set up a static route:
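One possible form of such a static route is sketched here. The remote prefix 198.51.100.0/24 and the gateway 192.0.2.254 are placeholders, and applying the route through the node groups is an assumption rather than a requirement.

```
# Placeholder remote subnet and management gateway (assumptions)
set groups node0 routing-options static route 198.51.100.0/24 next-hop 192.0.2.254
set groups node1 routing-options static route 198.51.100.0/24 next-hop 192.0.2.254
```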

Configure a backup router to access the device from an external network for out-of-band management:
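A backup router entry might look like the following sketch. The gateway address and destination prefix are placeholders chosen for illustration, and the use of node groups is an assumption.

```
# Placeholder gateway and destination prefix (assumptions)
set groups node0 system backup-router 192.0.2.254 destination 198.51.100.0/24
set groups node1 system backup-router 192.0.2.254 destination 198.51.100.0/24
```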

Configure Fabric interfaces

Step-by-Step Procedure

The ge-0/0/0 interface is the predefined control link, so select any other interface on the device as the fabric interface. In this example, ge-0/0/1 is used as the fabric interface.

Connect one end of the Ethernet cable to ge-0/0/1 on the first NFX350 device (node 0) and the other end of the cable to ge-0/0/1 on the second NFX350 device (node 1).

Map physical LAN to virtual WAN port:

Configure front panel (L2) interfaces corresponding to fabric interface:

Configure L3 interfaces as fabric member:
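In Junos OS chassis clusters, the fabric interfaces are named fab0 (node 0) and fab1 (node 1). The sketch below assumes that the fabric port appears to the Layer 3 data plane as ge-1/0/1 on node 0 and as ge-8/0/1 on node 1. Both names are assumptions about interface numbering; replace them with the names your devices report.

```
# Member interface names are assumptions about L3 port numbering
set interfaces fab0 fabric-options member-interfaces ge-1/0/1
set interfaces fab1 fabric-options member-interfaces ge-8/0/1
```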

Configure data path for fabric interfaces:

Configure port peering for fabric and reth members. Port peering ensures that when a LAN interface controlled by the Layer 2 data plane (FPC0) fails, the corresponding interface on the Layer 3 data plane (FPC1) is marked down, and vice versa. This helps in the failover of the corresponding redundancy group to the secondary node.

Enable the system to perform control link recovery automatically. After it determines that the control link is healthy, the system issues an automatic reboot on the node that was disabled when the control link failed. When the disabled node reboots, it rejoins the cluster.
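Control link recovery is enabled with a single statement under the chassis cluster hierarchy, shown here as a minimal sketch entered in configuration mode on the primary node:

```
# Reboot the disabled node automatically once the control link is healthy again
set chassis cluster control-link-recovery
```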

Configure Redundant Groups and Redundant Interfaces

Step-by-Step Procedure

Configure redundancy groups 1 and 2. Redundancy group 1 controls the control plane and determines the primary node. Redundancy group 2 controls the data plane and includes the data plane ports. Each node has interfaces in a redundancy group.

As part of the redundancy group configuration, you must also define the priority for the control plane and the data plane, that is, which device is preferred for each. For chassis clustering, the node with the higher priority number is preferred.

In this configuration, node 0 is the active node because it is associated with redundancy-group 1. The reth0 interface is a member of redundancy-group 1, and reth1 is a member of redundancy-group 2. You must configure all changes in the cluster through node 0. If node 0 fails, node 1 becomes the active node.
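The priority rule described above can be sketched as follows. The values 200 and 100 are illustrative assumptions; what matters is that node 0 carries the higher number in each group so that it is preferred.

```
# Priority values are assumed examples; the higher number is preferred
set chassis cluster redundancy-group 1 node 0 priority 200
set chassis cluster redundancy-group 1 node 1 priority 100
set chassis cluster redundancy-group 2 node 0 priority 200
set chassis cluster redundancy-group 2 node 1 priority 100
```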

Map physical LAN to virtual WAN port for reth members:

Configure front panel (L2) interfaces corresponding to reth interface:

Configure WAN (L3) interfaces as reth member:
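A child interface is attached to its reth parent with the redundant-parent statement. In the sketch below, the physical interface names (ge-1/0/3 and ge-1/0/4 on node 0, ge-8/0/3 and ge-8/0/4 on node 1) are hypothetical. Each reth interface receives exactly one child from each node.

```
# Physical interface names are hypothetical placeholders
set interfaces ge-1/0/3 gigether-options redundant-parent reth0
set interfaces ge-8/0/3 gigether-options redundant-parent reth0
set interfaces ge-1/0/4 gigether-options redundant-parent reth1
set interfaces ge-8/0/4 gigether-options redundant-parent reth1
```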

Configure reth interfaces:

Configure reth0:

Configure reth1:
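The reth steps might look like the following sketch, which declares two reth interfaces and binds reth0 to redundancy group 1 and reth1 to redundancy group 2, as described earlier. The IP addresses are placeholders and are not taken from this example.

```
# Number of reth interfaces in the cluster
set chassis cluster reth-count 2

# reth0 in redundancy group 1 (address is a placeholder)
set interfaces reth0 redundant-ether-options redundancy-group 1
set interfaces reth0 unit 0 family inet address 10.1.1.1/24

# reth1 in redundancy group 2 (address is a placeholder)
set interfaces reth1 redundant-ether-options redundancy-group 2
set interfaces reth1 unit 0 family inet address 203.0.113.1/24
```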

Configure interface monitoring for reth interface members:
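Interface monitoring assigns a weight to each monitored interface. When the interface fails, its weight counts against the redundancy group's failover threshold of 255. The sketch below uses hypothetical child interface names and assumes a weight of 255, so that a single link failure is enough to trigger failover of the group.

```
# Interface names are hypothetical; weight 255 is an assumed choice
set chassis cluster redundancy-group 1 interface-monitor ge-1/0/3 weight 255
set chassis cluster redundancy-group 1 interface-monitor ge-8/0/3 weight 255
set chassis cluster redundancy-group 2 interface-monitor ge-1/0/4 weight 255
set chassis cluster redundancy-group 2 interface-monitor ge-8/0/4 weight 255
```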

Configure port peering for reth interface members:

Configure security policies to allow traffic from LAN to WAN, and from WAN to LAN:
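A permissive pair of policies could be sketched as shown here. The zone names (trust and untrust), the policy names, the assignment of reth0 to the LAN side and reth1 to the WAN side, and the match-any criteria are all assumptions for illustration. A production configuration would normally be more restrictive.

```
# Zone membership (assumed: reth0 faces the LAN, reth1 faces the WAN)
set security zones security-zone trust interfaces reth0.0
set security zones security-zone untrust interfaces reth1.0

# LAN to WAN
set security policies from-zone trust to-zone untrust policy lan-to-wan match source-address any
set security policies from-zone trust to-zone untrust policy lan-to-wan match destination-address any
set security policies from-zone trust to-zone untrust policy lan-to-wan match application any
set security policies from-zone trust to-zone untrust policy lan-to-wan then permit

# WAN to LAN
set security policies from-zone untrust to-zone trust policy wan-to-lan match source-address any
set security policies from-zone untrust to-zone trust policy wan-to-lan match destination-address any
set security policies from-zone untrust to-zone trust policy wan-to-lan match application any
set security policies from-zone untrust to-zone trust policy wan-to-lan then permit
```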

Verification

Purpose

Verify the status of the chassis cluster and its interfaces.

Action

From operational mode, issue the following commands:

Verify the status of the cluster:

Verify the status of the redundancy groups:

Verify the status of the interfaces:

Verify the status of the port-peering interfaces:
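As a hedged sketch, the following standard operational-mode commands cover the cluster, redundancy group, and interface checks. Output is omitted because it depends on the configured priorities, interface names, and software release.

```
# Cluster and per-node redundancy group state
show chassis cluster status

# Redundancy group history and failure details
show chassis cluster information

# Control link, fabric link, and reth interface state
show chassis cluster interfaces
```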
