Chassis Cluster on NFX150 Devices

A chassis cluster, in which two devices operate as a single device, provides high availability on NFX150 devices. Chassis clustering synchronizes the configuration files and the dynamic runtime session states between the devices that form the cluster.

NFX150 Chassis Cluster Overview

You can configure a pair of NFX150 devices to operate in cluster mode by connecting and configuring them to function as a single device, providing redundancy at the device, interface, and service levels.

When two devices are configured to operate as a chassis cluster, each device becomes a node of that cluster. The two nodes back up each other, with one node acting as the primary device and the other node acting as the secondary device, ensuring stateful failover of processes and services when the system or hardware fails. If the primary device fails, the secondary device takes over the processing of traffic.

The nodes of a cluster are connected through two links: a control link and a fabric link. The devices in a chassis cluster synchronize the configuration, kernel state, and Packet Forwarding Engine (PFE) session states across the cluster to facilitate high availability, failover of stateful services, and load balancing.

  • Control link—Synchronizes the configuration between the nodes. When you submit configuration statements to the cluster, the configuration is automatically synchronized over the control interface.

    To create a control link in a chassis cluster, connect the heth-0-0 port on one node to the heth-0-0 port on the second node.

    Note:

    You can use only the heth-0-0 port to create a control link.

  • Fabric link (data link)—Forwards traffic between the nodes. Traffic arriving on a node that needs to be processed on the other node is forwarded over the fabric link. Similarly, traffic processed on a node that needs to exit through an interface on the other node is forwarded over the fabric link.

    You can use any port except heth-0-0 to create a fabric link.

Chassis Cluster Modes

The chassis cluster can be configured in active/passive or active/active mode.

  • Active/passive mode—In active/passive mode, the transit traffic passes through the primary node while the backup node is used only in the event of a failure. When a failure occurs, the backup device becomes the primary device and takes over all forwarding tasks.

  • Active/active mode—In active/active mode, the transit traffic passes through both nodes all the time.

Chassis Cluster Interfaces

The chassis cluster interfaces include:

  • Redundant Ethernet (reth) interface—A pseudo-interface that includes a physical interface from each node of a cluster. The reth interface of the active node is responsible for passing the traffic in a chassis cluster setup.

    A reth interface must contain, at minimum, a pair of Fast Ethernet interfaces or a pair of Gigabit Ethernet interfaces that are referred to as child interfaces of the redundant Ethernet interface (the redundant parent). If two or more child interfaces from each node are assigned to the redundant Ethernet interface, a redundant Ethernet interface link aggregation group can be formed.

    Note:

    You can configure a maximum of 128 reth interfaces on NFX150 devices.

  • Control interface—An interface that provides the control link between the two nodes in the cluster. This interface is used for routing updates and for control plane signal traffic, such as heartbeat and threshold information that trigger node failover.

    Note:

    By default, the heth-0-0 port is configured as the dedicated control interface on NFX150 devices. Therefore, you cannot map the heth-0-0 port to any other virtual interface if the device is part of a chassis cluster.

  • Fabric interface—An interface that provides the physical connection between the two nodes of a cluster. A fabric interface is formed by connecting a pair of Ethernet interfaces back-to-back (one from each node). The Packet Forwarding Engines of the cluster use this interface to transmit transit traffic and to synchronize the runtime state of the data plane software. You must specify the physical interfaces to be used for the fabric interface in the configuration.

Chassis Cluster Limitation

A reth interface with more than one child interface per node is called a redundant LAG (RLAG). RLAGs are not supported on NFX150 devices.

Example: Configuring a Chassis Cluster on NFX150 Devices

This example shows how to set up chassis clustering on NFX150 devices.

Requirements

Before you begin:

  • Physically connect the two devices and ensure that they are the same NFX150 model.

  • Ensure that both devices are running the same Junos OS version.

  • Remove all interface mappings for the control port heth-0-0 on both nodes.

  • Connect the dedicated control port heth-0-0 on node 0 to the heth-0-0 port on node 1.

  • Connect the fabric port on node 0 to the fabric port on node 1.

Overview

Figure 1 shows the topology used in this example. This example shows how to set up basic active/passive chassis clustering. One device actively maintains control of the chassis cluster. The other device passively maintains its state for cluster failover capabilities in case the active device becomes inactive.

Note:

This example does not describe in detail miscellaneous configurations such as how to configure security features. They are essentially the same as they would be for standalone configurations.

Figure 1: NFX150 Chassis Cluster

Configuration

Configuring a Chassis Cluster

Step-by-Step Procedure
  1. Configure the cluster ID on both nodes and reboot the devices. A reboot is required to enter cluster mode after the cluster ID and node ID are set.

    Note:

    You must issue these commands from operational mode on both devices.

    The cluster ID is the same on both devices, but the node ID must be different because one device is node 0 and the other is node 1. The range for the cluster ID is 0 through 255; setting it to 0 is equivalent to disabling cluster mode.
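    For example, with an arbitrarily chosen cluster ID of 1 (the hostnames here are illustrative):

```
{ on the device that will be node 0 }
user@nfx150-a> set chassis cluster cluster-id 1 node 0 reboot

{ on the device that will be node 1 }
user@nfx150-b> set chassis cluster cluster-id 1 node 1 reboot
```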

  2. Verify that the chassis cluster is configured successfully:

    After the chassis cluster is set up, you can enter the configuration mode and perform all the configurations on the primary node, node0.
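    For example, from operational mode (the output summarizes each redundancy group, the node priorities, and which node is primary):

```
user@nfx150-a> show chassis cluster status
```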

  3. Configure the host names and the out-of-band management IP addresses for nodes 0 and 1:

    If you are accessing the device from a subnet other than the one configured for out-of-band management, set up a static route.
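    A minimal sketch using Junos configuration groups; the hostnames and the addresses (from the documentation ranges) are placeholders for your own values:

```
{ node-specific values live in the node0 and node1 groups }
set groups node0 system host-name nfx150-a
set groups node0 interfaces fxp0 unit 0 family inet address 192.0.2.1/24
set groups node1 system host-name nfx150-b
set groups node1 interfaces fxp0 unit 0 family inet address 192.0.2.2/24

{ optional static route toward a remote management subnet (illustrative) }
set routing-options static route 198.51.100.0/24 next-hop 192.0.2.254
```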

  4. Map the physical LAN port to the virtual LAN interface on FPC0:

    Note:

    In a chassis cluster, the FPC1 ports on the secondary node are denoted as ge-8/0/x, and the FPC0 ports are denoted as ge-7/0/x.
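    A representative mapping, assuming the LAN port heth-0-1 on each node is mapped to a virtual LAN interface (the port choices are illustrative; node 1's FPC0 ports appear as ge-7/0/x, as noted above):

```
{ node 0 LAN mapping }
set vmhost virtualization-options interfaces ge-0/0/1 mapping-interface heth-0-1

{ node 1 LAN mapping }
set vmhost virtualization-options interfaces ge-7/0/1 mapping-interface heth-0-1
```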

  5. Map the physical WAN port to the virtual WAN interface on FPC1:
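    A representative mapping, assuming the WAN port heth-0-4 on each node is mapped to a virtual WAN interface (port choices are illustrative; node 1's FPC1 ports appear as ge-8/0/x):

```
{ node 0 WAN mapping }
set vmhost virtualization-options interfaces ge-1/0/1 mapping-interface heth-0-4

{ node 1 WAN mapping }
set vmhost virtualization-options interfaces ge-8/0/1 mapping-interface heth-0-4
```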

  6. Configure port peering between the FPC0 and FPC1 on nodes 0 and 1. Port peering ensures that when a LAN interface controlled by the Layer 2 dataplane (FPC0) fails, the corresponding interface on the Layer 3 dataplane (FPC1) is marked down and vice versa. This helps in the failover of the corresponding redundant group to the secondary node.

  7. Configure the fabric ports:
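    A minimal sketch, assuming ge-0/0/2 (node 0) and ge-7/0/2 (node 1) are the back-to-back connected fabric ports; substitute the ports you cabled:

```
{ fab0 is node 0's fabric interface, fab1 is node 1's }
set interfaces fab0 fabric-options member-interfaces ge-0/0/2
set interfaces fab1 fabric-options member-interfaces ge-7/0/2
```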

  8. Apply the node-specific configurations on nodes 0 and 1:
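    The node-specific groups are applied with the standard Junos conditional group expansion, which resolves to node0 or node1 on each member:

```
set apply-groups "${node}"
```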

  9. Enable the system to perform control link recovery automatically. After it determines that the control link is healthy, the system issues an automatic reboot on the node that was disabled when the control link failed. When the disabled node reboots, it rejoins the cluster.
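    Automatic control link recovery is enabled with a single statement:

```
set chassis cluster control-link-recovery
```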

  10. Verify the interfaces:
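    For example, from operational mode:

```
user@nfx150-a> show interfaces terse
user@nfx150-a> show chassis cluster interfaces
```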

Configuring Redundant Groups and Redundant Interfaces

Step-by-Step Procedure
  1. Configure redundancy groups 1 and 2. Both redundancy-group 1 and redundancy-group 2 control the data plane and include the data plane ports. Each node has interfaces in a redundancy group. As part of the redundancy group configuration, you must also define which device is preferred for the control plane and which is preferred for the data plane by assigning a priority to each node. In chassis clustering, the node with the higher priority number takes precedence.

    In this configuration, node 0 is the active node because it has the higher priority in redundancy-group 1. reth0 is a member of redundancy-group 1, and reth1 is a member of redundancy-group 2. You must make all configuration changes in the cluster through node 0. If node 0 fails, node 1 becomes the active node.
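    A minimal sketch of the priorities; the values 100 and 1 are conventional examples, and redundancy-group 0 (which controls the control plane) is shown for completeness:

```
{ control plane ownership }
set chassis cluster redundancy-group 0 node 0 priority 100
set chassis cluster redundancy-group 0 node 1 priority 1

{ data plane redundancy groups; higher number wins }
set chassis cluster redundancy-group 1 node 0 priority 100
set chassis cluster redundancy-group 1 node 1 priority 1
set chassis cluster redundancy-group 2 node 0 priority 100
set chassis cluster redundancy-group 2 node 1 priority 1
```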

  2. Enable preempt for redundancy-group 1.

    Note:

    If preempt is added to a redundancy group configuration, the device with the higher priority in the group can initiate a failover to become the primary device. By default, preemption is disabled.
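    Preemption is a single statement per redundancy group:

```
set chassis cluster redundancy-group 1 preempt
```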

  3. Configure the interfaces that the redundancy groups need to monitor to determine whether an interface is up or down.

    By default, redundancy groups have a threshold tolerance value of 255. When an interface monitored by a redundancy group becomes unavailable, its weight is subtracted from the redundancy group's threshold. When a redundancy group's threshold reaches 0, it fails over to the other node.
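    For example, monitoring one LAN child interface on each node with a weight of 255, so that the loss of either interface immediately drains the group's threshold and triggers failover (interface names are illustrative):

```
set chassis cluster redundancy-group 1 interface-monitor ge-0/0/1 weight 255
set chassis cluster redundancy-group 1 interface-monitor ge-7/0/1 weight 255
```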

  4. Configure the data interfaces so that in the event of a data plane failover, the other chassis cluster member can take over the connection seamlessly.

    Define the following parameters:

    • The maximum number of reth interfaces for the cluster, so that the system can allocate the appropriate resources for them.

    • The heartbeat interval and threshold, which define the wait time before failover is triggered in the chassis cluster.

    • Membership information of the member interfaces to reth interfaces.
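    The three parameters above can be sketched as follows; the reth count, heartbeat values, and child interface names are illustrative for the example topology:

```
{ resources for two reth interfaces }
set chassis cluster reth-count 2

{ heartbeat interval (ms) and missed-heartbeat threshold before failover }
set chassis cluster heartbeat-interval 1000
set chassis cluster heartbeat-threshold 3

{ LAN children of reth0, WAN children of reth1 (one per node) }
set interfaces ge-0/0/1 gigether-options redundant-parent reth0
set interfaces ge-7/0/1 gigether-options redundant-parent reth0
set interfaces ge-1/0/1 gigether-options redundant-parent reth1
set interfaces ge-8/0/1 gigether-options redundant-parent reth1
```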

  5. Configure the reth interfaces:

    • Configure reth0:

    • Configure reth1:
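    A minimal sketch, assuming reth0 is the LAN side (redundancy-group 1) and reth1 is the WAN side (redundancy-group 2), with illustrative addresses:

```
set interfaces reth0 redundant-ether-options redundancy-group 1
set interfaces reth0 unit 0 family inet address 10.1.1.1/24

set interfaces reth1 redundant-ether-options redundancy-group 2
set interfaces reth1 unit 0 family inet address 10.2.1.1/24
```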

  6. Configure security policies to allow traffic from LAN to WAN, and from WAN to LAN:
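    A permissive sketch, assuming the LAN reth sits in a trust zone and the WAN reth in an untrust zone (zone and policy names are illustrative; tighten the match terms for production):

```
{ bind the reth units to zones }
set security zones security-zone trust interfaces reth0.0
set security zones security-zone untrust interfaces reth1.0

{ LAN to WAN }
set security policies from-zone trust to-zone untrust policy lan-to-wan match source-address any destination-address any application any
set security policies from-zone trust to-zone untrust policy lan-to-wan then permit

{ WAN to LAN }
set security policies from-zone untrust to-zone trust policy wan-to-lan match source-address any destination-address any application any
set security policies from-zone untrust to-zone trust policy wan-to-lan then permit
```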

Verification

Verifying Chassis Cluster Status

Purpose

Verify the status of the chassis cluster and its interfaces.

Action

From operational mode, issue the following commands:

  • Verify the status of the cluster:

  • Verify the status of the redundancy groups:

  • Verify the status of the interfaces:

  • Verify the status of the port-peering interfaces:
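The cluster, redundancy-group, and interface checks use the standard Junos chassis cluster operational commands:

```
user@nfx150-a> show chassis cluster status
user@nfx150-a> show chassis cluster status redundancy-group 1
user@nfx150-a> show chassis cluster interfaces
```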