
Understanding Redundancy Groups

Chassis clustering provides high availability of interfaces and services through redundancy groups and primacy within groups.

Before You Begin

Read Understanding Chassis Cluster Formation


About Redundancy Groups

A redundancy group is an abstract construct that includes and manages a collection of objects. A redundancy group contains objects on both nodes. A redundancy group is primary on one node and backup on the other at any time. When a redundancy group is said to be primary on a node, its objects on that node are active.

Redundancy groups are independent units of failover. Each redundancy group fails over from one node to the other independent of other redundancy groups. When a redundancy group fails over, all its objects fail over together.

Three things determine the primacy of a redundancy group: the priority configured for the node, the node ID (in case of tied priorities), and the order in which the nodes come up. If a lower-priority node comes up first, it assumes primacy for a redundancy group and remains primary unless preempt is enabled.

A chassis cluster can include many redundancy groups, some of which might be primary on one node and some of which might be primary on the other. Alternatively, all redundancy groups can be primary on a single node. One redundancy group's primacy does not affect another redundancy group's primacy. You can create up to 128 redundancy groups.

Note: The maximum number of redundancy groups is equal to the number of redundant Ethernet interfaces that you configure.

You can configure redundancy groups to suit your deployment. You configure a redundancy group to be primary on one node and backup on the other node. You specify the node on which the group is primary by setting priorities for both nodes within a redundancy group configuration. The node with the higher priority takes precedence, and the redundancy group's objects on it are active.

If a redundancy group is configured so that both nodes have the same priority, the node with the lowest node ID number always takes precedence, and the redundancy group is primary on it. In a two-node cluster, node 0 always takes precedence in a priority tie.
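The priority and tie-break rules above can be sketched in a few lines of Python. This is a minimal illustration, not device behavior: the `Node` class and the `elect_primary` function are hypothetical names, and the sketch assumes both nodes come up at the same time and that preempt is not a factor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: int   # 0 or 1 in a two-node cluster
    priority: int  # priority configured for this node in the redundancy group


def elect_primary(nodes):
    """Pick the primary node: higher priority wins, lower node ID breaks ties.

    Assumption: both nodes are up together, so boot order does not matter.
    """
    return min(nodes, key=lambda n: (-n.priority, n.node_id))


# Equal priorities: node 0 takes precedence.
print(elect_primary([Node(0, 100), Node(1, 100)]).node_id)  # 0
# Node 1 has the higher priority, so it is primary.
print(elect_primary([Node(0, 2), Node(1, 254)]).node_id)    # 1
```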

Redundancy Group 0: Routing Engines

When you initialize a device in chassis cluster mode, the system creates a redundancy group referred to in this document as redundancy group 0. Redundancy group 0 manages the primacy and failover between the Routing Engines on each node of the cluster. As is the case for all redundancy groups, redundancy group 0 can be primary on only one node at a time. The node on which redundancy group 0 is primary determines which Routing Engine is active in the cluster. A node is considered the primary node of the cluster if its Routing Engine is the active one.

The redundancy group 0 configuration specifies the priority for each node. Below is how redundancy group 0 primacy is determined. Note that the three seconds value is the interval if the default heartbeat-threshold and heartbeat-interval values are used.

The above priority scheme applies to redundancy groups x as well, provided preempt is not configured.

You cannot enable preemption for redundancy group 0. If you want to change the primary node for redundancy group 0, you must do a manual failover.

Caution: Be cautious and judicious in your use of redundancy group 0 manual failovers. A redundancy group 0 failover implies a Routing Engine failover, in which case all processes running on the primary node are killed and then spawned on the new primary Routing Engine. This failover could result in loss of state, such as routing state, and degrade performance by introducing system churn.

Redundancy Groups 1 Through 128

You can configure one or more redundancy groups numbered 1 through 128, referred to in this chapter as redundancy group x. Each redundancy group x acts as an independent unit of failover and is primary on only one node at a time.

Each redundancy group x contains one or more redundant Ethernet interfaces. A redundant Ethernet interface is a pseudointerface that contains a pair of physical Gigabit Ethernet interfaces or a pair of Fast Ethernet interfaces. If a redundancy group is active on node 0, then the child links of all the associated redundant Ethernet interfaces on node 0 are active. If the redundancy group fails over to node 1, then the child links of all redundant Ethernet interfaces on node 1 become active.

On J Series and SRX Series chassis clusters, you can configure multiple redundancy groups to load-share traffic across the cluster. For example, you can configure some redundancy groups x to be primary on one node and some redundancy groups x to be primary on the other node. You can also configure a redundancy group x in a one-to-one relationship with a single redundant Ethernet interface to control which interface traffic flows through.

The traffic for a redundancy group is processed on the node where the redundancy group is active. Because more than one redundancy group can be configured, it is possible that the traffic from some redundancy groups will be processed on one node while the traffic for other redundancy groups is processed on the other node (depending on where the redundancy group is active). Multiple redundancy groups make it possible for traffic to arrive over an interface of one redundancy group and egress over an interface that belongs to another redundancy group. In this situation, the ingress and egress interfaces might not be active on the same node. When this happens, the traffic is forwarded over the fabric link to the appropriate node.
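The forwarding decision described above can be illustrated with a short Python sketch. The function name `forwarding_path` and the mapping of redundancy groups to active nodes are hypothetical; the sketch only models whether a packet has to cross the fabric link.

```python
def forwarding_path(ingress_rg, egress_rg, active_node):
    """Return the hops a packet takes through the cluster.

    active_node maps a redundancy group number to the ID of the node
    on which that group is currently active (an illustrative model).
    """
    src = active_node[ingress_rg]
    dst = active_node[egress_rg]
    if src == dst:
        # Ingress and egress interfaces are active on the same node.
        return [f"node{src}"]
    # Otherwise the packet is forwarded over the fabric link.
    return [f"node{src}", "fabric", f"node{dst}"]


# Hypothetical state: groups 1 and 3 active on node 0, group 2 on node 1.
active = {1: 0, 2: 1, 3: 0}
print(forwarding_path(1, 3, active))  # ['node0']
print(forwarding_path(1, 2, active))  # ['node0', 'fabric', 'node1']
```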

When you configure a redundancy group x, you must specify a priority for each node to determine the node on which the redundancy group x is primary. The node with the higher priority is selected as primary. The primacy of a redundancy group x can fail over from one node to the other. When a redundancy group x fails over to the other node, its redundant Ethernet interfaces on that node become active and their child interfaces pass traffic.

Table 157 gives an example of redundancy groups x in an SRX Series chassis cluster and indicates the node on which each group is primary. It shows the redundant Ethernet interfaces and their child interfaces configured for each redundancy group x.

Note: SRX210 devices have both Gigabit Ethernet ports and Fast Ethernet ports.

Table 157: Redundancy Groups Example for an SRX Series Chassis Cluster (SRX3000 and SRX5000 Lines)

| Group              | Primary | Priority    | Objects                        | Interface (Node 0) | Interface (Node 1) |
|--------------------|---------|-------------|--------------------------------|--------------------|--------------------|
| Redundancy group 0 | Node 0  | Node 0: 254 | Routing Engine on node 0       |                    |                    |
|                    |         | Node 1: 2   | Routing Engine on node 1       |                    |                    |
| Redundancy group 1 | Node 0  | Node 0: 254 | Redundant Ethernet interface 0 | ge-1/0/0           | ge-23/0/0          |
|                    |         | Node 1: 2   | Redundant Ethernet interface 1 | ge-1/3/0           | ge-23/3/0          |
| Redundancy group 2 | Node 1  | Node 0: 2   | Redundant Ethernet interface 2 | ge-2/0/0           | ge-24/0/0          |
|                    |         | Node 1: 254 | Redundant Ethernet interface 3 | ge-2/3/0           | ge-24/3/0          |
| Redundancy group 3 | Node 0  | Node 0: 254 | Redundant Ethernet interface 4 | ge-3/0/0           | ge-25/0/0          |
|                    |         | Node 1: 2   | Redundant Ethernet interface 5 | ge-3/3/0           | ge-25/3/0          |

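As the example for an SRX Series chassis cluster in Table 157 shows, redundancy groups 0, 1, and 3 are primary on node 0, where node 0 has priority 254 and node 1 has priority 2. Redundancy group 2 is primary on node 1, where the priorities are reversed. Redundancy group 0 contains the Routing Engines, and each redundancy group x contains two redundant Ethernet interfaces.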

Table 158 gives an example of redundancy groups x in a J Series chassis cluster and indicates the node on which each group is primary. It shows the redundant Ethernet interfaces and their interfaces configured for each redundancy group x.

Table 158: Redundancy Groups Example for a J Series Chassis Cluster

| Group              | Primary | Priority    | Objects                        | Interface (Node 0) | Interface (Node 1) |
|--------------------|---------|-------------|--------------------------------|--------------------|--------------------|
| Redundancy group 0 | Node 1  | Node 0: 50  | Routing Engine on node 0       |                    |                    |
|                    |         | Node 1: 100 | Routing Engine on node 1       |                    |                    |
| Redundancy group 1 | Node 1  | Node 0: 50  | Redundant Ethernet interface 0 | fe-1/0/0           | fe-8/0/0           |
|                    |         | Node 1: 100 | Redundant Ethernet interface 1 | fe-1/0/1           | fe-8/0/1           |
| Redundancy group 2 | Node 1  | Node 0: 50  | Redundant Ethernet interface 2 | ge-2/0/0           | ge-9/0/0           |
|                    |         | Node 1: 100 | Redundant Ethernet interface 3 | ge-2/0/1           | ge-9/0/1           |
| Redundancy group 3 | Node 0  | Node 0: 100 | Redundant Ethernet interface 4 | ge-3/0/0           | ge-10/0/0          |
|                    |         | Node 1: 50  | Redundant Ethernet interface 5 | ge-3/0/1           | ge-10/0/1          |

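As the example for a J Series chassis cluster in Table 158 shows, redundancy groups 0, 1, and 2 are primary on node 1, where node 1 has priority 100 and node 0 has priority 50. Redundancy group 3 is primary on node 0, where the priorities are reversed. Redundancy group 0 contains the Routing Engines, and each redundancy group x contains two redundant Ethernet interfaces.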

Redundancy Group Interface Monitoring

Note: Interface monitoring is not supported for redundancy group 0.

For a redundancy group x to automatically fail over to another node, its interfaces must be monitored. When you configure a redundancy group x, you can specify a set of interfaces that the redundancy group x is to monitor for status (or “health”) to determine whether the interface is up or down. A monitored interface can be a child interface of any of its redundant Ethernet interfaces. When you configure an interface for a redundancy group x to monitor, you give it a weight.

Every redundancy group x has a threshold tolerance value initially set to 255. When an interface monitored by a redundancy group x becomes unavailable, its weight is subtracted from the redundancy group x's threshold. When a redundancy group x's threshold reaches 0, it fails over to the other node. For example, if redundancy group 1 was primary on node 0, on the threshold-crossing event, redundancy group 1 becomes primary on node 1. In this case, all the child interfaces of redundancy group 1's redundant Ethernet interfaces begin handling traffic.

A redundancy group x failover occurs because the cumulative weight of the redundancy group x's monitored interfaces has brought its threshold value to 0. When the monitored interfaces of a redundancy group x on both nodes reach their thresholds at the same time, the redundancy group x is primary on the node with the lower node ID, in this case node 0.
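The threshold arithmetic for interface monitoring can be sketched as follows. The interface names and weights are hypothetical examples, the function names are illustrative, and clamping the result at 0 is an assumption made for readability.

```python
INITIAL_THRESHOLD = 255  # starting threshold of every redundancy group x


def remaining_threshold(weights, down):
    """Subtract the weight of every failed monitored interface.

    weights: dict of monitored interface name -> configured weight
    down:    set of interface names that are currently unavailable
    """
    lost = sum(w for name, w in weights.items() if name in down)
    return max(INITIAL_THRESHOLD - lost, 0)  # clamp at 0 (assumption)


def should_fail_over(weights, down):
    """A group fails over once its threshold has reached 0."""
    return remaining_threshold(weights, down) == 0


# Hypothetical weights for three monitored child interfaces.
weights = {"ge-1/0/0": 255, "ge-1/3/0": 100, "ge-2/0/0": 100}
print(remaining_threshold(weights, {"ge-1/3/0"}))          # 155
print(should_fail_over(weights, {"ge-1/3/0", "ge-2/0/0"}))  # False
print(should_fail_over(weights, {"ge-1/0/0"}))              # True
```

With these example weights, losing the interface weighted 255 triggers a failover by itself, while the two interfaces weighted 100 do not bring the threshold to 0 even when both fail.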

Redundancy Group IP Address Monitoring

Note: IP address monitoring is not supported on the backup device in a chassis cluster.

Redundancy group IP address monitoring checks end-to-end connectivity and allows a redundancy group to fail over because of a failure in the ability of a redundant Ethernet (reth) interface to reach a configured IP address. Redundancy groups on the master device in a cluster can be configured to monitor specific IP addresses to determine whether or not an upstream device in the network is reachable. The redundancy group can be configured such that if the monitored IP address becomes unreachable, the redundancy group will fail over to its backup to maintain service. The primary difference between this monitoring feature and interface monitoring is that IP address monitoring allows for failover when the interface is still up but the network device it is connected to is not reachable for some reason. It may be possible under those circumstances for the other node in the cluster to route traffic around the problem.

IP address monitoring configuration requires that you set not only the address to monitor and its failover value but also an IP address monitoring threshold value. The IP address monitoring value is deducted from the redundancy group's failover threshold only after the IP address monitoring threshold is reached because monitored addresses are unreachable. Thus, multiple addresses can be monitored simultaneously and weighted according to their importance to maintaining traffic flow. Also, the threshold value of an IP address that becomes reachable again after being unreachable is restored to the monitoring threshold. This does not, however, cause a failback unless the preempt option is enabled.
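One way to read the two-level scheme above is sketched below. This is an interpretation, not the device's implementation: the function and parameter names are hypothetical, and the sketch assumes the per-address failover values are summed and compared against the IP address monitoring threshold before anything is deducted from the redundancy group.

```python
def rg_threshold_after_ip_failures(ip_values, unreachable,
                                   ip_monitoring_threshold,
                                   ip_monitoring_value,
                                   rg_threshold=255):
    """Return the redundancy group threshold after IP monitoring is applied.

    ip_values:   dict of monitored address -> its failover value
    unreachable: set of addresses currently marked unreachable
    """
    lost = sum(ip_values[addr] for addr in unreachable)
    if lost >= ip_monitoring_threshold:
        # Monitoring threshold reached: deduct the monitoring value.
        return max(rg_threshold - ip_monitoring_value, 0)
    # Threshold not reached: the redundancy group is unaffected.
    return rg_threshold


# Hypothetical gateways (documentation address range) and values.
ip_values = {"192.0.2.1": 100, "192.0.2.2": 100}
print(rg_threshold_after_ip_failures(ip_values, {"192.0.2.1"}, 200, 255))
print(rg_threshold_after_ip_failures(ip_values, set(ip_values), 200, 255))
```

In this example a single unreachable gateway leaves the redundancy group threshold at 255, while losing both gateways reaches the monitoring threshold and drops the group's threshold to 0.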

When configured, the IP address monitoring failover value is considered along with interface monitoring—if set—and built-in failover monitoring, including SPU monitoring, cold-sync monitoring, and NPC monitoring (on supported platforms). The primary IP addresses that should be monitored are router gateway addresses to ensure that valid traffic coming into the services router can be forwarded to the appropriate network router.

The interval to check the reachability of a monitored IP address is once per second. After failing to reach the configured IP address for five consecutive attempts, the IP is determined to be unreachable and the failover value is deducted from the redundancy group's priority threshold. If the recalculated threshold is not 0, the IP address is marked unreachable only on the primary node and is still marked as reachable on the backup node. If the redundancy group threshold reaches 0 and there are unreachable IP addresses, the redundancy groups will continuously fail over and fail back between the nodes until an unreachable IP address becomes reachable or unreachable monitored IP addresses are removed from monitoring.
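The detection rule above (one probe per second, five consecutive failures) can be modeled with a small counter. The function name and the list-of-booleans input are illustrative assumptions; real probes are sent by the device, not by this code.

```python
FAILURES_TO_MARK_UNREACHABLE = 5  # consecutive one-second probes, per the text


def first_unreachable_probe(probe_results, limit=FAILURES_TO_MARK_UNREACHABLE):
    """Return the 1-based probe number at which the address is declared
    unreachable, or None if it never is.

    probe_results: iterable of booleans, one per probe (True = reply received)
    """
    consecutive = 0
    for number, ok in enumerate(probe_results, start=1):
        # A successful probe resets the count of consecutive failures.
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= limit:
            return number
    return None


print(first_unreachable_probe([False] * 5))                  # 5
print(first_unreachable_probe([False] * 4 + [True] * 3))     # None
print(first_unreachable_probe([False] * 4 + [True] + [False] * 5))  # 10
```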

To prevent the failure of a monitored IP address from causing failover and failback behavior every six seconds, the IP address monitoring feature includes a 180-second (3-minute) pause timer. (Note that there is one pause timer per redundancy group, not one per monitored IP address.) When a monitored IP address failure causes a redundancy group to fail over, a 180-second countdown starts on the new secondary node. During that countdown, the failover value of the failed monitored IP address is re-added to the redundancy group's threshold, effectively blocking it from causing a failback. If the IP address is still unreachable after three minutes, its value is deducted from the redundancy group's threshold. If the failure of multiple monitored IP addresses results in a failover, the behavior is the same: the 180-second pause timer starts to prevent an immediate failback.
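The effect of the pause timer on the new secondary node can be sketched as a simple function of elapsed time. The names are hypothetical, and treating exactly 180 seconds as the moment the value is deducted again is an assumption.

```python
PAUSE_SECONDS = 180      # one pause timer per redundancy group
INITIAL_THRESHOLD = 255


def new_secondary_threshold(ip_value, seconds_since_failover, still_unreachable):
    """Threshold seen on the new secondary node after an IP-triggered failover.

    ip_value: failover value of the monitored address that failed
    """
    if seconds_since_failover < PAUSE_SECONDS:
        # During the countdown the value is re-added, blocking a failback.
        return INITIAL_THRESHOLD
    if still_unreachable:
        # Countdown expired and the address is still down: deduct again.
        return max(INITIAL_THRESHOLD - ip_value, 0)
    return INITIAL_THRESHOLD


print(new_secondary_threshold(255, 10, True))    # 255
print(new_secondary_threshold(255, 180, True))   # 0
print(new_secondary_threshold(255, 200, False))  # 255
```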

The maximum number of monitoring IPs that can be configured per cluster is 64 for the SRX5000 line and 32 for the SRX3000 line. Monitoring can be accomplished only if the IP address is reachable on a redundant Ethernet (reth) interface, and IP addresses cannot be monitored over a tunnel. The feature also cannot be used on a chassis cluster running in transparent mode.

Every redundancy group x has a threshold tolerance value initially set to 255. When an IP address monitored by a redundancy group x becomes unavailable, its weight is subtracted from the redundancy group x's threshold. When a redundancy group x's threshold reaches 0, it fails over to the other node. For example, if redundancy group 1 was primary on node 0, on the threshold-crossing event, redundancy group 1 becomes primary on node 1. In this case, all the child interfaces of redundancy group 1's redundant Ethernet interfaces begin handling traffic.

A redundancy group x failover occurs because the cumulative weight of the redundancy group x's monitored IP addresses has brought its threshold value to 0. When the monitored IP addresses of a redundancy group x on both nodes reach their thresholds at the same time, the redundancy group x is primary on the node with the lower node ID, in this case node 0.
