Chassis Cluster Descriptions and Deployment Scenarios
Various Deployments of a Chassis Cluster
Firewall deployments can be active/passive or active/active.
Active/passive chassis cluster mode is the most common type of chassis cluster firewall deployment and consists of two firewall members of a cluster. One actively provides routing, firewall, NAT, virtual private network (VPN), and security services, along with maintaining control of the chassis cluster. The other firewall passively maintains its state for cluster failover capabilities should the active firewall become inactive.
SRX Series devices support active/active chassis cluster mode for environments in which you want to maintain traffic on both chassis cluster members whenever possible. In an SRX Series active/active deployment, only the data plane is active/active; the control plane remains active/passive. This allows one control plane to control both chassis members as a single logical device, and if the control plane fails, it can fail over to the other unit. It also means that the data plane can fail over independently of the control plane. Active/active mode allows the ingress interface to be on one cluster member and the egress interface on the other. In that case, data traffic must cross the data fabric to reach the other cluster member and leave through the egress interface. This is known as Z mode. Active/active mode also allows the devices to have local interfaces on individual cluster members that are not shared among the cluster in failover, but exist only on a single chassis. These interfaces are often used with dynamic routing protocols that fail traffic over to the other cluster member if needed. Figure 1 shows two SRX5800 devices in a cluster.
To effectively manage the SRX clusters, network management applications must do the following:
Identify and monitor primary and secondary nodes
Monitor redundancy groups and interfaces
Monitor control and data planes
Monitor switchovers and failures
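As a rough illustration of where this information lives on the device itself, the following Junos OS operational-mode commands report node roles, redundancy group state, and control and fabric link statistics. This is only a sketch of the CLI view; a management application would typically collect the same data through SNMP or the NETCONF XML management protocol rather than by parsing CLI output.

```
# Node priorities and primary/secondary status for each redundancy group
show chassis cluster status
# Control link, fabric link, and redundant Ethernet interface state
show chassis cluster interfaces
# Heartbeat and probe counters for the control and data planes
show chassis cluster control-plane statistics
show chassis cluster data-plane statistics
```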
Figure 2 shows the SRX Series high-end devices configuration for out-of-band management and administration.
Figure 3 shows the SRX Series branch devices configuration for out-of-band management and administration.
Connecting Primary and Secondary Nodes
The following is the recommended configuration for connecting to the cluster from management systems. It ensures that the management system can reach both the primary and secondary nodes.
Explanation of Configuration
The best way to connect to an SRX Series chassis cluster through the fxp0 interface (the out-of-band management interface) is to assign IP addresses to the management ports on both the primary and secondary nodes using groups.
Use a primary-only IP address across the cluster. This way, you can query a single IP address, and that address always belongs to the node that is primary for redundancy group 0. If you are not using a primary-only IPv4 address, each node IP address must be added and monitored separately. Secondary node monitoring is limited, as detailed in this topic.
We recommend using a primary-only IPv4 address for management, especially when using SNMP. This keeps the device reachable even after failover.
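A minimal sketch of this arrangement in set-style configuration follows. It is not the original sample configuration. The host names and the three host addresses are illustrative assumptions, chosen to sit in the 172.19.100.0/24 subnet used for the backup router in this topic. The master-only option marks the primary-only address.

```
# Per-node settings live in groups named node0 and node1 (addresses are hypothetical)
set groups node0 system host-name srx-node0
set groups node0 interfaces fxp0 unit 0 family inet address 172.19.100.164/24
set groups node0 interfaces fxp0 unit 0 family inet address 172.19.100.166/24 master-only
set groups node1 system host-name srx-node1
set groups node1 interfaces fxp0 unit 0 family inet address 172.19.100.165/24
set groups node1 interfaces fxp0 unit 0 family inet address 172.19.100.166/24 master-only
# Each node inherits only the group that matches its own node name
set apply-groups "${node}"
```

With a layout like this, the management station polls 172.19.100.166 and always reaches whichever node is currently primary for redundancy group 0, while the per-node addresses remain available for direct access.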
With the fxp0 interface configuration described here, the management IPv4 address on the fxp0 interface of the secondary node is reachable only by hosts on the same subnet as that address. If the host is on a different subnet, communication fails. This is expected behavior: the secondary cluster member’s Routing Engine is not operational until failover, so the routing protocol process does not run on the secondary node while the primary node is active. When management access to the secondary node from another subnet is needed, use the backup-router configuration statement.
With the backup-router statement, the secondary node can be accessed from an external subnet for management purposes. Due to a system limitation, do not configure the backup-router destination as ‘0.0.0.0/0’ or ‘::/0’; the prefix length must be non-zero. Multiple destinations can be included if your management IP address range is not contiguous. In this example, backup router 172.19.100.1 is reachable through the fxp0 interface, and the destination network management system IPv4 address is 10.200.0.1. The network management address is reachable through the backup router. For the backup router to reach the network management system, include the destination subnet in the backup router configuration.
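The statement could look like the following sketch. The backup router address comes from the text above; the destination prefix 10.200.0.0/24 is an assumption, picked only because it contains the management system address 10.200.0.1 and has a non-zero prefix length.

```
# Backup router reachable through fxp0; destination prefix is an assumed example
set groups node0 system backup-router 172.19.100.1 destination 10.200.0.0/24
set groups node1 system backup-router 172.19.100.1 destination 10.200.0.0/24
```

Placing the statement in both node groups means either node can reach the management subnet while it is the secondary.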
We recommend using outbound SSH to connect to the management systems over SSH, the NETCONF XML management protocol, or the Junos OS XML Management Protocol. Because the device initiates the connection, it reconnects automatically even after a switchover.
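A hedged sketch of an outbound SSH client definition follows. The client name, device ID, and port number are hypothetical, the shared secret is left as a placeholder, and 10.200.0.1 is the management system address used in this topic.

```
# Enable NETCONF over SSH so the outbound session can carry it
set system services netconf ssh
# Outbound SSH client (name, device-id, and port are illustrative assumptions)
set system services outbound-ssh client nms-1 device-id srx-cluster-1
set system services outbound-ssh client nms-1 secret "<shared-secret>"
set system services outbound-ssh client nms-1 services netconf
set system services outbound-ssh client nms-1 10.200.0.1 port 7804
```

The management system must listen on the chosen port and recognize the device ID and secret; those details depend on the management application.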
We recommend using the same SNMP engine ID on each node. SNMPv3 uses the SNMP engine ID value to authenticate protocol data units (PDUs), so if the engine ID values differ between nodes, SNMPv3 might fail after a Routing Engine switchover.
Keep other SNMP configurations, such as the SNMP communities, trap-groups, and so on, common between the nodes as shown in the sample configuration.
SNMP traps are sent only from the primary node. This includes events and failures detected on the secondary node. The secondary node never sends SNMP traps or alerts. Use the client-only configurable option to restrict SNMP access to the required clients only. Use SNMPv3 for encryption and authentication.
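The SNMP recommendations above might translate into something like the following sketch. The engine ID suffix and trap group name are hypothetical, the community name is left as a placeholder, and the client restriction is expressed here with the clients statement under the community. Because these statements sit in the common configuration rather than in a node group, both nodes share them.

```
# One engine ID for the whole cluster (suffix is an illustrative assumption)
set snmp engine-id local srx-cluster-1
# Community limited to the management system only
set snmp community <community-name> authorization read-only
set snmp community <community-name> clients 10.200.0.1/32
# Trap destination (group name is hypothetical)
set snmp trap-group nms-traps targets 10.200.0.1
```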
Syslog messages should be sent from both nodes separately as the log messages are node specific.
If the management station is on a different subnet than the management IP addresses, specify the management station’s subnet as the destination in the backup router configuration and, if required, add a static route at the [edit routing-options] hierarchy level. In the sample configuration, the network management address 10.200.0.1 is reachable through the backup router, so a static route is configured for it.
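A sketch of such a static route follows. The prefix 10.200.0.0/24 is an assumption that covers the management address 10.200.0.1, and the retain and no-readvertise options are optional additions commonly used for management routes, not something the text above requires.

```
# Reach the management subnet through the backup router on the fxp0 network
set routing-options static route 10.200.0.0/24 next-hop 172.19.100.1
# Keep the route in the forwarding table if the routing process restarts
set routing-options static route 10.200.0.0/24 retain
# Do not advertise the management route into dynamic routing protocols
set routing-options static route 10.200.0.0/24 no-readvertise
```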
You can restrict access to the device using firewall filters. The previous sample configuration shows that SSH, SNMP, and Telnet are restricted to the 10.0.0.0/8 network. This configuration allows UDP, ICMP, OSPF, and NTP traffic and denies other traffic. This filter is applied to the fxp0 interface.
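One way to express a filter with this behavior is sketched below. It is not the original sample configuration: the filter and term names are hypothetical, and the permit-established term is an added assumption so that return traffic for TCP sessions opened by the device itself (for example, outbound SSH) is not dropped by this stateless filter.

```
# SSH, Telnet, and SNMP accepted only from 10.0.0.0/8
set firewall family inet filter protect-mgmt term permit-ssh-telnet from source-address 10.0.0.0/8
set firewall family inet filter protect-mgmt term permit-ssh-telnet from protocol tcp
set firewall family inet filter protect-mgmt term permit-ssh-telnet from destination-port [ ssh telnet ]
set firewall family inet filter protect-mgmt term permit-ssh-telnet then accept
set firewall family inet filter protect-mgmt term permit-snmp from source-address 10.0.0.0/8
set firewall family inet filter protect-mgmt term permit-snmp from protocol udp
set firewall family inet filter protect-mgmt term permit-snmp from destination-port snmp
set firewall family inet filter protect-mgmt term permit-snmp then accept
# The same services from any other source are dropped
set firewall family inet filter protect-mgmt term block-ssh-telnet from protocol tcp
set firewall family inet filter protect-mgmt term block-ssh-telnet from destination-port [ ssh telnet ]
set firewall family inet filter protect-mgmt term block-ssh-telnet then discard
set firewall family inet filter protect-mgmt term block-snmp from protocol udp
set firewall family inet filter protect-mgmt term block-snmp from destination-port snmp
set firewall family inet filter protect-mgmt term block-snmp then discard
# Assumption: allow replies to TCP sessions the device initiated
set firewall family inet filter protect-mgmt term permit-established from protocol tcp
set firewall family inet filter protect-mgmt term permit-established from tcp-established
set firewall family inet filter protect-mgmt term permit-established then accept
# Remaining UDP (which includes NTP), ICMP, and OSPF are accepted
set firewall family inet filter protect-mgmt term permit-other from protocol [ udp icmp ospf ]
set firewall family inet filter protect-mgmt term permit-other then accept
# Everything else is denied
set firewall family inet filter protect-mgmt term deny-rest then discard
# Apply the filter to fxp0 on both nodes
set groups node0 interfaces fxp0 unit 0 family inet filter input protect-mgmt
set groups node1 interfaces fxp0 unit 0 family inet filter input protect-mgmt
```

Term order matters here: the block terms must come before permit-other, otherwise SNMP from outside 10.0.0.0/8 would match the general UDP term and be accepted.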
You can also use security zones to restrict the traffic. For more information, see the Junos OS Security Configuration Guide.
Additional Configuration for SRX Series Branch Devices
The factory default configuration for the SRX100, SRX210, and SRX240 devices automatically enables Layer 2 Ethernet switching. Because Layer 2 Ethernet switching is not supported in chassis cluster mode, for these devices, if you use the factory default configuration, you must delete the Ethernet switching configuration before you enable chassis clustering.
These devices have no dedicated fxp0 management interface. Instead, the fxp0 interface is repurposed from a built-in interface. For example, on SRX100 devices, the fe-0/0/6 interface is repurposed as the management interface and is automatically renamed fxp0. For more information about the management interface, see the Junos OS Security Configuration Guide.
Use syslog with caution, because it can cause cluster instability. Never send data plane logs through syslog on SRX Series branch devices.
Managing Chassis Clusters
Managing chassis clusters through redundant Ethernet interfaces: SRX Series chassis clusters can be managed using the redundant Ethernet (reth) interfaces. The configuration of redundancy groups and reth interfaces differs based on the deployment, such as active/active mode or active/passive mode. See the Junos OS Security Configuration Guide for configuration details. Once the reth interfaces are configured and reachable from the management station, secondary nodes can be accessed through the reth interfaces.
If the reth interface belongs to redundancy group 1 or higher, the TCP connection to the management station transitions seamlessly to the new primary. However, if a redundancy group 0 failover occurs and the Routing Engine switches over to a new node, connectivity is lost for all sessions for a few seconds.
Managing clusters through the transit interfaces: Clustered devices can be managed using transit interfaces. A transit interface cannot be used directly to reach a secondary node.
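For orientation, a reth interface used for management might be defined roughly as in the sketch below. The member interface names depend on the platform and slot numbering, and the priorities and the address 192.0.2.1/24 are illustrative assumptions; see the Junos OS Security Configuration Guide for the settings that apply to a given deployment.

```
# Number of reth interfaces and redundancy group 1 priorities (values are examples)
set chassis cluster reth-count 1
set chassis cluster redundancy-group 1 node 0 priority 200
set chassis cluster redundancy-group 1 node 1 priority 100
# One physical member per node (interface names are placeholders)
set interfaces ge-0/0/1 gigether-options redundant-parent reth0
set interfaces ge-5/0/1 gigether-options redundant-parent reth0
# Bind reth0 to redundancy group 1 and give it a single shared address
set interfaces reth0 redundant-ether-options redundancy-group 1
set interfaces reth0 unit 0 family inet address 192.0.2.1/24
```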
Configuring Devices for In-Band Management and Administration
The chassis cluster feature available in Junos OS for SRX Series Services Gateways is modeled on the redundancy features found in Junos OS-based devices. Designed with separate control and data planes, Junos OS-based devices provide redundancy in both planes. The control plane in Junos OS is managed by the Routing Engines, which perform all the routing and forwarding computations in addition to other functions. Once the control plane converges, forwarding entries are pushed to all Packet Forwarding Engines in the system. Packet Forwarding Engines then perform route-based lookups to determine the appropriate destination for each packet without any intervention from the Routing Engines.
When a chassis cluster is enabled on an SRX Series Services Gateway, a second device of the same model provides control plane redundancy, as shown in Figure 4.
Similar to a device with two Routing Engines, the control plane of an SRX Series cluster operates in an active/passive mode with only one node actively managing the control plane at any given time. Because of this, the forwarding plane always directs all traffic sent to the control plane (also referred to as host-inbound traffic) to the cluster’s primary node. This traffic includes (but is not limited to):
Traffic for the routing processes, such as BGP, OSPF, IS-IS, RIP, and PIM traffic
IKE negotiation messages
Traffic directed to management processes, such as SSH, Telnet, SNMP, and NETCONF
Monitoring protocols, such as BFD or RPM
This behavior applies only to host-inbound traffic. Through traffic (that is, traffic forwarded by the cluster, but not destined to any of the cluster’s interfaces) can be processed by either node, based on the cluster’s configuration.
Because the forwarding plane always directs host-inbound traffic to the primary node, a separate path is needed to reach a specific node. The fxp0 interface provides that path: an independent connection to each node, regardless of the status of the control plane. Traffic sent to the fxp0 interface is not processed by the forwarding plane but is sent to the Junos OS kernel, which provides a way to connect to the control plane of a node, even the secondary node.
This topic explains how to manage a chassis cluster through the primary node without requiring the use of the fxp0 interfaces, that is, in-band management. This is particularly needed for SRX Series branch devices since the typical deployment for these devices is such that there is no management network available to monitor the remote branch office.
Before Junos OS Release 10.1R2, managing an SRX Series branch chassis cluster required connectivity to the control plane of both members of the cluster, and therefore access to the fxp0 interface of each node. In Junos OS Release 10.1R2 and later, SRX Series branch devices can be managed remotely using the reth interfaces or the Layer 3 interfaces.
Managing SRX Series Branch Chassis Clusters Through the Primary Node
To access the primary node of a cluster, establish a connection to any of the cluster’s interfaces other than the fxp0 interface. Layer 3 and reth interfaces always direct the traffic to the primary node, whichever node that is. Management over a reth interface and management over a Layer 3 interface are both common deployment scenarios, depicted in Figure 5 and Figure 6.
In both cases, establishing a connection to any of the local addresses connects to the primary node. To be precise, you are connected to the primary node of redundancy group 0. For example, you reach that node even when the reth interface, a member of redundancy group 1, is active on a different node (the same applies to Layer 3 interfaces, even if they physically reside on the backup node). You can use SSH, Telnet, SNMP, or the NETCONF XML management protocol to monitor the SRX chassis cluster.
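For in-band access to work, the management services must be enabled and permitted as host-inbound traffic on the zone that contains the interface. A minimal sketch follows, assuming a hypothetical zone named mgmt and a reth interface named reth0; a Layer 3 interface would be listed the same way.

```
# Enable the management services on the device
set system services ssh
set system services netconf ssh
# Permit them as host-inbound traffic on the in-band interface (zone and interface names are assumed)
set security zones security-zone mgmt interfaces reth0.0 host-inbound-traffic system-services ssh
set security zones security-zone mgmt interfaces reth0.0 host-inbound-traffic system-services snmp
set security zones security-zone mgmt interfaces reth0.0 host-inbound-traffic system-services netconf
```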
Figure 5 shows an example of an SRX Series branch device being managed over a reth interface. This model can be used for SRX Series high-end devices as well, using Junos OS Release 10.4 or later.
Figure 6 shows physical connections for in-band management using a Layer 3 interface.
After a failover, in-band connections only need to reach the new primary node through the reth or Layer 3 interfaces to maintain connectivity between the management station and the cluster.
Table 1 lists the advantages and disadvantages of using different interfaces.
Table 1: Advantages and Disadvantages of Different Interfaces
| fxp0 Interface | Reth and Transit Interfaces |
| --- | --- |
| Using the fxp0 interface with a primary-only IP address allows access to all routing instances and virtual routers within the system. The fxp0 interface can only be part of the inet.0 routing table. Because the inet.0 routing table is part of the default routing instance, it can be used to access data for all routing instances and virtual routers. | A transit or reth interface has access only to the data of the routing instance or virtual router it belongs to. If it belongs to the default routing instance, it has access to all routing instances. |
| The fxp0 interface with a primary-only IP address can be used to manage the device even after failover, and we highly recommend this. | Transit interfaces lose connectivity after a failover (or when the device hosting the interface goes down or is disabled), unless they are part of a reth group. |
| Managing through the fxp0 interface requires two IP addresses, one per node. This also means that a switch must be present to connect to the cluster nodes using the fxp0 interface. | The reth interface does not need two IP addresses, and no switch is required to connect to the SRX Series chassis cluster. Transit interfaces on each node, if used for management, need two explicit IP addresses for each interface. Because these are transit interfaces, the IP addresses also carry traffic other than management. |
| SRX Series branch device clusters with a non-Ethernet link (ADSL, T1/E1) cannot be managed using the fxp0 interface. | SRX Series branch devices with a non-Ethernet link can be managed using a reth or transit interface. |
Communicating with a Chassis Cluster Device
Management stations can use the following methods to connect to SRX Series chassis clusters. These methods are the same for all Junos OS devices and are not limited to SRX Series chassis clusters. We recommend using a primary-only IP address for any of the following protocols on SRX Series chassis clusters. Reth interface IP addresses can also be used to connect to the clusters using any of the following methods.
Table 2: Chassis Cluster Communication Methods
SSH or Telnet for CLI Access
This is only recommended for manual configuration and monitoring of a single cluster.
Junos OS XML Management Protocol
This is an XML-based interface that can run over Telnet, SSH, and SSL, and it is a precursor to the NETCONF XML management protocol. It provides access to Junos OS XML APIs for all configuration and operational commands that can be entered using the CLI. We recommend this method for accessing operational information. It can run over a NETCONF XML management protocol session as well.
NETCONF XML Management Protocol
This is the IETF-defined standard XML interface for configuration. We recommend using it to configure the device. This session can also be used to run Junos OS XML Management Protocol remote procedure calls (RPCs).
SNMP
From the point of view of an SRX Series chassis cluster, SNMP treats the two nodes in the cluster as a single system. Only one SNMP process runs, on the primary Routing Engine. At initialization time, the protocol primary indicates which SNMP process (snmpd) should be active based on the Routing Engine primary configuration. The passive Routing Engine has no snmpd running. Therefore, only the primary node responds to SNMP queries and sends traps at any given time. The secondary node can be directly queried, but it has limited MIB support, which is detailed in Retrieving Chassis Inventory and Interfaces. The secondary node does not send SNMP traps. SNMP requests to the secondary node can be sent using the fxp0 interface IP address on the secondary node or the reth interface IP address.
System Log Messages
Standard system log messages can be sent to an external syslog server. Both the primary and secondary nodes can send syslog messages. We recommend that you configure both nodes to send syslog messages separately.
Security Log Messages (SPU)
AppTrack, an application tracking tool, provides statistics for analyzing bandwidth usage of your network. When enabled, AppTrack collects byte, packet, and duration statistics for application flows in the specified zone. By default, when each session closes, AppTrack generates a message that provides the byte and packet counts and duration of the session, and sends the message to the host device. AppTrack messages are similar to session log messages and use syslog or structured syslog formats. The message also includes an application field for the session. If AppTrack identifies a custom-defined application and returns an appropriate name, the custom application name is included in the log message. Note that application identification has to be configured for this to occur. See the Junos OS Security Configuration Guide for details on configuring and using application identification and tracking.
J-Web
All Junos OS devices provide a graphical user interface for configuration and administration. This interface can be used for administering individual devices.