SSR Cluster Concepts and Terminology

 

A Session State Register cluster has both a physical and logical organization. The physical elements are servers. The logical elements are nodes. The two terms are not interchangeable.

Session State Register Servers

The Session State Register has requirements for the entire cluster and for all servers that participate in the cluster, over and above the requirements for standalone SBR Carrier servers. So that an SSR cluster has no single point of failure, each server in a cluster must have its own memory and disks. We do not recommend or support virtual servers, network shares, network file systems, or SANs.

All servers in the cluster require at least two physical Ethernet ports that provide the same throughput. Multipathing the NICs to a single IP address is required. A Session State Register cluster can work over a 100Base-T network, but we recommend 1000Base-T (Gigabit Ethernet).

All data servers must have equal processor power, memory space, and available bandwidth because they are tightly coupled and share data. If the overall throughput of the data servers varies from machine to machine, performance degrades. The configuration of SBR Carrier servers and management servers may vary from machine to machine, as long as the basic standalone requirements are met.

Session State Register Nodes

Four types of nodes can be included in a cluster, each with a specific role within the cluster:

  • An SBR node, also known as an (s) node, hosts the SBR Carrier RADIUS process software component, any optional modules, and all related processes that read and write data into the SSR database. This type of node accesses and manipulates the cluster’s shared data that is hosted by the data nodes.

  • An SBR/management node, also known as an (sm) node, hosts a combination of an SBR node and a management node: the SBR node software component runs the SBR Carrier RADIUS process, and the management node software component runs the SSR management process.

  • A management node, also known as an (m) node, controls itself and all data nodes in the cluster. It provides configuration data, starts and stops nodes, and can back up the database and perform other database operations. It also manages a database process that supports the SSR storage engine. Cluster configuration data is located in an identical config.ini file on each of the cluster’s management nodes (a sketch of such a file appears after the node descriptions below).

  • A data node, also known as a (d) node, runs the ndbmtd data process. The ndbmtd process cooperatively manages, replicates, and stores data in the SSR storage engine with other data nodes. Each data node has its own memory and permanent storage. Each one maintains both a portion of the working copy of the SSR database and a portion of one or more replicas of the database.

All the data nodes in a cluster run a special process called the shared memory engine, which manages the working copy of the SSR database. The management nodes coordinate the service among the participating data nodes. The shared memory engine and the SSR database replace the on-board database used by standalone Steel-Belted Radius Carrier servers. The shared memory engine uses a synchronous replication mechanism to keep the cluster nodes synchronized: a transaction is not committed until all cluster nodes are updated.
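
The config.ini file mentioned above follows the configuration format understood by the management and ndbmtd (MySQL Cluster) processes. The fragment below is a hand-written sketch for orientation only, not the file that the SSR configuration scripts generate: the host names, node IDs, and memory size are placeholder assumptions, and the real file contains additional product-specific settings. It simply shows how the node types described above map to configuration sections, and how the fixed two-replica design appears as NoOfReplicas=2.

    # Illustrative sketch only; the actual config.ini is produced during
    # SSR cluster configuration and contains additional settings.

    [ndbd default]
    # Applies to all data (d) nodes. Two replicas per partition is the
    # fixed SSR setting; the memory size here is only a placeholder.
    NoOfReplicas=2
    DataMemory=2G

    # Management (m) node sections, one per management node. In the default
    # Starter Kit these run on the same hosts as the SBR front ends (sm).
    [ndb_mgmd]
    NodeId=1
    HostName=ssr-sm1.example.com

    [ndb_mgmd]
    NodeId=2
    HostName=ssr-sm2.example.com

    # Data (d) node sections, one per data node.
    [ndbd]
    NodeId=10
    HostName=ssr-data1.example.com

    [ndbd]
    NodeId=11
    HostName=ssr-data2.example.com

    # API slots that the SBR Carrier (s or sm) nodes use to reach the
    # shared database.
    [mysqld]
    NodeId=20
    HostName=ssr-sm1.example.com

    [mysqld]
    NodeId=21
    HostName=ssr-sm2.example.com

Because each management node must hold an identical copy of this file, any change to it has to be applied to every management node in the cluster.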

SSR Data Entities

Each data node participates in a node group of two data nodes. A Starter Kit cluster has a single node group with two members; a Starter Kit with an Expansion Kit has two node groups, each with two data nodes. Each node group stores different partitions and replicas.

  • A partition is a portion of all the data stored by the cluster. There are as many cluster partitions as data nodes participating in the cluster. Each node group keeps at least one copy of any partitions assigned to it (that is, at least one replica) available to the cluster.

  • A replica is a copy of a partition. Each data node in a node group stores a replica of each partition assigned to the group. A replica belongs entirely to a single data node; a node can (and usually does) store several replicas, because maintaining two replicas of each partition is the fixed setting for SSR.

Figure 1 shows the data components of a data cluster with four data nodes arranged in two node groups of two nodes each. Nodes 1 and 2 belong to Node Group 1. Nodes 3 and 4 belong to Node Group 2.

  • Because there are four data nodes, there are four partitions.

  • The number of replicas is two, so each partition has two copies: a primary replica and a backup replica.

So long as at least one data node in each node group is operating, the cluster remains viable.

Figure 1: SSR with Four Data Nodes in Two Groups

The data stored by the cluster in Figure 1 is divided into four partitions: 0, 1, 2, and 3. Both replicas of each partition are stored within the same node group, and successive partitions are assigned to alternate node groups (a sketch modeling this placement follows the list):

  • Partition 0 is stored on Node Group 1. A primary replica is stored on Data Node 1 and a backup replica is stored on Data Node 2.

  • Partition 1 is stored on the other node group, Node Group 2. The primary replica is on Data Node 3 and its backup replica is on Data Node 4.

  • Partition 2 is stored on Node Group 1. The placement of its two replicas is reversed from that of Partition 0; the primary replica is stored on Data Node 2 and the backup on Data Node 1.

  • Partition 3 is stored on Node Group 2, and the placement of its two replicas is reversed from that of Partition 1: the primary replica is on Data Node 4 and the backup on Data Node 3.
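
The placement just described follows a simple alternating pattern: successive partitions alternate between the node groups, and within a node group the primary and backup roles alternate between the two data nodes. The Python sketch below is purely illustrative and is not SSR code; it reproduces the Figure 1 layout and encodes the viability rule that every node group must keep at least one data node running.

    # Illustrative model of the Figure 1 layout: four data nodes, two node
    # groups, two replicas per partition. A teaching sketch, not SSR code.

    NODE_GROUPS = {1: [1, 2], 2: [3, 4]}   # node group -> its data nodes
    N_PARTITIONS = 4                       # one partition per data node

    def placement(partition):
        """Return (node_group, primary_node, backup_node) for a partition."""
        groups = sorted(NODE_GROUPS)
        group = groups[partition % len(groups)]      # alternate node groups
        first, second = NODE_GROUPS[group]
        if (partition // len(groups)) % 2 == 0:      # alternate primary/backup
            return group, first, second
        return group, second, first

    def cluster_viable(failed_nodes):
        """The cluster survives while every node group has a live data node."""
        return all(any(node not in failed_nodes for node in nodes)
                   for nodes in NODE_GROUPS.values())

    for p in range(N_PARTITIONS):
        group, primary, backup = placement(p)
        print(f"Partition {p}: Node Group {group}, primary on Data Node "
              f"{primary}, backup on Data Node {backup}")

    print(cluster_viable({1, 3}))   # True: one node left in each group
    print(cluster_viable({1, 2}))   # False: Node Group 1 has no node left

Printing the placements reproduces the assignments listed above for Partitions 0 through 3.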

Tip

The terms primary and replica are used in another context in the Steel-Belted Radius Carrier environment and documentation, which can cause some confusion. These terms mean something specific in the context of Session State Register, but they are also used when talking about centralized configuration management (CCM).

CCM is a feature that coordinates Steel-Belted Radius Carrier server settings between a primary RADIUS server and one or more replica RADIUS servers. It copies critical configuration files from the primary to the replicas, so it keeps multiple SBR Carrier servers operating the same way.

CCM is a separate tool and process, not tied or linked to SSR, but it is often used in SSR environments to keep the SBR Carrier nodes operating identically.

Cluster Configurations

For the highest level of redundancy, we recommend that each node in a cluster run on its own server. In many locations and for many installations that might not be practical, so you can run an SBR node and a management node together on the same machine; in fact, that is the default configuration for the SSR Starter Kit cluster. However, neither a management node nor an SBR node can run on the same machine as a data node. This separation is required so that management arbitration services continue if one of the data node servers fails.

Using these separation guidelines, the recommended minimum size of a Session State Register cluster is four physical machines: two machines that each run an SBR/management node combination, and two machines that host the data nodes. This configuration supports all licenses and nodes included in the Session State Register Cluster Starter Kit and is shown in Figure 2:

Figure 2: Basic Session State Register Starter Kit Cluster
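
The placement rules behind this layout are easy to express as checks. The Python sketch below is illustrative only; the host names are invented, and the test simply encodes the guidance above: an SBR node and a management node may share a machine, a data node may not share a machine with any other node (each data node is assumed to have a server to itself, as in Figure 2), and the Starter Kit layout uses at least four machines.

    # Illustrative check of the node-placement guidance; host names and
    # layouts are invented examples, not product defaults.

    STARTER_KIT = {                        # host -> node types on that host
        "ssr-sm1": ["sbr", "management"],  # the default (sm) combination
        "ssr-sm2": ["sbr", "management"],
        "ssr-data1": ["data"],
        "ssr-data2": ["data"],
    }

    def check_layout(layout):
        """Return a list of problems; an empty list means the layout is OK."""
        problems = []
        for host, nodes in layout.items():
            # A data node may not share a machine with any other node.
            if "data" in nodes and len(nodes) > 1:
                problems.append(f"{host}: a data node needs a machine to itself")
        if len(layout) < 4:
            problems.append("fewer than the recommended minimum of four machines")
        return problems

    print(check_layout(STARTER_KIT))   # [] means the default layout passes
    bad = dict(STARTER_KIT, **{"ssr-data1": ["data", "management"]})
    print(check_layout(bad))           # flags the shared data-node machine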

Session State Register Scaling

You scale a Session State Register cluster by adding a separately licensed SSR Expansion Kit to a Starter Kit, by adding a third management node, or by adding more SBR Carrier front-end servers.

Adding a Data Node Expansion Kit

An Expansion Kit adds two data nodes, increasing the number of data nodes in a cluster to four. The additional nodes form a second node group (as shown in Figure 3) that provides more working memory for the SSR shared database. With the Expansion Kit in place, each node group manages its own partitions of the database and the replicas of those partitions. The data in each partition is synchronously replicated between the group’s data nodes, so if one data node fails, the remaining node can still access all the data. This configuration also provides very quick failover times if a node fails.

Figure 3: SSR Cluster with an Expansion Kit Set Up to Create Two Node Groups

Adding a Third Management Node

A Management Node Expansion Kit provides software and a license for a third management node. If the third management node is set up on its own host, instead of alongside an SBR Carrier node on a shared server, it also increases the resiliency of the cluster by providing an additional arbiter in case of a node failure.

Adding More SBR Carrier Front End Servers

The service capacity of the SBR Carrier environment grows when you add stateless SBR servers to the front end. Adding SBR Carrier servers increases the resiliency of the cluster and the speed of processing a particular transaction because wait time is reduced. A data cluster can support up to 20 Steel-Belted Radius Carrier nodes.

The SBR Carrier servers do not require identical configurations; they can be configured with different optional modules or communications interfaces. Each one requires a separate SBR Carrier license, but they all share the Session State Register Starter Kit license.

We recommend installing a load balancer in front of the SBR Carrier servers to evenly distribute the RADIUS load among the front-end SBR Carrier nodes. Regular server-based load balancing works if the front ends process only RADIUS transactions; use a RADIUS-aware load balancer if the front ends perform multi-round authentication.

Cluster Network Requirements

A redundant cluster requires a redundant network. At the machine level, we require dual interface cards in each machine and multipathing.

We recommend that the network be a dedicated subnet with dual switches. This fully duplicates the network, so each machine in the cluster has at least two routes to every other machine, as shown in Figure 4.

Figure 4: Starter Kit SSR Cluster with Redundant Network

The SSR database schema uses primary key lookups as often as possible during transaction processing, so the database cluster performance scales almost linearly based on the number of data nodes in the cluster.

Do not configure the subnet to be shared beyond the cluster machines because communications between nodes are not encrypted or shielded in any way. The only means of protecting transmissions within a cluster is to run your cluster on a protected network; do not interpose firewalls between any of the nodes.

Running the cluster on a private or protected network also increases efficiency because the cluster has exclusive use of all bandwidth between cluster hosts. This protects the cluster nodes from interference caused by transmissions between other devices on the network.

We strongly recommend Gigabit Ethernet as the network type; 100Base-T is the minimum supported speed. Network latency can severely degrade performance, so we also recommend that all servers be close enough together that latency is always much less than 10 ms.

Table 6: Latency between Servers and Its Effect on Performance

  Latency Times               Performance Degradation
  0 ms latency (LAN)          Baseline performance as designed
  10 ms latency               Up to 40% performance loss
  20 ms latency               Up to 60% performance loss
  More than 20 ms latency     Not supported
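
For quick planning checks, the thresholds in Table 6 can be restated as a simple lookup. The Python sketch below only mirrors the table; mapping intermediate latencies to the next (worse) row is a conservative assumption of ours, not something the table specifies.

    # Restate Table 6 as a lookup. Intermediate latencies map to the next
    # (worse) row, a conservative assumption that is not part of the table.

    def expected_impact(latency_ms):
        if latency_ms <= 0:
            return "baseline performance as designed"
        if latency_ms <= 10:
            return "up to 40% performance loss"
        if latency_ms <= 20:
            return "up to 60% performance loss"
        return "not supported"

    for ms in (0, 5, 10, 20, 25):
        print(f"{ms} ms -> {expected_impact(ms)}")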