Overview of Starting and Stopping a Session State Register Cluster
Having to stop all nodes in a cluster is uncommon because most system maintenance can be done on one system at a time. Taking the whole cluster offline defeats the purpose of the cluster, which is to avoid downtime, so confirm that taking all systems down at the same time is really required before proceeding. Rather than taking down all nodes, determine whether stopping just the SBR processes or the database management processes might be sufficient.
Stopping a server that hosts both an SBR Carrier and a management node creates a double fault, but does not damage the cluster because a fully redundant cluster always has more than one of each type of node. Stopping multiple nodes that provide redundancy to each other causes multiple faults that may damage the cluster and take the entire cluster offline.
In the SSR environment, each type of node is started, and stopped, in a specific order so that required resources are available when other nodes require them. This means that several commands may be executed on servers that host both SBR Carrier and management nodes.
Startup and shutdown commands must be executed by root on each node.
The sbrd script starts and stops processes on Steel-Belted Radius Carrier hosts for all four types of nodes in the cluster. The script may be in either of two directories on a server, depending on whether the server was configured to start all processes automatically. This autoboot functionality is set when you run the configure script.
The ./configure script prompts you to enable or disable the autoboot option. If you disable it, you cannot start the SSR process on the node (./sbrd start ssr) from the /etc/init.d/sbrd directory; you must start the SSR process from the /opt/JNPRsbr/radius directory instead.
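As a rough illustration of the two locations described above, the following shell sketch picks whichever copy of sbrd is present and executable. The function name find_sbrd is hypothetical and not part of the product. The default paths assume that /etc/init.d/sbrd is the script itself and that the package is installed under /opt/JNPRsbr; adjust both for your installation.

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate path that is an executable file.
# With no arguments, fall back to the two locations mentioned in the text
# (assumed paths; verify them on your own servers).
find_sbrd() {
    if [ "$#" -eq 0 ]; then
        set -- /etc/init.d/sbrd /opt/JNPRsbr/radius/sbrd
    fi
    for candidate in "$@"; do
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # Nothing usable was found.
    return 1
}
```

A caller could store the result in a variable, for example SBRD=$(find_sbrd), and then run "$SBRD" status as root.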
All sbrd commands are executed by root. In an SSR environment, the hadm user can execute the script on SSR processes, but expect errors with RADIUS processes that are owned by root.
Running sbrd on Session State Register Nodes
This section applies to running sbrd on nodes in a Session State Register cluster.
sbrd status [radius|ssr|GWrelay]
sbrd start [radius|ssr|GWrelay] [force]
sbrd start ssr --nowait-nodes=node-ids
sbrd stop [radius|ssr|GWrelay] [force]
sbrd stop [cluster] [force]
sbrd restart [radius|ssr|GWrelay] [force]
sbrd clean [radius|ssr] [force]
sbrd hup [radius|ssr|authGateway [process-name]]
sbrd status [radius|ssr|GWrelay] -v [-p <LCI password>]
The start, stop, and restart arguments start, stop, and restart the process. If a subsystem is not specified, the command works only on RADIUS and GWrelay processes because SSR processes normally are not stopped; to stop them, you must specify ssr explicitly. For example: sbrd stop ssr.
Executing stop cluster on an SBR Carrier server stops both SSR and RADIUS processes. Executing stop cluster on a management node also stops the data nodes controlled by the management node.
The clean argument removes lock files that prevent reinitializing the database more than once. You should use this argument only if things go wrong during the initial installation and configuration.
When it is executed on a data node, clean also prepares the node to take part in a new environment; for example, if an expansion kit is added to increase the number of data nodes from two to four.
The radius, ssr, or GWrelay optional argument specifies which process to operate on when executed on a server that hosts more than one node.
radius specifies the local Steel-Belted Radius Carrier processes.
ssr specifies data node and management node processes according to the type of node on which it is executed.
GWrelay specifies the GWrelay application.
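The synopsis and the subsystem arguments above can be summarized as a small pre-flight check. This is only a sketch: valid_sbrd_args is a hypothetical helper, not part of sbrd, and it checks nothing beyond the action and subsystem names listed in the synopsis. It ignores force, -v, -p, --nowait-nodes, and the optional process name after authGateway.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the action/subsystem pair appears in the
# sbrd synopsis, 1 otherwise. The subsystem is optional for every action.
valid_sbrd_args() {
    action=${1:-}
    subsystem=${2:-}
    case "$action" in
        status|start|restart) allowed="radius ssr GWrelay" ;;
        stop)                 allowed="radius ssr GWrelay cluster" ;;
        clean)                allowed="radius ssr" ;;
        hup)                  allowed="radius ssr authGateway" ;;
        *)                    return 1 ;;   # unknown action
    esac
    # An omitted subsystem is always accepted.
    if [ -z "$subsystem" ]; then
        return 0
    fi
    for name in $allowed; do
        if [ "$name" = "$subsystem" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, valid_sbrd_args stop cluster succeeds, while valid_sbrd_args clean GWrelay fails because clean accepts only radius or ssr.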
Executing start ssr --nowait-nodes=node-ids starts the cluster without waiting for the full cluster to be initialized. The node-ids variable is a comma-separated list of the node IDs that are unreachable, for example: sbrd start ssr --nowait-nodes=51,52. Use this argument only if one half of the cluster has network connectivity but has lost the ability to communicate with the other half. When the network connectivity between the two halves of the cluster is restored, you can start the remaining nodes with the normal startup scripts.
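To show how the comma-separated node-ids value is put together, here is a minimal shell sketch. The helper name nowait_arg is hypothetical, and the assumption that node IDs are plain decimal numbers is based only on the 51,52 example above.

```shell
#!/bin/sh
# Hypothetical helper: join numeric node IDs into a --nowait-nodes argument.
# Fails (returns 1) if no IDs are given or if any ID is not a plain number.
nowait_arg() {
    if [ "$#" -eq 0 ]; then
        return 1
    fi
    ids=""
    for id in "$@"; do
        case "$id" in
            ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric IDs
        esac
        # Append with a comma only when the list already has an entry.
        ids="${ids:+$ids,}$id"
    done
    printf '%s%s\n' '--nowait-nodes=' "$ids"
}
```

Calling nowait_arg 51 52 prints --nowait-nodes=51,52, which could then be passed to sbrd start ssr on the reachable half of the cluster.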
The status option displays information such as SBR package version, SBR process status, and loaded plug-in information.
The hup option operates as the kill -HUP command does on SBR Carrier nodes, but does not require the process ID. Executing sbrd hup authGateway issues the SIGHUP (1) signal to all the authGateway processes running on SBR Carrier. To issue the SIGHUP (1) signal only to the specific authGateway process, you must execute the hup option with the authGateway process name, for example: sbrd hup authGateway GMT.
The force argument makes sbrd attempt to disregard or overcome any errors that occur when processing the command. Normal behavior without the argument is to halt on errors. For example, sbrd start does not attempt to start software that is already running, but sbrd start force ignores a running process. This may produce unintended results, so use force with great care.
The -v option displays additional information about the RADIUS process along with basic information such as the SBR package version, SBR process status, and SBR process ID. If you have changed the default Lightweight Directory Access Protocol (LDAP) Configuration Interface (LCI) password, you should use the -p option to specify the password. For more information about the RADIUS status information, see Displaying RADIUS Status Information.
In the case of a cluster, stopping the RADIUS server does not cause any SSR processes to be stopped. If you want to stop the SSR processes on an SBR/management type node (for example, for scheduled maintenance of the machine), then as root, navigate to the radius/install subdirectory of the directory in which the JNPRsbr package was installed (/opt/JNPRsbr/radius/install by default) and execute:
./sbrd stop ssr
If you want to stop the entire cluster (not usually intended), then on each and every node execute as root:
./sbrd stop cluster
When you stop a cluster, the system prompts you with the following warning:
WARNING: This function is capable of stopping multiple nodes.
Do not use this function if you intend to stop only one node.
Do you intend to stop the entire cluster? (y,n): y
Are you sure? (y,n): y
Really? (y,n): y
This example shows the effect of sbrd stop ssr executed on a cluster management node:
root@wrx07:~> /opt/JNPRsbr/radius/sbrd stop ssr
Stopping ssr auxiliary processes
Stopping ssr management processes
Connected to Management Server at: 172.28.84.36:5235
Node 1 has shutdown.
Disconnecting to allow Management Server to shutdown
This example shows the effect of sbrd start ssr on a management node. Be aware that this does not start the data nodes.
root@wrx07:~> /opt/JNPRsbr/radius/sbrd start ssr
Starting ssr management processes
bash-3.00#
When sbrd is executed without a <radius|ssr> argument, it runs against all node processes on the server. For example, sbrd start starts both RADIUS and SSR processes for all nodes on a server. For complete details see When and How to Restart Session State Register Nodes, Hosts, and Clusters.
In an SSR environment, because some servers may host both SBR Carrier and management nodes, sbrd may be executed more than once with different arguments.
Starting the Cluster
If all nodes in the cluster are shut down, restarting requires bringing each type of node online in a specific order. If the systems are completely shut down, rebooting the machine restarts the appropriate processes automatically because automatic restart is the default configuration for all types of Session State Register nodes.
If the systems have not been totally shut down and just the SSR processes have been halted, log in as root and execute the start commands in the order described in Proper Order for Starting Nodes in a Cluster, to start each type of node’s processes.
During the cluster startup process, each time an SSR or RADIUS process is started on a node, we recommend that you verify the status of that node before moving on to the next node by executing the sbrd status command:
Log in to the node as hadm or root, and then execute sbrd status.
Results similar to this example are displayed:
[ndbd(NDB)] 2 node(s)
id=10 @172.28.84.163 (mysql-5.7.25 ndb-7.6.9, Nodegroup: 0, Master)
id=11 @172.28.84.113 (mysql-5.7.25 ndb-7.6.9, Nodegroup: 0)

[ndb_mgmd(MGM)] 2 node(s)
id=1 @172.28.84.36 (mysql-5.7.25 ndb-7.6.9)
id=2 @172.28.84.166 (mysql-5.7.25 ndb-7.6.9)

[mysqld(API)] 4 node(s)
id=21 @172.28.84.36 (mysql-5.7.25 ndb-7.6.9)
id=22 @172.28.84.166 (mysql-5.7.25 ndb-7.6.9)
id=30 @172.28.84.36 (mysql-5.7.25 ndb-7.6.9)
id=31 @172.28.84.166 (mysql-5.7.25 ndb-7.6.9)
Examine each line starting with id=, and verify that there are no references to starting, connecting, or not connected. Any of these references indicates that the process has not finished starting or that the node is not connected properly. You may need to execute the sbrd status command more than once because it shows only a snapshot of activity; the display does not refresh automatically. Do not proceed to the next node until you are sure the process has started properly and the node is connected.
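The manual check described above could be approximated with a small filter such as the following. This is a sketch, not a supported tool: status_is_clean is a hypothetical name, and the code assumes that each node is reported on a line containing id= and that the three phrases appear literally in the output of a node that is not ready.

```shell
#!/bin/sh
# Hypothetical helper: read sbrd status output on standard input.
# Returns 0 if no id= line mentions starting, connecting, or not connected,
#         1 if at least one id= line does,
#         2 if the input contains no id= lines at all.
status_is_clean() {
    node_lines=$(grep 'id=') || return 2
    if printf '%s\n' "$node_lines" | grep -Eiq 'starting|connecting|not connected'; then
        return 1
    fi
    return 0
}
```

One possible use is to pipe ./sbrd status into status_is_clean and repeat the command after a short pause until it returns 0, and only then move on to the next node.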