Configuring the Disaster Recovery Process Between an Active and a Standby Site

You configure disaster recovery between an active site and a standby site to ensure geographical redundancy of network management services.

Before you initiate the disaster recovery process between both sites, perform the following tasks:

Ensure that the connectivity requirements as described in the Disaster Recovery Overview topic are met.
Check whether identical cluster configurations exist on both sites. We recommend that both clusters have the same number of nodes so that, even in the case of a disaster, the standby site can operate with the same capacity as the active site.
Ensure that the same versions of Junos Space Network Management Platform, high-level Junos Space applications, and device adapters are installed at both sites.
Shut down the disaster recovery process configured on Junos Space Network Management Platform Release 14.1R3 and earlier before upgrading to Junos Space Network Management Platform Release 15.2R1 and configuring the new disaster recovery process. For more information, see Stopping the Disaster Recovery Process on Junos Space Network Management Platform Release 14.1R3 and Earlier.

You cannot configure the new disaster recovery process unless you first stop the disaster recovery process you set up on Release 14.1R3 and earlier releases. You do not need to perform this step on a clean installation of Junos Space Network Management Platform Release 15.2R1.
Ensure that the same SMTP server configuration exists on both sites to receive e-mail alerts related to the disaster recovery process. You can add SMTP servers from the SMTP Servers task group in the Administration workspace. For more information about adding SMTP servers, see Adding an SMTP Server in the Junos Space Network Management Platform Workspaces Feature Guide.
Copy either a CSV file that lists the arbiter devices (one IP address per row) or the custom failure-detection scripts to the VIP node at the active site. You can refer to the sample files at /var/cache/jmp-geo/doc/samples/.
Decide on the values for the following parameters depending on your network connectivity and disaster recovery requirements:
- VIP address and password of both the active and standby sites
- Backup, restoration, and Secure Copy Protocol (SCP) synchronization settings
- Heartbeat time intervals
- E-mail address of the administrator and the dampening interval, in seconds, during which the same errors are not reported again, to avoid an e-mail flood
- Failure-detection settings such as the failover threshold and the time during which the standby site stays standby if the arbiter devices are unreachable
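The arbiter device file mentioned in the task list above can be checked before you copy it to the VIP node. The following Python snippet is a minimal sketch, not part of Junos Space. The function name read_arbiter_ips is hypothetical, and the sketch assumes only what this topic states: a CSV file with one IP address per row. Compare your file with the samples at /var/cache/jmp-geo/doc/samples/ for the exact layout the product expects.

```python
import csv
import io
import ipaddress


def read_arbiter_ips(csv_text):
    """Return the IP addresses found in csv_text, one per row.

    Hypothetical helper: raises ValueError naming the first row that
    does not hold exactly one valid IPv4 or IPv6 address.
    """
    addresses = []
    rows = csv.reader(io.StringIO(csv_text))
    for row_number, row in enumerate(rows, start=1):
        # Skip rows that are empty or contain only whitespace.
        if not any(cell.strip() for cell in row):
            continue
        if len(row) != 1:
            raise ValueError(
                f"row {row_number}: expected one IP address, got {len(row)} fields"
            )
        try:
            addresses.append(str(ipaddress.ip_address(row[0].strip())))
        except ValueError as exc:
            raise ValueError(f"row {row_number}: {exc}") from None
    return addresses
```

To use the sketch, read the file contents into a string and pass the string to read_arbiter_ips. An error message that names a row number points at the line to correct.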

The following sections explain how to configure disaster recovery at the active and standby sites and initiate the disaster recovery between both sites.

Configuring Disaster Recovery at the Active Site

You use the jmp-dr init -a command to configure disaster recovery at the active site. You need to enter values for the parameters that are displayed. The values you enter here are saved in a configuration file.

To configure disaster recovery at the active site:

Log in to the CLI of the Junos Space node at the active site on which the VIP or the eth0:0 interface is configured.
The Junos Space Settings Menu is displayed.

Enter 6 (if you are using a hardware appliance) or 7 (if you are using a virtual appliance) at the Junos Space Settings Menu prompt to run shell commands.

The following is a sample output from a virtual appliance:

You are prompted to enter the administrator password.

Enter the administrator password.
Enter jmp-dr init -a at the shell prompt.
The values you need to input to configure disaster recovery at the active site are displayed.

The Load Balancers part of the disaster recovery configuration file is displayed.
Enter the values for the parameters displayed:
1. Enter the VIP address of the standby site and press Enter.
2. Enter the administrator passwords of the load-balancer nodes at the standby site and press Enter.
  You can enter multiple passwords separated with commas.
  
  If multiple nodes use a common password, you need to enter the password only once.
3. Enter the timeout value to detect a failure in transferring files through SCP from the active site to the standby site, in seconds, and press Enter.
  The minimum and default value is 120.
4. Enter the maximum number of backups to retain at the active site and press Enter.
  The minimum and default value is 3.
5. Enter the times of the day to back up files (in hours) at the active site, separated with commas, and press Enter.
  You can enter any value from 0 through 23. You can also enter * to back up files every hour.
6. Enter the days of the week to back up files at the active site, separated with commas, and press Enter.
  You can enter any value from 0 through 6, where Sunday equals zero. You can also enter * to back up files every day.
7. Enter the times of the day to copy files (in hours) from the active site to the standby site, separated with commas, and press Enter.
  You can enter any value from 0 through 23. You can also enter * to copy files every hour.
8. Enter the days of the week to copy files from the active site to the standby site, separated with commas, and press Enter.
  You can enter any value from 0 through 6, where Sunday equals zero. You can also enter * to copy files every day.
The following is a sample output:
When you enter the values for all parameters, the DR Watchdog part of the disaster recovery configuration file is displayed.
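The hour and day entries in steps 5 through 8 follow one pattern: a comma-separated list of whole numbers within a range, or * for every value. The following Python snippet is a minimal sketch for checking such entries before you type them. The function names are hypothetical, and the sketch does not reproduce how Junos Space itself parses the input.

```python
def parse_schedule_field(text, low, high):
    """Parse an entry such as "0,6,12" or "*" into a sorted list of integers.

    "*" selects every value from low through high. Any other entry must
    be a comma-separated list of whole numbers within that range.
    """
    text = text.strip()
    if text == "*":
        return list(range(low, high + 1))
    values = set()
    for part in text.split(","):
        part = part.strip()
        if not part.isdecimal():
            raise ValueError(f"not a whole number: {part!r}")
        number = int(part)
        if not low <= number <= high:
            raise ValueError(f"{number} is outside {low} through {high}")
        values.add(number)
    return sorted(values)


def parse_hours(text):
    # Hours of the day: 0 through 23.
    return parse_schedule_field(text, 0, 23)


def parse_days(text):
    # Days of the week: 0 through 6, where Sunday equals zero.
    return parse_schedule_field(text, 0, 6)
```

For example, parse_hours("0,12") returns the two selected hours, and parse_days("7") raises a ValueError because 7 is outside the range 0 through 6.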

Enter values for the parameters displayed:

Enter the number of times the active site should send heartbeat messages to the standby site through ping after a heartbeat message times out and press Enter.
The minimum and default value is 4.
Enter the timeout value of each heartbeat message, in seconds, and press Enter.
The minimum and default value is 5.
Enter the time interval between two consecutive heartbeat messages to the standby site, in seconds, and press Enter.
The minimum and default value is 30.
Enter the e-mail address of the administrator to whom e-mail messages about disaster recovery service issues must be sent and press Enter.
Enter the time interval during which the same issues are not reported through e-mail (dampening interval), in seconds, and press Enter.
The default value is 3,600. The minimum value is 300.
Specify the failure-detection mechanism.
If you intend to use a custom failure-detection script:
- Enter Yes in the failureDetection section and press Enter.
If you intend to use the device arbitration algorithm:
1. Enter No in the failureDetection section and press Enter.
2. Enter the threshold percentage, expressed as a fraction, to trigger a failover to the standby site by using the device arbitration algorithm and press Enter.
  You can enter any value from 0 through 1. The default value is 0.5.
Enter the path of the file containing the arbiter devices or the custom failure-detection scripts and press Enter.

The following is a sample output:

When you have entered values for all parameters, disaster recovery is initialized at the active site.
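The DR Watchdog limits stated in this section can be collected in one place and checked before you run the command. The following Python snippet is a minimal sketch. The class and field names are hypothetical and do not correspond to keys in the Junos Space configuration file. Only the defaults and minimums come from this topic.

```python
from dataclasses import dataclass


@dataclass
class WatchdogSettings:
    # Hypothetical field names; defaults are the documented default values.
    heartbeat_retries: int = 4
    heartbeat_timeout_s: int = 5
    heartbeat_interval_s: int = 30
    dampening_interval_s: int = 3600
    # Used only with the device arbitration algorithm.
    failover_threshold: float = 0.5

    def problems(self):
        """Return a list of problems; the list is empty when all values are in range."""
        found = []
        if self.heartbeat_retries < 4:
            found.append("heartbeat retries must be at least 4")
        if self.heartbeat_timeout_s < 5:
            found.append("heartbeat timeout must be at least 5 seconds")
        if self.heartbeat_interval_s < 30:
            found.append("heartbeat interval must be at least 30 seconds")
        if self.dampening_interval_s < 300:
            found.append("dampening interval must be at least 300 seconds")
        if not 0 <= self.failover_threshold <= 1:
            found.append("failover threshold must be from 0 through 1")
        return found
```

Calling WatchdogSettings().problems() on the defaults returns an empty list. Overriding a field with a value below its minimum adds one message for that field.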

Configuring Disaster Recovery at the Standby Site

You use the jmp-dr init -s command to configure disaster recovery at the standby site. You need to enter values for the parameters that are displayed. The values you enter here are saved in a configuration file. By default, if the standby site becomes the active site, it uses the failure-detection mechanism you configured at the active site and the values you entered for file backup and restoration, heartbeat, and notifications.

To configure disaster recovery at the standby site:

Log in to the CLI of the Junos Space node at the standby site on which the VIP or the eth0:0 interface is configured.
The Junos Space Settings Menu is displayed.
Enter 6 (if you are using a hardware appliance) or 7 (if you are using a virtual appliance) at the Junos Space Settings Menu prompt to run shell commands.
You are prompted to enter the administrator password.
Enter the administrator password.
Enter jmp-dr init -s at the shell prompt.
The values you need to input to configure disaster recovery at the standby site are displayed.

The Load Balancers part of the disaster recovery configuration file is displayed.
The script asks whether you have reinitialized the DR active site, that is, whether you have run jmp-dr init -a --skip-user-config at the DR active site. Select Yes or No accordingly.
Enter the values for the parameters displayed:
1. Enter the VIP address of the active site and press Enter.
2. Enter the administrator passwords of the load-balancer nodes at the active site and press Enter.
  You can enter multiple passwords separated with commas.
  
  If multiple nodes use a common password, you need to enter the password only once.
3. Enter the timeout value to detect a failure in transferring files through SCP from the standby site to the active site, in seconds, and press Enter.
  The minimum and default value is 120.
4. Enter the maximum number of backups to retain at the standby site and press Enter.
  The minimum and default value is 3.
5. Enter the times of the day to back up files (in hours) at the standby site, separated with commas, and press Enter.
  You can enter any value from 0 through 23. You can also enter * to back up files every hour.
6. Enter the days of the week to back up files at the standby site, separated with commas, and press Enter.
  You can enter any value from 0 through 6, where Sunday equals zero. You can also enter * to back up files every day.
7. Enter the times of the day to copy files (in hours) from the standby site to the active site (when failed over to the standby site), separated with commas, and press Enter.
  You can enter any value from 0 through 23. You can also enter * to restore files every hour.
8. Enter the days of the week to copy files from the standby site to the active site (when failed over to the standby site), separated with commas, and press Enter.
  You can enter any value from 0 through 6, where Sunday equals zero. You can also enter * to restore files every day.
The following is a sample output:
When you enter the values for all parameters, the DR Watchdog part of the disaster recovery configuration file is displayed.
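Step 2 accepts several passwords separated with commas and needs a shared password only once. The following Python snippet is a minimal sketch of building that entry from a per-node list. The function name is hypothetical. The sketch also assumes that a password containing a comma cannot be expressed in a comma-separated entry, which this topic does not state.

```python
def build_password_entry(node_passwords):
    """Join per-node passwords with commas, keeping one copy of each.

    The order of first appearance is preserved.
    """
    unique = []
    for password in node_passwords:
        # Assumption: a comma inside a password would be read as a separator.
        if "," in password:
            raise ValueError("a password containing a comma cannot be listed this way")
        if password not in unique:
            unique.append(password)
    return ",".join(unique)
```

If three nodes use two distinct passwords, the result contains only those two values. Avoid storing real passwords in script files or in shell history.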

Enter the values for the parameters displayed.

Enter the number of times the standby site should send heartbeat messages to the active site through ping after a heartbeat message times out and press Enter.
The minimum and default value is 4.
Enter the timeout value for each heartbeat message, in seconds, and press Enter.
The minimum and default value is 5.
Enter the time interval between two consecutive heartbeat messages to the active site, in seconds, and press Enter.
The minimum and default value is 30.
Enter the e-mail address of the administrator to whom e-mail messages about disaster recovery service issues must be sent and press Enter.
Enter the time during which the same issues are not reported through e-mail (dampening interval), in seconds, and press Enter.
The default value is 3,600. The minimum value is 300.

The following is a sample output:

When you have entered values for all parameters, disaster recovery is initialized at the standby site.
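The dampening interval configured at both sites suppresses repeated e-mail messages about the same issue. The following Python snippet is a minimal sketch of that idea, shown only to illustrate the effect of the setting. The class name AlertDampener and the use of an issue string as the key are assumptions, and the sketch is not the Junos Space implementation.

```python
class AlertDampener:
    """Suppress repeat alerts for the same issue within a dampening interval."""

    def __init__(self, interval_s=3600):
        # 3600 seconds is the documented default dampening interval.
        self.interval_s = interval_s
        self._last_sent = {}

    def should_send(self, issue, now_s):
        """Return True if an alert for issue should be sent at time now_s."""
        last = self._last_sent.get(issue)
        if last is not None and now_s - last < self.interval_s:
            # The same issue was reported too recently; stay quiet.
            return False
        self._last_sent[issue] = now_s
        return True
```

With an interval of 300 seconds, an issue reported at time 0 is suppressed if it recurs at 299 seconds and reported again if it recurs at 300 seconds. A different issue is tracked separately. A shorter interval produces more messages for a persistent fault, and a longer interval produces fewer.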

Starting the Disaster Recovery Process

You use the jmp-dr start command to start the disaster recovery process at both sites. You can also use the jmp-dr start -a command to start the disaster recovery process on the active site and the jmp-dr start -s command to start the disaster recovery process on the standby site.

To start the disaster recovery process:

Log in to the CLI of the Junos Space node at the active site on which the VIP or the eth0:0 interface is configured.
The Junos Space Settings Menu is displayed.
Enter 6 (if you are using a hardware appliance) or 7 (if you are using a virtual appliance) at the Junos Space Settings Menu prompt to run shell commands.
You are prompted to enter the administrator password.
Enter the administrator password.

Enter jmp-dr start at the shell prompt.

The disaster recovery process is initiated on both sites.

The following is a sample output at the active site:

The disaster recovery process is initialized on the active site and the standby site.

Verifying the Status of the Disaster Recovery Process

We recommend that you execute the jmp-dr health command to verify the status (overall health) of the disaster recovery process at both the active and standby sites when you start the disaster recovery process on both sites. For more information about executing the jmp-dr health command, see Checking the Status of the Disaster Recovery Configuration.
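If you run the check from a script on the VIP node, the command can be wrapped so that its output is captured for a log. The following Python snippet is a minimal sketch. The function name run_jmp_dr and the runner parameter are hypothetical. The sketch assumes that jmp-dr is on the PATH of the shell user, and it returns the exit code without interpreting it because this topic does not document the exit codes or the output format of jmp-dr health.

```python
import subprocess


def run_jmp_dr(subcommand, runner=subprocess.run):
    """Run `jmp-dr <subcommand>` and return (exit_code, combined_output).

    runner is injectable so that the wrapper can be exercised on a
    machine where jmp-dr is not installed.
    """
    result = runner(
        ["jmp-dr", subcommand],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge error text into the captured output
        text=True,
        check=False,  # do not raise on a nonzero exit code
    )
    return result.returncode, result.stdout
```

Calling run_jmp_dr("health") at each site returns the text that the command would otherwise print at the shell prompt, which you can then save or review.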
