    Configuring the Disaster Recovery Process Between an Active and a Standby Site

    You configure disaster recovery between an active site and a standby site to ensure geographical redundancy of network management services.

    Before you initiate the disaster recovery process between both sites, perform the following tasks:

    • Ensure that the connectivity requirements as described in the Disaster Recovery Overview topic are met.
    • Check whether identical cluster configurations exist on both sites. We recommend that both clusters have the same number of nodes so that, even in the case of a disaster, the standby site can operate with the same capacity as the active site.
    • Ensure that the same versions of Junos Space Network Management Platform, high-level Junos Space applications, and device adapters are installed at both sites.
    • Shut down the disaster recovery process configured on Junos Space Network Management Platform Release 14.1R3 and earlier before upgrading to Junos Space Network Management Platform Release 15.2R1 and configuring the new disaster recovery process. For more information, see Stopping the Disaster Recovery Process on Junos Space Network Management Platform Release 14.1R3 and Earlier.

      You cannot configure the new disaster recovery process if you do not stop the disaster recovery you set up on 14.1R3 and earlier releases. You do not need to perform this step on a clean installation of Junos Space Network Management Platform Release 15.2R1.

    • Ensure that the same SMTP server configuration exists on both sites to receive e-mail alerts related to the disaster recovery process. You can add SMTP servers from the SMTP Servers task group in the Administration workspace. For more information about adding SMTP servers, see Adding an SMTP Server in the Junos Space Network Management Platform Workspaces Feature Guide.
    • Copy a CSV file that lists the arbiter devices (one IP address per row), or the custom failure-detection scripts, to the VIP node at the active site. You can refer to the sample files at /var/cache/jmp-geo/doc/samples/.
    • Decide on the values for the following parameters depending on your network connectivity and disaster recovery requirements:
      • VIP address and password of both the active and standby sites
      • Backup, restoration, and Secure Copy Protocol (SCP) synchronization settings
      • Heartbeat time intervals
      • E-mail address of the administrator and the dampening interval, in seconds, during which the same errors are not reported again, to avoid an e-mail flood
      • Failure-detection settings such as the failover threshold and the time during which the standby site stays standby if the arbiter devices are unreachable
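The arbiter list described above (one IP address per row) can be prepared and sanity-checked before you copy it to the VIP node. The following is a minimal sketch only: the function name write_arbiter_list and the output file name are hypothetical, and the sketch assumes that the file has no header row. Compare the result with the sample files at /var/cache/jmp-geo/doc/samples/ before using it.

```python
import ipaddress
import os
import tempfile
from pathlib import Path


def write_arbiter_list(addresses, path):
    """Validate each address and write one IP address per row.

    Assumption: the arbiter file has no header row and no extra columns.
    """
    rows = []
    for raw in addresses:
        # ipaddress.ip_address raises ValueError for malformed input.
        rows.append(str(ipaddress.ip_address(raw.strip())))
    if len(set(rows)) != len(rows):
        raise ValueError("duplicate arbiter address in list")
    Path(path).write_text("\n".join(rows) + "\n")
    return rows


if __name__ == "__main__":
    # Hypothetical addresses from the documentation range 192.0.2.0/24.
    target = os.path.join(tempfile.mkdtemp(), "arbiters.csv")
    print(write_arbiter_list(["192.0.2.10", "192.0.2.11"], target))
```

Validating the addresses up front avoids discovering a malformed row only when the failure-detection path is requested during initialization.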

    The following sections explain how to configure disaster recovery at the active and standby sites and initiate the disaster recovery between both sites.

    Configuring Disaster Recovery at the Active Site

    You use the jmp-dr init -a command to configure disaster recovery at the active site. You need to enter values for the parameters that are displayed. The values you enter here are saved in a configuration file.

    To configure disaster recovery at the active site:

    1. Log in to the CLI of the Junos Space node at the active site on which the VIP or the eth0:0 interface is configured.

      The Junos Space Settings Menu is displayed.

    2. Enter 6 (if you are using a hardware appliance) or 7 (if you are using a virtual appliance) at the Junos Space Settings Menu prompt to run shell commands.

      The following is a sample output from a virtual appliance:

      admin@10.206.41.183's password:
      Last login: Mon Aug 17 06:17:58 2015 from 10.206.41.42
      
      Welcome to the Junos Space network settings utility.
      
      Initializing, please wait
      
      
      Junos Space Settings Menu
      
      1> Change Password
      2> Change Network Settings
      3> Change Time Options
      4> Retrieve Logs
      5> Security
      6> Expand VM Drive Size
      7> (Debug) run shell
      
      A> Apply changes
      Q> Quit
      R> Redraw Menu
      
      Choice [1-7,AQR]: 7

      You are prompted to enter the administrator password.

    3. Enter the administrator password.
    4. Enter jmp-dr init -a at the shell prompt.

      The values you need to input to configure disaster recovery at the active site are displayed.

      The Load Balancers part of the disaster recovery configuration file is displayed.

    5. Enter the values for the parameters displayed:
      1. Enter the VIP address of the standby site and press Enter.
      2. Enter the administrator passwords of the load-balancer nodes at the standby site and press Enter.

        You can enter multiple passwords separated with commas.

        If multiple nodes use a common password, you need to enter the password only once.

      3. Enter the timeout value to detect a failure in transferring files through SCP from the active site to the standby site, in seconds, and press Enter.

        The minimum and default value is 120.

      4. Enter the maximum number of backups to retain at the active site and press Enter.

        The minimum and default value is 3.

      5. Enter the times of the day to back up files (in hours) at the active site, separated with commas, and press Enter.

        You can enter any value from 0 through 23. You can also enter * to back up files every hour.

      6. Enter the days of the week to back up files at the active site, separated with commas, and press Enter.

        You can enter any value from 0 through 6, where Sunday equals zero. You can also enter * to back up files every day.

      7. Enter the times of the day to copy files (in hours) from the active site to the standby site, separated with commas, and press Enter.

        You can enter any value from 0 through 23. You can also enter * to copy files every hour.

      8. Enter the days of the week to copy files from the active site to the standby site, separated with commas, and press Enter.

        You can enter any value from 0 through 6, where Sunday equals zero. You can also enter * to copy files every day.

      The following is a sample output:

      #########################
      #
      # Load Balancers
      #
      #########################
      
      What's the vip for load balancers at the standby site? 10.206.41.225
      What are the unique admin passwords for load balancer nodes at the standby site (separated by comma, no space)? $ABC123
      What's the scp timeout value (seconds)? 120
      
      # backup for data in file system instead of DB
      
      What's the max number of backup files to keep? 3
      What are the times of the day to run file backup (0-23)? 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
      What are the days of the week to run file backup (0-6)? 0,1,2,3,4,5,6
      
      # restore for data in file system instead of DB
      
      What are the times of the day to poll files from the active site (0-23)? 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
      What are the days of the week to poll files from the active site (0-6)? 0,1,2,3,4,5,6
      
      

      When you enter the values for all parameters, the DR Watchdog part of the disaster recovery configuration file is displayed.

    6. Enter values for the parameters displayed:
      1. Enter the number of times the active site should send heartbeat messages to the standby site through ping after a heartbeat message times out and press Enter.

        The minimum and default value is 4.

      2. Enter the timeout value of each heartbeat message, in seconds, and press Enter.

        The minimum and default value is 5.

      3. Enter the time interval between two consecutive heartbeat messages to the standby site, in seconds, and press Enter.

        The minimum and default value is 30.

      4. Enter the e-mail address of the administrator to whom e-mail messages about disaster recovery service issues must be sent and press Enter.
      5. Enter the time interval during which the same issues are not reported through e-mail (dampening interval), in seconds, and press Enter.

        The default value is 3,600. The minimum value is 300.

      6. Specify the failure-detection mechanism.

        If you intend to use a custom failure-detection script:

        • Enter Yes in the failureDetection section and press Enter.

        If you intend to use the device arbitration algorithm:

        1. Enter No in the failureDetection section and press Enter.
        2. Enter the threshold percentage to trigger a failover to the standby site by using the device arbitration algorithm and press Enter.

          You can enter any value from 0 to 1. The default value is 0.5.

      7. Enter the path of the file containing the arbiter devices or the custom failure-detection scripts and press Enter.

      The following is a sample output:

      #########################
      #
      # DR Watchdog
      #
      #########################
      
      
      # heartbeat
      
      What's the number of times to retry heartbeat message? 4
      What's the timeout of each heartbeat message (seconds)? 5
      What's the heartbeat message interval between sites (seconds)? 30
      
      # notification
      
      What's the contact email address of service issues? user1@example.com
      What's the dampening interval between emails of affected services (seconds)? 300
      
      # failureDetection
      
      Do you want to use custom failure detection? No
      What's the threshold percentage to trigger failover? 0.5
      What's the arbiters list file (note: please refer to example in /var/cache/jmp-geo/doc/samples/arbiters.list)? /home/admin/user1
      Check status of DR remote site: up
      Prepare /var/cache/jmp-geo/incoming                                                                   [ OK ]
      Configure contact email                                                                               [ OK ]
      Modify firewall for DR remote IPs                                                                     [ OK ]
      Configure NTP                                                                                         [ OK ]
      Configure MySQL database                                                                              [ OK ]
      Configure PostgreSQL database                                                                         [ OK ]
      Copy files to DR slave                                                                                [ OK ]
      Command completed.
      

    When you have entered values for all parameters, disaster recovery is initialized at the active site.
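The schedule answers in the procedure above (hours from 0 through 23, days from 0 through 6, or * for every value) are easy to mistype. The following is a minimal sketch of a pre-check you could run on an answer before entering it. The function names are hypothetical, and the sketch rejects spaces and non-numeric tokens as a conservative assumption; it does not reproduce the validation that jmp-dr itself performs.

```python
def parse_schedule(field, low, high):
    """Expand a comma-separated schedule answer into a sorted list of ints.

    '*' expands to every value from low through high (inclusive).
    """
    field = field.strip()
    if field == "*":
        return list(range(low, high + 1))
    values = set()
    for token in field.split(","):
        # Reject empty tokens, spaces, signs, and non-ASCII digits.
        if not (token.isascii() and token.isdigit()):
            raise ValueError(f"not a whole number: {token!r}")
        value = int(token)
        if not low <= value <= high:
            raise ValueError(f"{value} is outside {low}-{high}")
        values.add(value)
    return sorted(values)


def parse_hours(field):
    """Hours of the day, 0 through 23."""
    return parse_schedule(field, 0, 23)


def parse_days(field):
    """Days of the week, 0 through 6, where Sunday equals zero."""
    return parse_schedule(field, 0, 6)


def missing_values(parsed, low, high):
    """Return the values in low..high that the answer does not cover."""
    return sorted(set(range(low, high + 1)) - set(parsed))


if __name__ == "__main__":
    # A mistyped answer: "1,4" was entered where "14" was intended.
    answer = "0,1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,15,16,17,18,19,20,21,22,23"
    print(missing_values(parse_hours(answer), 0, 23))
```

Checking for uncovered hours is useful when the intent is an hourly schedule, because a stray comma still yields a syntactically valid answer.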

    Configuring Disaster Recovery at the Standby Site

    You use the jmp-dr init -s command to configure disaster recovery at the standby site. You need to enter values for the parameters that are displayed. The values you enter here are saved in a configuration file. By default, the standby site uses the failure-detection mechanism that you configured at the active site. The values that you enter for file backup and restoration, heartbeat, and notifications are used if the standby site becomes the active site.

    To configure disaster recovery at the standby site:

    1. Log in to the CLI of the Junos Space node at the standby site on which the VIP or the eth0:0 interface is configured.

      The Junos Space Settings Menu is displayed.

    2. Enter 6 (if you are using a hardware appliance) or 7 (if you are using a virtual appliance) at the Junos Space Settings Menu prompt to run shell commands.

      You are prompted to enter the administrator password.

    3. Enter the administrator password.
    4. Enter jmp-dr init -s at the shell prompt.

      The values you need to input to configure disaster recovery at the standby site are displayed.

      The Load Balancers part of the disaster recovery configuration file is displayed.

    5. Enter the values for the parameters displayed:
      1. Enter the VIP address of the active site and press Enter.
      2. Enter the administrator passwords of the load-balancer nodes at the active site and press Enter.

        You can enter multiple passwords separated with commas.

        If multiple nodes use a common password, you need to enter the password only once.

      3. Enter the timeout value to detect a failure in transferring files through SCP from the standby site to the active site, in seconds, and press Enter.

        The minimum and default value is 120.

      4. Enter the maximum number of backups to retain at the standby site and press Enter.

        The minimum and default value is 3.

      5. Enter the times of the day to back up files (in hours) at the standby site, separated with commas, and press Enter.

        You can enter any value from 0 through 23. You can also enter * to back up files every hour.

      6. Enter the days of the week to back up files at the standby site, separated with commas, and press Enter.

        You can enter any value from 0 through 6, where Sunday equals zero. You can also enter * to back up files every day.

      7. Enter the times of the day to copy files (in hours) from the standby site to the active site (when failed over to the standby site), separated with commas, and press Enter.

        You can enter any value from 0 through 23. You can also enter * to copy files every hour.

      8. Enter the days of the week to copy files from the standby site to the active site (when failed over to the standby site), separated with commas, and press Enter.

        You can enter any value from 0 through 6, where Sunday equals zero. You can also enter * to copy files every day.

      The following is a sample output:

      #########################
      #
      # Load Balancers
      #
      #########################
      
      What's the vip for load balancers at the active site? 10.206.41.220
      What are the unique admin passwords for load balancer nodes at the active site (separated by comma, no space)? $ABC123
      What's the scp timeout value (seconds)? 120
      
      # backup for data in file system instead of DB
      
      What's the max number of backup files to keep? 3
      What are the times of the day to run file backup (0-23)? 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
      What are the days of the week to run file backup (0-6)? 0,1,2,3,4,5,6
      
      # restore for data in file system instead of DB
      
      What are the times of the day to poll files from the active site (0-23)? 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
      What are the days of the week to poll files from the active site (0-6)? 0,1,2,3,4,5,6
      

      When you enter the values for all parameters, the DR Watchdog part of the disaster recovery configuration file is displayed.

    6. Enter the values for the parameters displayed:
      1. Enter the number of times the standby site should send heartbeat messages to the active site through ping after a heartbeat message times out and press Enter.

        The minimum and default value is 4.

      2. Enter the timeout value for each heartbeat message, in seconds, and press Enter.

        The minimum and default value is 5.

      3. Enter the time interval between two consecutive heartbeat messages to the active site, in seconds, and press Enter.

        The minimum and default value is 30.

      4. Enter the e-mail address of the administrator to whom e-mail messages about disaster recovery service issues must be sent and press Enter.
      5. Enter the time during which the same issues are not reported through e-mail (dampening interval), in seconds, and press Enter.

        The default value is 3,600. The minimum value is 300.

      The following is a sample output:

      #########################
      #
      # DR Watchdog
      #
      #########################
      
      
      # heartbeat
      
      What's the number of times to retry heartbeat message? 4
      What's the timeout of each heartbeat message (seconds)? 5
      What's the heartbeat message interval between sites (seconds)? 30
      
      # notification
      
      What's the contact email address of service issues? user1@example.com
      What's the dampening interval between emails of affected services (seconds)? 300
      Check status of DR remote site: up
      Load /var/cache/jmp-geo/incoming/init.properties                                                      [ OK ]
      Configure contact email                                                                               [ OK ]
      Modify firewall for DR remote IPs                                                                     [ OK ]
      Configure NTP                                                                                         [ OK ]
      Sync jmp-geo group                                                                                    [ OK ]
      Configure MySQL database                                                                              [ OK ]
      Configure PostgreSQL database                                                                         [ OK ]
      Command completed.
      

    When you have entered values for all parameters, disaster recovery is initialized at the standby site.
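The DR Watchdog values in the procedure above have documented defaults and minimums (heartbeat retries 4, heartbeat timeout 5 seconds, heartbeat interval 30 seconds, dampening interval default 3600 and minimum 300 seconds). The following is a minimal sketch that checks planned values against those minimums and illustrates what a dampening interval means. The class and function names are hypothetical, and should_send_email is an assumed model of dampening, not the actual watchdog logic, which this topic does not describe.

```python
from dataclasses import dataclass


@dataclass
class WatchdogSettings:
    # Defaults taken from the procedure text.
    heartbeat_retries: int = 4
    heartbeat_timeout_s: int = 5
    heartbeat_interval_s: int = 30
    dampening_interval_s: int = 3600


# Minimums taken from the procedure text.
MINIMUMS = {
    "heartbeat_retries": 4,
    "heartbeat_timeout_s": 5,
    "heartbeat_interval_s": 30,
    "dampening_interval_s": 300,
}


def check_minimums(settings):
    """Return a list of problems; an empty list means every value is allowed."""
    problems = []
    for name, minimum in MINIMUMS.items():
        value = getattr(settings, name)
        if value < minimum:
            problems.append(f"{name}={value} is below the minimum {minimum}")
    return problems


def should_send_email(last_sent_s, now_s, dampening_interval_s):
    """Assumed dampening model: repeat an alert for the same issue only
    after the dampening interval has elapsed since the previous alert."""
    return last_sent_s is None or now_s - last_sent_s >= dampening_interval_s


if __name__ == "__main__":
    planned = WatchdogSettings(dampening_interval_s=120)
    print(check_minimums(planned))
```

Under this assumed model, a dampening interval of 300 seconds means that a recurring issue produces at most one e-mail every five minutes.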

    Starting the Disaster Recovery Process

    You use the jmp-dr start command to start the disaster recovery process at both sites. You can also use the jmp-dr start -a command to start the disaster recovery process on the active site only and the jmp-dr start -s command to start the disaster recovery process on the standby site only.

    To start the disaster recovery process:

    1. Log in to the CLI of the Junos Space node at the active site on which the VIP or the eth0:0 interface is configured.

      The Junos Space Settings Menu is displayed.

    2. Enter 6 (if you are using a hardware appliance) or 7 (if you are using a virtual appliance) at the Junos Space Settings Menu prompt to run shell commands.

      You are prompted to enter the administrator password.

    3. Enter the administrator password.
    4. Enter jmp-dr start at the shell prompt.

      The disaster recovery process is initiated on both sites.

      The following is a sample output at the active site:

      [user1@host]# jmp-dr start
      Stop dr-watchdog if it's running                                                                      [ OK ]
      Check status of DR remote site: up
      Check current DR role: active
      
      INFO: => start DR at current site: active 
      
      Add device management IPs of DR remote site to up devices                                             [ OK ]
      Setup MySQL replication: master-master                                                                [ OK ]
      Start MySQL dump                                                                                      [ OK ]
      Setup PostgreSQL replication                                                                          [ OK ]
      Start file & RRD replication                                                                          [ OK ]
      Open firewall for device traffic                                                                      [ OK ]
      Start services(jboss,jboss-dc,etc.)                                                                   [ OK ]
      Start dr-watchdog                                                                                     [ OK ]
      Copy files to DR slave site                                                                           [ OK ]
      Update DR role of current site: active                                                                [ OK ]
      
      INFO: => start DR at DR remote site: standby 
      
      Stop dr-watchdog if it's running                                                                      [ OK ]
      Check status of DR remote site: up
      Check current DR role: standby
      Load /var/cache/jmp-geo/incoming/start.properties                                                     [ OK ]
      Stop services(jboss,jboss-dc,etc.)                                                                    [ OK ]
      Block firewall for device traffic                                                                     [ OK ]
      Reset MySQL init script and stop replication                                                          [ OK ]
      Scp backup file from peer site: /var/cache/jmp-geo/data/db.gz                                         [ OK ]
      Start MySQL restore                                                                                   [ OK ]
      Setup MySQL replication and start replication                                                         [ OK ]
      Setup PostgreSQL replication                                                                          [ OK ]
      Start files & RRD replication                                                                         [ OK ]
      Start dr-watchdog                                                                                     [ OK ]
      Clean up /var/cache/jmp-geo/incoming                                                                  [ OK ]
      Update DR role of current site: standby                                                               [ OK ]
      Command completed.
      Command completed.
      

    The disaster recovery process is initialized on the active site and the standby site.
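If you capture the output of jmp-dr start to a file, you can scan it for steps that did not report [ OK ]. The following is a minimal sketch based only on the line format visible in the sample output above. The function names are hypothetical, and the [ FAILED ] tag in the usage example is an invented placeholder, because this topic shows only successful steps.

```python
import re

# A step line ends with a bracketed status, for example "[ OK ]".
STATUS_LINE = re.compile(r"^(?P<step>.+?)\s+\[\s*(?P<status>[^\]]+?)\s*\]\s*$")


def step_results(output):
    """Return (step, status) pairs for every line that ends in a [ status ] tag."""
    results = []
    for line in output.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            results.append((match.group("step"), match.group("status")))
    return results


def failed_steps(output):
    """Return the steps whose status is anything other than OK."""
    return [(step, status) for step, status in step_results(output) if status != "OK"]


if __name__ == "__main__":
    captured = (
        "Stop dr-watchdog if it's running              [ OK ]\n"
        "Check status of DR remote site: up\n"
        # Hypothetical failure tag, used only to exercise the parser.
        "Setup PostgreSQL replication                  [ FAILED ]\n"
        "Command completed.\n"
    )
    print(failed_steps(captured))
```

Lines without a bracketed status, such as the role and remote-site checks, are ignored by this sketch, so review them separately.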

    Verifying the Status of the Disaster Recovery Process

    We recommend that you run the jmp-dr health command to verify the status (overall health) of the disaster recovery process at both the active and standby sites after you start the disaster recovery process on both sites. For more information about running the jmp-dr health command, see Checking the Status of the Disaster Recovery Configuration.

    Modified: 2016-06-22