
Related Documentation

  • Viewing Auto-Recovery Logs
  • Tuning the Auto-Recovery Policy Reload Setting
  • Disabling the Auto-Recovery Feature
 

Tuning the Auto-Recovery Bypass Setting

Problem

The auto-recovery feature detects failure of an IDP engine and buffers packets while it shuts down and attempts to restart the IDP engine. Auto-recovery is enabled by default.

In the default implementation, the auto-recovery process stops the NIC bypass watchdog process, which temporarily changes the state of all enabled forwarding interfaces to either the Bypass state or the NICs off state, depending on the configuration set with ACM. When the auto-recovery process restarts the IDP engine, it also brings up the NIC interfaces and restarts the NIC bypass watchdog. If this cycle repeats frequently within a short period, link flapping can occur.

If the default implementation of the auto-recovery feature causes link flapping in your network, consider the alternative implementation described here. In the alternative implementation, you configure the auto-recovery feature to try to recover once without entering NIC bypass. The newly started IDP engine forwards traffic, but that traffic is forwarded uninspected until the IDP policy is reloaded. A second IDP engine failure within the specified threshold period triggers NIC bypass. Note that this differs from the default auto-recovery behavior, which attempts to restart the IDP engine six times before settling into the NIC bypass state.
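The threshold rule of the alternative implementation can be modeled as a small shell function, which may help when reasoning about which failures lead to NIC bypass. This is a hypothetical sketch, not the logic that idpinit actually runs: the function name and its arguments are illustrative, times are assumed to be in seconds since the epoch, and the first argument stands for the autorecovery_bypass value configured in user_funcs.

```shell
# Hypothetical model of the alternative auto-recovery threshold rule.
# It is NOT the idpinit implementation; names and arguments are illustrative.
# Usage: autorecovery_action BYPASS_MINUTES PREV_FAILURE_EPOCH NOW_EPOCH
#   BYPASS_MINUTES      value of autorecovery_bypass (0 = default behavior)
#   PREV_FAILURE_EPOCH  time of the previous engine failure, or "" if none
#   NOW_EPOCH           time of the failure just detected
autorecovery_action() {
    threshold_min=$1 prev=$2 now=$3
    if [ "$threshold_min" -eq 0 ]; then
        # autorecovery_bypass=0: the standard auto-recovery process applies
        echo default
        return
    fi
    # A failure strictly less than the threshold after the previous one
    # means the IDP service is stopped and NIC bypass is triggered.
    if [ -n "$prev" ] && [ $((now - prev)) -lt $((threshold_min * 60)) ]; then
        echo bypass
    else
        # First failure, or the previous one is older than the threshold:
        # restart the engine without entering NIC bypass.
        echo recover
    fi
}
```

For example, autorecovery_action 5 1000 1140 prints bypass: the second failure came 140 seconds after the first, which is under the 300-second threshold that autorecovery_bypass=5 implies. This matches the IDP75 example log later in this topic.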

With the alternative auto-recovery implementation, there is a throughput delay before the newly started IDP engine is ready to forward traffic. The delay varies, even among different deployments of the same platform, and depends on the number of virtual routers enabled, the device configuration, load, and other factors. Table 1 lists delays observed during testing. We recommend that you run tests to determine the delay for your own devices so that you can judge whether this auto-recovery implementation is right for your network.
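One possible way to run such a test is to probe a host on the far side of the IDP device while you trigger an engine failure in a test environment, and count how long the probes fail. The helper below is a hypothetical sketch, not a Juniper tool: measure_outage, its arguments, and the probe command are assumptions.

```shell
# Hypothetical helper, not a Juniper tool.
# Usage: measure_outage PROBE INTERVAL MAX_PROBES
# Runs PROBE every INTERVAL seconds, waits for the first failed probe, then
# counts failed probes until a probe succeeds again.
# Prints "<failed_probes> <failed_probes * INTERVAL>"; returns 1 if
# MAX_PROBES probes run without the outage starting and ending.
measure_outage() {
    mo_probe=$1 mo_interval=$2 mo_max=$3
    mo_n=0 mo_failed=0 mo_down=0
    while [ "$mo_n" -lt "$mo_max" ]; do
        mo_n=$((mo_n + 1))
        # PROBE is word-split on purpose so it can carry its own arguments.
        # shellcheck disable=SC2086
        if $mo_probe >/dev/null 2>&1; then
            if [ "$mo_down" -eq 1 ]; then
                # Traffic flows again after an outage: report and stop.
                echo "$mo_failed $((mo_failed * mo_interval))"
                return 0
            fi
        else
            mo_down=1
            mo_failed=$((mo_failed + 1))
        fi
        sleep "$mo_interval"
    done
    return 1
}
```

For example, measure_outage "ping -c 1 -W 1 192.0.2.10" 1 900 uses the Linux iputils ping options -c and -W, with 192.0.2.10 standing in for a host whose traffic passes through the device. The seconds figure ignores the time each probe itself takes, so treat it as a rough lower bound rather than a precise measurement.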

Table 1: Throughput Delay Examples

Platform    Virtual Routers    Delay (Seconds)
IDP75       1                  20
IDP250      4                  15
IDP800      5                  19
IDP8200     8                  20

Note: If auto-recovery attempts do not resolve the issue, the device enters the Bypass or NICs off state, depending on the setting you configured with ACM. At that point, you must take manual steps to diagnose and resolve the issue and bring the IDP Series device back online.

Solution

If you decide to use the alternate implementation of auto-recovery, you must configure two values in the user_funcs file: pktprocess_afterpolicyload and autorecovery_bypass.
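If you prefer to make and check these two edits non-interactively instead of in a text editor, a sketch along the following lines may help. The helper names (set_user_funcs_var, get_user_funcs_var, check_autorecovery_config) are hypothetical, and the sketch assumes that each setting appears on its own line in the form export NAME=VALUE, as shown in the steps that follow.

```shell
# Hypothetical helpers, not shipped with the IDP software.

# set_user_funcs_var FILE NAME VALUE
# Rewrites the "export NAME=..." line to "export NAME=VALUE",
# keeping the previous contents in FILE.bak.
set_user_funcs_var() {
    file=$1 name=$2 value=$3
    grep -q "^export ${name}=" "$file" || {
        echo "no 'export ${name}=' line in $file" >&2
        return 1
    }
    cp -p "$file" "$file.bak" || return 1
    # Writing through ">" keeps the original file's inode and permissions.
    sed "s/^export ${name}=.*/export ${name}=${value}/" "$file.bak" > "$file"
}

# get_user_funcs_var FILE NAME: prints the value on the "export NAME=" line.
get_user_funcs_var() {
    sed -n "s/^export $2=\([^[:space:]]*\).*/\1/p" "$1"
}

# check_autorecovery_config FILE
# The user_funcs comment states that autorecovery_bypass does not work when
# pktprocess_afterpolicyload is non-zero, so flag that combination.
check_autorecovery_config() {
    bypass=$(get_user_funcs_var "$1" autorecovery_bypass)
    pkt=$(get_user_funcs_var "$1" pktprocess_afterpolicyload)
    if [ "${bypass:-0}" -ne 0 ] && [ "$pkt" != 0 ]; then
        echo "conflict: autorecovery_bypass=$bypass requires pktprocess_afterpolicyload=0"
        return 1
    fi
    echo "ok: autorecovery_bypass=${bypass:-0} pktprocess_afterpolicyload=$pkt"
}
```

For example, you could run set_user_funcs_var /usr/idp/device/bin/user_funcs pktprocess_afterpolicyload 0, then set_user_funcs_var /usr/idp/device/bin/user_funcs autorecovery_bypass 5, and finally check_autorecovery_config /usr/idp/device/bin/user_funcs. You still need to restart the IDP engine afterward.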

To enable the alternate implementation of auto-recovery:

  1. Log in to the CLI as admin and enter su - to switch to root.
  2. Open the /usr/idp/device/bin/user_funcs file in a text editor, such as vi.
  3. Implementing the alternate auto-recovery method requires that you set pktprocess_afterpolicyload=0. If you have not already done so, take the following actions:

    1. Locate the following line:
      export pktprocess_afterpolicyload=1
    2. Change the value to 0 (export pktprocess_afterpolicyload=0). This setting specifies that packet processing resumes as soon as possible and does not wait until the security policy has been loaded.
  4. To enable the alternate auto-recovery implementation, take the following actions:

    1. Locate the export autorecovery_bypass line at the end of the following excerpt from user_funcs:
      # Support to enable/disable the logic of stopping and then later starting
      # nicBypass.sh when autorecovery feature detects idpengine has exited.
      # Setting the variable to 0(default) implies the behavior is unchanged. 
      # Setting it to any non-zero value would enable the new behavior. The 
      # non-zero value also indicates the threshold time(in minutes) within 
      # which two successive autorecovery detections must not happen. If it so 
      # happens, then IDP service would be stopped in the device.
      # Note: This configuration is bound to the configuration option
      # pktprocess_afterpolicyload. Hence, this configuration will not work if 
      # the configuration option pktprocess_afterpolicyload is set to a non-zero value.
      
      export autorecovery_bypass=0

      The default value, autorecovery_bypass=0, disables the alternate implementation; the standard auto-recovery process is followed.

    2. Change the setting to a non-zero value to enable the alternate method. The number you specify is a threshold period in minutes: if a second IDP engine failure occurs within that many minutes of the first, NIC bypass is triggered. We recommend setting autorecovery_bypass to at least 5 (minutes). For example:
      export autorecovery_bypass=5
  5. Save the file and exit the editor.
  6. Restart the IDP engine:
    [root@defaulthost admin]# idp.sh restart

    Restarting the IDP engine can take several moments.

  7. In a test environment, kill the IDP engine process and observe the auto-recovery logs. For example, use the Linux kill -9 idpengine_PID command. Control plane logs related to auto-recovery and bypass events are sent as NSM and syslog events. Debug logs are written to /usr/idp/device/var/sysinfo/logs/idpinit.date.

    The following example shows the debug log messages for IDP75, a single-core platform, when recovery is successful. Logs are similar for all single-core platforms.

    IDP75 Autorecovery Log: Successful Recovery

    Fri Aug 26 06:39:46 PDT 2011:Detected 0 to be terminated
    [06:39:47] ../../idpinit.c:idpinit_signals:91: SIGUSR2 delivered, will stop network outage monitoring
    [06:39:49] sc_network_outage_monitor:560: one or more engine is terminated or hung
    Adding recovered engine with pid  19844  into cgroup DP share
    Fri Aug 26 06:40:00 PDT 2011:Indicating signal-jnet to kjnetd
    IDP instance 0 successfully recovered
    Restarting all CP processes
    [06:45:08] ../../idpinit.c:idpinit_signals:84: SIGUSR1 delivered, will start network outage monitoring
    Fri Aug 26 06:45:09 PDT 2011: Done
    

    The following example shows the debug log messages when a second terminated IDP engine instance is detected within the threshold period. The IDP service is stopped, triggering NIC bypass.

    IDP75 Autorecovery Log: Bypass Triggered Upon Second Failure Within Threshold

    Fri Aug 26 06:53:59 PDT 2011:Detected 0 to be terminated
    [06:53:59] ../../idpinit.c:idpinit_signals:91: SIGUSR2 delivered, will stop network outage monitoring
    Adding recovered engine with pid   into cgroup DP share
    Fri Aug 26 06:54:02 PDT 2011:Indicating signal-jnet to kjnetd
    IDP instance 0 successfully recovered
    Restarting all CP processes
    [06:54:03] sc_network_outage_monitor:560: one or more engine is terminated or hung
    [06:55:38] ../../idpinit.c:idpinit_signals:84: SIGUSR1 delivered, will start network outage monitoring
    Fri Aug 26 06:55:38 PDT 2011: Done
    Fri Aug 26 06:55:39 PDT 2011:Detected 0 to be terminated
    [06:55:39] ../../idpinit.c:idpinit_signals:91: SIGUSR2 delivered, will stop network outage monitoring
    [06:55:40] Triggering nicBypass since IDP instance 0 restarted again in 140 seconds (less than the configured value of 300 seconds), Stopping IDP service
    [06:55:40] ../../idpinit.c:idpinit_signals:91: SIGUSR2 delivered, will stop network outage monitoring
    [06:55:43] sc_network_outage_monitor:560: one or more engine is terminated or hung

    The following example shows the debug log messages for IDP800, a dual-core platform, when recovery is successful. Logs are similar for all dual-core platforms.

    IDP800 Autorecovery Log: Successful Recovery

    Thu Aug 25 05:02:04 IST 2011:Detected 0 to be terminated
    [05:02:04] ../../idpinit.c:idpinit_signals:91: SIGUSR2 delivered, will stop network outage monitoring
    [05:02:06] sc_network_outage_monitor:560: one or more engine is terminated or hung
    Thu Aug 25 05:02:12 IST 2011:Indicating signal-jnet to kjnetd
    IDP instance 0 successfully recovered
    Restarting all CP processes
    [05:03:07] ../../idpinit.c:idpinit_signals:84: SIGUSR1 delivered, will start network outage monitoring
    Thu Aug 25 05:03:07 IST 2011: Done
    

    The following example shows the debug log messages when a second terminated IDP engine instance is detected within the threshold period. The IDP service is stopped, triggering NIC bypass.

    IDP800 Autorecovery Log: Bypass Triggered Upon Second Failure Within Threshold

    Thu Aug 25 05:13:32 IST 2011:Detected 0 to be terminated
    [05:13:32] ../../idpinit.c:idpinit_signals:91: SIGUSR2 delivered, will stop network outage monitoring
    Thu Aug 25 05:13:35 IST 2011:Indicating signal-jnet to kjnetd
    IDP instance 0 successfully recovered
    Restarting all CP processes
    [05:13:35] sc_network_outage_monitor:560: one or more engine is terminated or hung
    [05:15:02] ../../idpinit.c:idpinit_signals:84: SIGUSR1 delivered, will start network outage monitoring
    Thu Aug 25 05:15:02 IST 2011: Done
    Thu Aug 25 05:15:03 IST 2011:Detected 0 to be terminated
    [05:15:03] ../../idpinit.c:idpinit_signals:91: SIGUSR2 delivered, will stop network outage monitoring
    [05:15:03] Triggering nicBypass since IDP instance 0 restarted again in 171 seconds (less than the configured value of 300 seconds), Stopping IDP service
    [05:15:05] ../../idpinit.c:idpinit_signals:91: SIGUSR2 delivered, will stop network outage monitoring
    [05:15:05] sc_network_outage_monitor:560: one or more engine is terminated or hung

    The following example shows the debug log messages for IDP8200, a multi-core platform, when recovery is successful.

    IDP8200 Autorecovery Log: Successful Recovery

    Thu Aug 25 06:01:17 IST 2011:Detected 0 to be terminated
    [06:01:17] ../../idpinit.c:idpinit_signals:91: SIGUSR2 delivered, will stop network outage monitoring
    [06:01:18] sc_network_outage_monitor:560: one or more engine is terminated or hung
    Thu Aug 25 06:01:32 IST 2011:Indicating signal-jnet to kjnetd
    mecd: Thu Aug 25 06:01:33 IST 2011: start idpengine_0
    IDP instance 0 successfully recovered
    Restarting all CP processes
    [06:02:43] ../../idpinit.c:idpinit_signals:84: SIGUSR1 delivered, will start network outage monitoring
    Thu Aug 25 06:02:43 IST 2011: Done
    mecd: Thu Aug 25 06:02:43 IST 2011: Received SIGUSR1. Exiting now...
    

    Note the mecd messages. mecd is the multi-engine crash detector specific to IDP8200. It monitors the other idpengine instances while the first engine is recovering. If mecd detects that a second idpengine instance has crashed while the first is still recovering, the auto-recovery script stops the NIC bypass watchdog to trigger NIC bypass. (That event does not appear in the example.)

    The final example shows the IDP8200 debug log messages when idpengine_1 fails twice within the threshold period (here, the configured threshold is 900 seconds). The IDP service is stopped, triggering NIC bypass.

    IDP8200 Autorecovery Log: Bypass Triggered Upon Second Failure Within Threshold

    Thu Aug 25 06:56:51 IST 2011:Detected 1 to be terminated
    Thu Aug 25 06:56:54 IST 2011:Indicating signal-jnet to kjnetd
    IDP instance 1 successfully recovered
    Restarting all CP processes
    Thu Aug 25 06:57:35 IST 2011: Done
    Thu Aug 25 06:57:37 IST 2011:Detected 1 to be terminated
    [06:57:37] ../../idpinit.c:idpinit_signals:91: SIGUSR2 delivered, will stop network outage monitoring
    [06:57:37] Triggering nicBypass since IDP instance 1 restarted again in 87 seconds (less than the configured value of 900 seconds), Stopping IDP service
    [06:57:40] sc_network_outage_monitor:560: one or more engine is terminated or hung
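When you repeat this test several times, a short script can pull the relevant events out of the debug log instead of reading it by eye. The following is a hypothetical sketch: summarize_autorecovery_log is an illustrative name, and it matches only the message text shown in the examples above, which may differ in other releases.

```shell
# Hypothetical helper: summarize an idpinit debug log by matching the
# message text seen in the example logs. Formats may vary by release.
# Usage: summarize_autorecovery_log LOGFILE
summarize_autorecovery_log() {
    log=$1
    # "Detected N to be terminated" marks each detected engine failure.
    echo "engine terminations detected: $(grep -c 'to be terminated' "$log")"
    echo "successful recoveries: $(grep -c 'successfully recovered' "$log")"
    # One line per NIC bypass event: engine instance and seconds since last restart.
    sed -n 's/.*Triggering nicBypass since IDP instance \([0-9]*\) restarted again in \([0-9]*\) seconds.*/bypass triggered: instance \1 restarted again in \2 s/p' "$log"
}
```

Pass it the idpinit debug log file from /usr/idp/device/var/sysinfo/logs/. Because it relies only on grep and sed, it does not depend on how the control plane forwards NSM or syslog events.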
 


Published: 2011-09-21

Copyright© 1999-2012 Juniper Networks, Inc. All rights reserved.