    Self-Monitoring Configuration Variables

    Monitoring and alerting on key systems when there is a failure helps you to better understand what is occurring and why. This also helps customer support to troubleshoot issues.

    Note: Self-monitoring alerts are sent to mws.log only if the alert service is enabled and at least one contact is configured. If the alert service is not enabled, self-monitoring alerts are not sent. See Alert Service for instructions to turn this service on.

    The following categories are checked as part of self-monitoring:

    • disk–Stores the percentage of free disk space per mount point. It runs every 5 minutes.
    • load–Stores the five-minute load average. It runs every 5 minutes.
    • swap–Stores swapcached, swaptotal, and swapfree in KB. It also stores percentused as a percentage. It runs every 5 minutes.
    • raid–Stores the RAID status as running or failed. It runs only on hardware appliances, and it runs every 10 minutes.
    • sessions–Stores the number of sessions in the last 24 hours as last24hours, the largest session group in the last 24 hours as maxgroupsize, and the session with the most requests in the last 24 hours as maxrequests. It runs once an hour.
    • incidents–Stores, under each incident name, the percentage of sessions in the last 24 hours for that incident. It runs once an hour.
    • logsize–Stores the size of each log file under the log filename. It runs every 5 minutes.
    • logging.resources.TERM–Stores the number of times the regex is found since the last time it ran. It runs every 5 minutes.
    • appliance–Stores the status of each script returned by /etc/init.d/mykonos-appliance status. It also attempts to restart failed services, up to the number of times set by the max_restart config variable. It runs every minute.
    • services–Stores the status of nginx, pyro, and postgres. It also attempts to restart any of these services that fail, up to the number of times set by the max_restart config variable. It runs every minute.
    • ha–Stores the ring status and sync status of the disks in HA mode. This will also store the current master in HA mode.
    • interconnect–Stores the latency and packet loss of the interconnect in HA mode.

    Configuration variables for monitored categories are as follows:

    • Syntax: system.monitor.alert.interval [INTEGER]

      Sets the re-alert interval, in seconds, for unacknowledged alerts. Default value: 7200.
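
      The re-alert interval can be pictured with the following minimal sketch. It is an illustration only: the function name, its parameters, and the comparison used are assumptions, not the product's actual scheduler.

```python
# Hypothetical helper; the product's real alert scheduler is not described in this guide.
DEFAULT_REALERT_INTERVAL = 7200  # seconds, the default of system.monitor.alert.interval


def should_realert(last_sent, now, acknowledged, interval=DEFAULT_REALERT_INTERVAL):
    """Return True when an unacknowledged alert is due to be sent again.

    last_sent and now are timestamps in seconds (for example, from time.time()).
    """
    if acknowledged:
        return False  # assumption: acknowledged alerts are not repeated
    return now - last_sent >= interval


print(should_realert(last_sent=0, now=7200, acknowledged=False))  # True
print(should_realert(last_sent=0, now=3600, acknowledged=False))  # False
```

      With the default of 7200, an alert that nobody acknowledges would repeat roughly every two hours under this reading.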

    • Syntax: system.monitor.alert.enabled [true|false]

      Enable or disable all alerts.

    • Syntax: system.monitor.CATEGORY.collect_stats [true|false]

      Enables historic stats to be collected for that category.

      Setting                                                       Default
      system.monitor.appliance.collect_stats                        true
      system.monitor.disk.collect_stats                             true
      system.monitor.load.collect_stats                             true
      system.monitor.services.collect_stats                         true
      system.monitor.logging.resources.clientaborted.collect_stats  true
      system.monitor.logging.resources.outofmemory.collect_stats    true
      system.monitor.ha.oos.collect_stats                           true
      system.monitor.ha.ring.interconnect.collect_stats             true
      system.monitor.ha.ring.management.collect_stats               true
      system.monitor.interconnect.latency.collect_stats             true
      system.monitor.interconnect.loss.collect_stats                true

    • Syntax: system.monitor.CATEGORY.threshold [THRESHOLD]

      Sets the threshold for the alert to trigger. The threshold is based on the type of check. For status type checks, there are two options [stopped|failed]. For other checks it should be a numeric value.

      Setting                                        Default
      system.monitor.appliance.threshold             failed
      system.monitor.disk.threshold                  85
      system.monitor.incidents.threshold             70
      system.monitor.load.threshold                  10
      system.monitor.sessiongroup.threshold          2000
      system.monitor.sessions.threshold              10
      system.monitor.swap.percentused.threshold      30
      system.monitor.services.threshold              failed
      system.monitor.ha.oos.threshold                10240
      system.monitor.ha.ring.interconnect.threshold  failed
      system.monitor.ha.ring.management.threshold    failed
      system.monitor.interconnect.latency.threshold  10
      system.monitor.interconnect.loss.threshold     25
      system.monitor.raid.0.threshold                failed
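
      The difference between status thresholds and numeric thresholds could be sketched as follows. This is a hedged illustration, not the product's code: the function name is hypothetical, and it assumes that a status check alerts on an exact match and that a numeric check alerts when the value is greater than the threshold.

```python
STATUS_VALUES = {"stopped", "failed"}  # the two status options for a threshold


def threshold_reached(value, threshold):
    """Evaluate one check result against its configured threshold.

    Assumptions: a status check alerts when the reported status equals the
    threshold, and a numeric check alerts when value > threshold.
    """
    if threshold in STATUS_VALUES:
        return value == threshold
    return float(value) > float(threshold)


print(threshold_reached("failed", "failed"))   # True
print(threshold_reached("running", "failed"))  # False
print(threshold_reached(12.5, "10"))           # True
```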

    • Syntax: system.monitor.CATEGORY.description [STRING]

      Sets the description used in the alerts. If not set, the system will use the CATEGORY name.

      Setting                                                     Default
      system.monitor.logging.resources.clientaborted.description  Client aborted Errors
      system.monitor.logging.resources.outofmemory.description    Out of Memory Errors
      system.monitor.ha.oos.description                           The amount of data out of sync between HA nodes in Kibibytes
      system.monitor.ha.ring.interconnect.description             The status of the interconnect connection between HA nodes
      system.monitor.ha.ring.management.description               The status of the management connection between HA nodes

    • Syntax: system.monitor.CATEGORY.alert.severity [1.0|2.0|3.0|4.0]

      Sets the severity for the alert. The severity determines who receives the alert. For example, if you set your minimum alert level to 3.0, you do not receive alerts for checks with a severity below 3.0.

      Setting                                                      Default
      system.monitor.appliance.alert.severity                      4.0
      system.monitor.disk.alert.severity                           3.0
      system.monitor.incidents.alert.severity                      3.0
      system.monitor.load.alert.severity                           2.0
      system.monitor.logging.resources.outofmemory.alert.severity  2.0
      system.monitor.sessiongroup.alert.severity                   2.0
      system.monitor.sessions.alert.severity                       3.0
      system.monitor.services.alert.severity                       4.0
      system.monitor.ha.oos.alert.severity                         3.0
      system.monitor.ha.ring.interconnect.alert.severity           4.0
      system.monitor.ha.ring.management.alert.severity             4.0
      system.monitor.interconnect.latency.alert.severity           3.0
      system.monitor.interconnect.loss.alert.severity              4.0
      system.monitor.raid.0.alert.severity                         4.0
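
      As a rough sketch of how severity might route alerts, the example below filters contacts by their minimum alert level. The contact names, the dictionary layout, and the function name are all hypothetical; only the rule that a check below a contact's minimum level is not delivered comes from the description above.

```python
def contacts_to_notify(check_severity, contact_minimums):
    """Return the contacts whose minimum alert level admits this severity."""
    return [
        name
        for name, minimum in contact_minimums.items()
        if check_severity >= minimum
    ]


# Hypothetical contacts mapped to their minimum alert levels.
contacts = {"noc": 1.0, "oncall": 3.0, "manager": 4.0}

print(contacts_to_notify(2.0, contacts))  # ['noc']
print(contacts_to_notify(4.0, contacts))  # ['noc', 'oncall', 'manager']
```

      Here a load alert (default severity 2.0) would reach only the contact with a minimum level of 1.0, while a services alert (default severity 4.0) would reach all three.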

    • Syntax: system.monitor.CATEGORY.alert.below_threshold [true|false]

      Flips the check so that it is a less-than check (Value < Threshold) instead of the default greater-than check (Value > Threshold).

      Setting                                                    Default
      system.monitor.sessions.last24hours.alert.below_threshold  true
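
      The effect of below_threshold could be sketched like this. The function name is hypothetical, and the sketch assumes that the flag makes the alert trigger when the measured value falls below the threshold rather than above it.

```python
def alert_triggers(value, threshold, below_threshold=False):
    """Compare a measured value with its threshold.

    Assumption: the default check alerts when value > threshold, and
    below_threshold=True alerts when value < threshold instead.
    """
    if below_threshold:
        return value < threshold
    return value > threshold


# Few sessions in the last 24 hours is the abnormal case, so the check is flipped.
print(alert_triggers(4, 10, below_threshold=True))    # True
print(alert_triggers(250, 10, below_threshold=True))  # False
print(alert_triggers(12, 10))                         # True
```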

    • Syntax: system.monitor.CATEGORY.alert.enabled [true|false]

      Enable or disable the specific alert.

    • Syntax: system.monitor.logging.resources.CATEGORY.filename [PATH TO FILE]

      Sets the path of the log file to search for errors.

      Setting                                                  Default
      system.monitor.logging.resources.clientaborted.filename  /var/log/mws/mws.log
      system.monitor.logging.resources.outofmemory.filename    /var/log/mws/mws.log

    • Syntax: system.monitor.logging.resources.CATEGORY.regex [REGEX]

      Sets the regex to search for in the log file.

      Setting                                               Default
      system.monitor.logging.resources.clientaborted.regex  Client aborted
      system.monitor.logging.resources.outofmemory.regex    OutOfMemory
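
      One way to count regex matches "since the last time it ran" is to remember a byte offset into the log file between runs. The sketch below assumes that approach. The function name and the offset bookkeeping are hypothetical and are not taken from the product, and the sketch does not handle log rotation.

```python
import re


def count_new_matches(path, pattern, offset=0):
    """Count regex matches in the part of a log file written after `offset`.

    Returns (match_count, new_offset). Pass new_offset to the next run so
    that only newly appended text is searched.
    """
    regex = re.compile(pattern)
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
        new_offset = handle.tell()
    text = data.decode("utf-8", errors="replace")
    count = sum(1 for _ in regex.finditer(text))
    return count, new_offset


# Example usage with the default filename and regex from the tables above:
#   count, offset = count_new_matches("/var/log/mws/mws.log", "OutOfMemory")
#   ... five minutes later ...
#   count, offset = count_new_matches("/var/log/mws/mws.log", "OutOfMemory", offset)
```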

    • Syntax: system.monitor.CATEGORY.alert.on_change [true|false]

      If true, threshold is not used, and an alert is sent if the value changes between checks.

      Setting                                                       Default
      system.monitor.ha.master.alert.on_change                      true
      system.monitor.logging.resources.outofmemory.alert.on_change  true
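
      A change-based check, as opposed to a threshold-based one, could look like the sketch below. The function name and the node names are hypothetical. The sketch also assumes that the first check, which has no earlier value to compare with, does not raise an alert.

```python
def detect_changes(readings):
    """Yield (old, new) for each check whose value differs from the previous check."""
    previous = None
    for current in readings:
        if previous is not None and current != previous:
            yield previous, current
        previous = current


# Hypothetical sequence of values reported for the current HA master.
master_readings = ["node-a", "node-a", "node-b", "node-b"]
print(list(detect_changes(master_readings)))  # [('node-a', 'node-b')]
```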

    • Syntax: system.monitor.CATEGORY.alert.on_each_change [true|false]

      If true, and on_change is set to true, an alert is sent every time the value changes during checks.

      Setting                                        Default
      system.monitor.ha.master.alert.on_each_change  true

     


    Published: 2014-06-27