    Threshold Editor

    Threshold alarms can be used to monitor the network against any number of user defined SLAs or other production and performance requirements. When these SLAs or other requirements are breached, you will be automatically notified by the event server, either through viewing the Event Browser or receiving preconfigured notification emails.

    Threshold alarms can be triggered by periodic collections from the Traffic Collection Manager, or the Task Manager tasks “Device SNMP Collection”, “Device Ping Collection,” and “Device SLA Collection.” For each threshold alarm, the DGS server will examine incoming data against all applicable threshold alarm rules. If any data matches a threshold alarm rule, the DGS server will post an event to the event server with the parameters specified in the threshold alarm. In the Threshold Editor, these rules are referred to as production rules.

    To open the threshold editor, select Application > Threshold Editor from the Java interface (or Live Network > Edit Threshold Alarms from the main menu bar of the web interface).

    Figure 210: Threshold Editor

    When the threshold editor is opened for the first time, the tree in the left panel is collapsed, which hides all production rules. Double-click an item, or click the hinge to the left of the item, to display the elements beneath it. The hierarchy consists of the element type, followed by the group/scope, and finally the actual production rules, as shown in the figures below.

    Interpreting the Threshold Editor


    Element Type


    At the topmost level is the Element Type to which the rule applies: Interface, Node, Tunnel, CPUStats, LSPPingStats, LatencyStats, PingStats, or SLAStats.

    • Interface: Rules can be defined in this section for interface-related properties such as bandwidth and ingress and egress utilizations.
    • Node: Rules can be defined in this section for node-related properties such as system up time, last up time, aaa, accounting, authentication, and sessions. The aaa and sessions properties are related to wireless collection data and may not apply to all device types.
    • Tunnel: Rules can be defined in this section for LSP tunnel-related properties such as the delta in the ingress bytes.
    • CPUStats: Rules can be defined in this section for CPU and memory stats such as CPU temperature, CPU utilization, memory used, total memory, and memory utilization.
    • LSPPingStats: Rules can be defined in this section for LSP ping stats on average, max, min, and standard deviation values.
    • LatencyStats: Rules can be defined in this section for latency stats on average, max, min, and standard deviation values.
    • PingStats: Rules can be defined in this section for ping stats on average, max, min, and loss percentage values.
    • SLAStats: Rules can be defined in this section for SLA stats such as jitter, packet loss, packet timeout, and latency.

    Scope


    Underneath the element type, the next level is the scope, which defines the group of elements to which the threshold rules are applied. An include condition can be specified to select only elements matching user-specified criteria. An exclude condition can additionally be specified to leave out elements matching user-specified criteria. If no fields are specified for the scope, its rules are applied to all elements of the given type. For example, a scope can be created underneath the Interface element type that only considers fast ethernet interfaces.

    Figure 211: Example of a Threshold Editor Scope

    Threshold Rule


    Under the scope are the actual threshold rules. Here, users can specify the production name, the actual rule, a severity level, and a description. For example, a rule can be created to generate a threshold event when the interface utilization exceeds a particular percentage.

    Figure 212: Example of a Threshold Editor Rule

    Creating Threshold Crossing Alerts

    Creating a new threshold crossing alert involves two steps:

    1. For the desired element type, create a scope identifying a subgroup of elements in which to place the rule. The scope can be used, for example, to filter on only fast ethernet interfaces, or events at a particular node.
    2. Create the rule itself.


    Creating a New Scope


    To create a new scope, first select the upper level tree item under which the group will be created. Then either click the Create button in the top toolbar, or right-click the selected item and select Create.

    This will create a new group under that item. Select the new group and fill in the fields for the new group on the right pane. To enter text into a field, first double click the field to enable editing of the field.

    • Enter in a Scope Name (required) which will describe the scope of the rules contained within the group. Do not include any spaces in the name. Optionally, enter in a description of the scope in the Description field.
    • The Include and Exclude condition fields are preliminary filters for all rules within the group. Only data matching these conditions will be considered by the rules within the group. For example, you could set “name ~= fe” in the Include condition for an Interface scope to only consider fast ethernet interfaces. To edit these conditions, right-click at the beginning of the field to open the Condition and Rule Builder. For more information on how to define conditions, please see Defining Conditions and Rules. If you do not require any filtering, leave these fields blank.
    • The Is Active checkbox can be used to activate or deactivate the scope and the production rules underneath it. The threshold event is generated only if both the scope and the production rule are activated.
    • The production count is the number of rules within the group.
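The effect of the Include and Exclude conditions can be pictured with a small Python sketch. It is only an illustration, not the product's implementation: the function names are hypothetical, the interface names are made up, and it assumes that "~=" behaves like an unanchored regular-expression search on the element name.

```python
import re

def matches(pattern, name):
    # Assumption: "~=" is modeled as an unanchored regular-expression search.
    return re.search(pattern, name) is not None

def in_scope(name, include=None, exclude=None):
    """Return True if an element name passes the include and exclude filters.

    A filter of None stands for a blank condition field (no filtering).
    """
    if include is not None and not matches(include, name):
        return False
    if exclude is not None and matches(exclude, name):
        return False
    return True

# Hypothetical interface names, for illustration only.
interfaces = ["fe-0/0/1", "ge-1/0/0", "so-2/0/0", "fe-0/1/3"]

# Include condition "name ~= fe": only fast ethernet interfaces remain.
fast_ethernet = [n for n in interfaces if in_scope(n, include="fe")]

# Exclude condition covering fe, ge, and Ethernet: only non-ethernet links remain.
non_ethernet = [n for n in interfaces if in_scope(n, exclude="fe|ge|Ethernet")]
```

With blank Include and Exclude fields (both None here), every element of the type stays in scope.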

    Creating a New Rule


    To create a new rule underneath a scope, first select the scope under which the new rule will be created. Then either click the Create New Production Rule button in the top toolbar, or right-click the selected item and select Create. This will create a new rule under the selected group.

    • Enter in a Production Name (required) to describe the threshold rule. Do not include any spaces in the name.
    • Enter in a Production Rule (required) to define the threshold crossing alert. If incoming data matches this rule, it will trigger the threshold event. Right-click at the beginning of the field to open the Condition and Rule Builder. An example rule for a production rule underneath the Interface scope is “ingressUtil > 75 || egressUtil > 75”. For more information on how to define conditions, please see Defining Conditions and Rules.
    • The Is Active checkbox can be used to activate or deactivate the production rule. The threshold event is generated only if both the production rule and the scope containing it are activated.
    • The Event Type is the type of event triggered by this rule, which is displayed in the Event Browser when the threshold crossing alert is created. The default is ThresholdEvent and does not need to be changed. It can be helpful, however, to mark the events with more descriptive event types, such as ThresholdUtilizationEvent and ThresholdMemoryEvent. To define your own categories, see Defining New Threshold Event Categories.
    • The Severity selection is used to configure the severity of the event. This severity can later be viewed in the Event Browser when the Threshold Event is triggered.
    • The Source ID will be displayed as the source of the event triggered by this rule. This field corresponds to the Source ID field in the Event Browser.
    • Finally, the Description Template is used to describe the event triggered by this threshold rule. This is the primary means of specifying threshold event details in the Event Browser. The template allows for specifying keys and dynamic values by enclosing them within square brackets []. For a list of available suggestions while typing in the Description Template field, right-click at the beginning of the field. For example, for a rule that triggers an event when ingress utilization or egress utilization exceeds 75 percent, the following template may be used:
      [deviceID]: [name]: ingress util [ingressUtil] or egress util [egressUtil] greater than 75%
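To see how a description template expands, the following Python sketch substitutes bracketed keys with values from a sample record. The function name and the sample values are hypothetical, and the handling of unknown keys (left unchanged) is an assumption; the product's actual substitution rules may differ.

```python
import re

def render_description(template, values):
    """Replace each [key] in the template with its value.

    Assumption: keys that are not present in the data are left as-is.
    """
    def substitute(match):
        key = match.group(1)
        return str(values[key]) if key in values else match.group(0)
    return re.sub(r"\[(\w+)\]", substitute, template)

template = ("[deviceID]: [name]: ingress util [ingressUtil] "
            "or egress util [egressUtil] greater than 75%")

# Hypothetical sample data for one interface.
sample = {"deviceID": "NWK", "name": "fe-0/0/1",
          "ingressUtil": 81.2, "egressUtil": 40.0}

message = render_description(template, sample)
print(message)
```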

    Triggering Threshold Alarms

    Note that to trigger the threshold alarm, the corresponding collection (via the Task Manager or Traffic Collection Manager) should be scheduled on a recurring basis.

    • For CPUStats, the “mem” and “cpu” keys require scheduling. See Device SNMP Collection.
    • For LSPPingStats, the “lsp” keys require scheduling. See LSP Ping Collection.
    • For LatencyStats, the “latency” keys require scheduling. See Link Latency Collection.
    • For PingStats, the “ping” keys require scheduling. See Device Ping Collection.
    • For SLAStats, the “sla” keys require scheduling. See Device SLA Collection.
    • For the remainder of the keys in NodeScope, InterfaceScope, and TunnelScope, collection should be scheduled via the Traffic Collection Manager (see Chapter 11, Performance Management: Traffic Collection). The collection interval can be specified in the Collection Elements tab, for the given network (“Global Network” by default), in the “Traffic collection interval(s)” field.

    Defining Conditions and Rules

    In the Condition and Rule Builder, the top panel lists the available keys and the bottom panel displays the resulting rule. In the top panel, use the checkbox to select the desired key(s). In the bottom panel, click the underlined values to edit the logical operators and properties. An optional Consecutive Occurrences field allows users to specify the number of consecutive occurrences before the rule is triggered. Press OK to build the rule syntax.

    Figure 213: Condition and Rule Builder

    Alternatively, the Include and Exclude condition or Production rule syntax can be typed into the field instead of using the Condition and Rule Builder. Group conditions and production rules must be entered in the form of logical expressions with a pre-defined set of keys. For example, the following condition matches when either ingress utilization or egress utilization is greater than or equal to 75 percent: ingressUtil >= 75 || egressUtil >= 75

    • For a list of available keys while editing the condition or rule field, right-click for a list of suggestions, or consult Available Keys below. This list may be different for different types of elements. If unsure of where to start, right-click at the beginning of a field to see all possible keys. Remember that the field must first be activated for editing by double-clicking the field.
    • The following are the supported logical operators for reference: == (Equals), != (Does not equal), ~= (Equals using regular expression), && (And), || (Or), < (Less than), > (Greater than), <= (Less than or equal), and >= (Greater than or equal).
    • Note that all conditions and rules are case sensitive, and spaces should be used as delimiters between keywords, values, and logical operators. Additionally, quotes ("") should be placed around string values, for example, IPAddress == "1.2.3.4".
    • If an integer value is specified for the utilization, the traffic utilization will be compared as integers. To compare using floating-point numbers, specify the number as a floating-point number. For example, use “ingressUtil > 75.0” instead of “ingressUtil > 75”.
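The difference between integer and floating-point thresholds can be illustrated with a short Python model. This is a sketch of the behavior described in this document (the Troubleshooting section states that values are truncated to integers when the threshold is an integer), not the product's code; the function name is hypothetical.

```python
def exceeds(value, threshold):
    """Model of 'key > threshold' with integer or floating-point thresholds."""
    if isinstance(threshold, int):
        # Integer threshold: the measured value is truncated before comparing.
        return int(value) > threshold
    # Floating-point threshold: the values are compared as-is.
    return value > threshold

int_result = exceeds(75.4, 75)      # 75.4 is truncated to 75, and 75 > 75 is False
float_result = exceeds(75.4, 75.0)  # 75.4 > 75.0 is True
```

Under this model, a utilization of 75.4 only triggers the rule when the threshold is written as 75.0.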

    Consecutive Occurrences


    The special operator “&=” is used to test for consecutive occurrences of a condition. For example, to test that the ingress or egress utilization has been greater than or equal to 75 percent three times in a row, use the following expression: (ingressUtil >= 75 || egressUtil >= 75) &= 3
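One way to think about the “&=” operator is as a counter that resets whenever the condition fails. The Python sketch below models that idea under stated assumptions: the class name is hypothetical, the samples are invented, and whether the product keeps firing after the required count is reached is not specified in this document.

```python
class ConsecutiveTrigger:
    """Fires once a condition has held for `required` samples in a row."""

    def __init__(self, condition, required):
        self.condition = condition
        self.required = required
        self.streak = 0

    def update(self, sample):
        # Extend the streak on a match; any non-matching sample resets it.
        if self.condition(sample):
            self.streak += 1
        else:
            self.streak = 0
        return self.streak >= self.required

# Model of: (ingressUtil >= 75 || egressUtil >= 75) &= 3
trigger = ConsecutiveTrigger(
    lambda s: s["ingressUtil"] >= 75 or s["egressUtil"] >= 75, required=3)

# Hypothetical collection results, one per collection cycle.
samples = [
    {"ingressUtil": 80, "egressUtil": 0},
    {"ingressUtil": 76, "egressUtil": 0},
    {"ingressUtil": 90, "egressUtil": 0},
    {"ingressUtil": 10, "egressUtil": 0},
    {"ingressUtil": 80, "egressUtil": 0},
]
results = [trigger.update(s) for s in samples]
```

In this run the trigger fires only on the third sample; the drop to 10 percent on the fourth sample resets the count.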


    Available Keys


    Below is a list of the attributes available for each element type.

    Note that utilization values are specified in percentages (for example, specify 30 for 30 percent).

    See Defining Conditions and Rules for special syntax involving brackets and units.


    Common Attributes


    • deviceID: The hostname of the device associated with the element. For the Node element type, this is the same as the name. For the Interface element type, this is the node that contains the interface. For the Tunnel element type, this is the head-end of the tunnel.
    • name: The element’s name (For the Node element type, this is the hostname. For the Interface element type, this is the interface name. For the Tunnel element type, this is the tunnel’s name.)
    • type: The element type (Node, Interface, Tunnel)
    • IPAddress: The IP address for the element

    Interface Attributes


    • bandwidth: The interface bandwidth. Here, g, m, k, are permitted to indicate the units, for example, 100m for 100 Mbps.
    • ingressBytesDelta, egressBytesDelta: The interface ingress/egress traffic in Bytes per second.
    • ingressUtil, egressUtil: Specify an integer value for percentage, for example, 30 for 30 percent.
    • ingressErrorDelta, egressErrorDelta: The number of inbound/outbound packets per second that contained errors.
    • ingressDiscardDelta, egressDiscardDelta: The number of inbound/outbound packets per second that were discarded.
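Because bandwidth is expressed in bits per second with optional unit suffixes while the byte deltas are in Bytes per second, it can help to convert both to a common unit when choosing thresholds. The Python sketch below does this with hypothetical helper names; it assumes decimal multipliers (k = 1,000, m = 1,000,000, g = 1,000,000,000), which this document does not state explicitly.

```python
# Assumption: suffixes are decimal multipliers of bits per second.
UNIT_MULTIPLIERS = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}

def parse_bandwidth(text):
    """Convert a value such as '100m' to bits per second."""
    text = text.strip().lower()
    if text and text[-1] in UNIT_MULTIPLIERS:
        return float(text[:-1]) * UNIT_MULTIPLIERS[text[-1]]
    return float(text)

def bytes_to_bits_per_second(bytes_per_second):
    """Convert a ...BytesDelta value (Bytes per second) to bits per second."""
    return bytes_per_second * 8

fast_ethernet_bps = parse_bandwidth("100m")
tunnel_bps = bytes_to_bits_per_second(8000)
```

For example, an ingressBytesDelta threshold of 8000 corresponds to 64,000 bits per second.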

    Node Attributes


    • nodeType: Hardware type (for example, M5 for Juniper M5, CISCO) used for sla status data
    • sysUptime, lastUptime: System up time and last up time, in hundredths of a second

    Tunnel Attributes


    • ingressBytesDelta: The tunnel traffic in Bytes per second.

    CPU Stats Attributes


    • cpuTemp: CPU temperature
    • cpuUtil: CPU utilization
    • memTotal: total memory
    • memUsed: used memory
    • memUtil: memory utilization

    LSP Ping Stats Attributes


    • lsppingAvg: average lsp ping value
    • lsppingMax: max lsp ping value
    • lsppingMin: min lsp ping value
    • lsppingSD: standard deviation lsp ping value

    Latency Stats Attributes


    • latencyAvg: average latency value
    • latencyMax: max latency value
    • latencyMin: min latency value
    • latencySD: standard deviation latency value

    Ping Stats Attributes


    • pingAvg: average ping value
    • pingMax: max ping value
    • pingMin: min ping value
    • pingLossPercent: ping loss percentage

    SLA Stats Attributes


    • slaDNSError, slaDNSRoundTrip, slaTimeOut
    • slaEgressLatencyAvg, slaEgressLatencyMax, slaEgressLatencyMin
    • slaEgressNegJitterAvg, slaEgressNegJitterMax, slaEgressNegJitterMin
    • slaEgressPacketLoss
    • slaEgressPosJitterAvg, slaEgressPosJitterMax, slaEgressPosJitterMin
    • slaEgressRoundTripAvg, slaEgressRoundTripMax, slaEgressRoundTripMin
    • slaHTTPTransactionError, slaHTTPTransactionRoundTrip, slaHTTPTransactionTimeOut, slaHTTPTransactionTimeToFirstByte

    • slaIngressLatencyAvg, slaIngressLatencyMax, slaIngressLatencyMin
    • slaIngressNegJitterAvg, slaIngressNegJitterMax, slaIngressNegJitterMin
    • slaIngressPacketLoss
    • slaIngressPosJitterAvg, slaIngressPosJitterMax, slaIngressPosJitterMin
    • slaIngressRoundTripAvg, slaIngressRoundTripMax, slaIngressRoundTripMin
    • slaPacketOutofSequence, slaPacketTimeout
    • slaRoundTripAvg, slaRoundTripMax, slaRoundTripMin
    • slaTCPConnectionError, slaTCPConnectionRoundTrip, slaTCPConnectionTimeOut

    • slaUnknownPacketLoss

    Additional Examples


    • Element Type: Interface
      Scope: Exclude condition: name ~= fe || name ~= ge || name ~= Ethernet
      Production Rule: ingressUtil > 50.0 || egressUtil > 50.0
      Explanation: Generates an alarm if non-ethernet links have utilization over 50 percent.

    • Element Type: CPUStats
      Scope: Include condition: deviceID == "NWK"
      Production Rule: cpuUtil > 90
      Explanation: Generates an alarm if CPU utilization on router NWK exceeds 90 percent.

    • Element Type: Tunnel
      Scope: (none)
      Production Rule: ingressBytesDelta > 8000
      Explanation: Generates an alarm if traffic is over 8 kB/s (64 kb/s).

    Defining New Threshold Event Categories

    The default threshold event types that come with the standard installation might not be sufficient, especially when you want to configure more threshold rules than there are default types. In that case, you can define additional threshold event types using the following procedure.

    The following example steps add a new threshold event type, ThresholdMemoryEvent_I, of severity MINOR.


    Server-Side Modifications


    Add the new event type entry to the eventtypes.store file in the /u/wandl/db/config/ directory, between the <eventTypeStore> and </eventTypeStore> tags, as shown below.

    <EventType defaultElementType="None" defaultSeverity="MINOR" id="ThresholdMemoryEvent_I"
    implClass="com.wandl.event.data.BasicEventType" name="ThresholdMemoryEvent_I"
    superType="ThresholdMemoryEvent">
    <Description>Threshold memory event type I</Description>
    </EventType>

    Make sure that the following are satisfied:

    • The severity must match one of the supported severities: INFO, NORMAL, UP, WARNING, MINOR, MAJOR, CRITICAL, DOWN.
    • The id and name of the event type must match.
    • The super type must match with one of the supported super types, such as ThresholdEvent and ThresholdMemoryEvent.
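The three checks above can be automated before editing the server file. The Python sketch below parses a single EventType entry with the standard library and reports any violations. The function name is hypothetical, and the set of super types passed in is only the two named in this document; the full list supported by the server is not given here.

```python
import xml.etree.ElementTree as ET

SUPPORTED_SEVERITIES = {"INFO", "NORMAL", "UP", "WARNING",
                        "MINOR", "MAJOR", "CRITICAL", "DOWN"}

def check_event_type(xml_text, known_super_types):
    """Return a list of problems found in a single <EventType> entry."""
    entry = ET.fromstring(xml_text)
    problems = []
    if entry.get("defaultSeverity") not in SUPPORTED_SEVERITIES:
        problems.append("unsupported severity")
    if entry.get("id") != entry.get("name"):
        problems.append("id and name differ")
    if entry.get("superType") not in known_super_types:
        problems.append("unknown super type")
    return problems

entry = """<EventType defaultElementType="None" defaultSeverity="MINOR" id="ThresholdMemoryEvent_I"
implClass="com.wandl.event.data.BasicEventType" name="ThresholdMemoryEvent_I"
superType="ThresholdMemoryEvent">
<Description>Threshold memory event type I</Description>
</EventType>"""

# Assumption: only the super types named in this document are listed here.
problems = check_event_type(entry, {"ThresholdEvent", "ThresholdMemoryEvent"})
print(problems)
```

An empty list means the entry satisfies all three conditions.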

    Client-Side Modifications


    1. On the client side, add the new event type entry to the MPLSThresholdEditor<serverIP>.xml file under C://Users/<login>/AppData/Roaming/wandl/, between the <DefaultEditor> tags that list the default threshold event types. In the listing below, the ThresholdMemoryEvent_I line is the new entry.
      <DefaultEditor editable="false" implClass="com.wandl.swing.table.TableTools$OptionEditor"
      includeNone="false" type="EventType">
      <ValueEditor implClass="javax.swing.plaf.metal.MetalComboBoxEditor$UIResource"/>
      <OptionValue implClass="java.lang.String">ThresholdEvent</OptionValue>
      <OptionValue implClass="java.lang.String">ThresholdEvent_I</OptionValue>
      <OptionValue implClass="java.lang.String">ThresholdEvent_II</OptionValue>
      <OptionValue implClass="java.lang.String">ThresholdEvent_III</OptionValue>
      <OptionValue implClass="java.lang.String">ApplicationEvent</OptionValue>
      <OptionValue implClass="java.lang.String">ThresholdCountEvent</OptionValue>
      <OptionValue implClass="java.lang.String">ThresholdDurationEvent</OptionValue>
      <OptionValue implClass="java.lang.String">ThresholdMemoryEvent</OptionValue>
      <OptionValue implClass="java.lang.String">ThresholdMemoryEvent_I</OptionValue>
      <OptionValue implClass="java.lang.String">ThresholdStatusEvent</OptionValue>
      <OptionValue implClass="java.lang.String">ThresholdUtilizationEvent</OptionValue>
    2. Note that user-added entries are lost when the xml file is deleted. Add the entries again after a new xml file is created.
    3. Repeat the above steps to add more event types. The newly added event types are then listed in the Event Type drop-down in the Threshold Editor.

    Troubleshooting

    • Event Severity Level: If the threshold crossing alert does not appear, check that the event severity is not INFO. Events of severity INFO are displayed only on the fly while the Event Browser is open, and are not stored.
    • Units: Check that you are interpreting the attribute with the right units. For example, the utilization should be represented as a percentage (75, for 75%) rather than a fraction (0.75), and ingressBytesDelta represents Bytes per second rather than bits per second. Refer to Available Keys for more details on expected units. You can print out the value in the description for confirmation, for example, use [ingressUtil] and [egressUtil] for interface ingress and egress utilization.
    • Rule ordering: If there are multiple rules within a scope, the last rule is evaluated first, so rules must be ordered from general to specific. For example, suppose the following rules are defined. A memUtil of 75 then matches rule c, not rule a or b, as expected.
      • Rule a: memUtil > 50, MINOR
      • Rule b: memUtil > 60, MAJOR
      • Rule c: memUtil > 70, CRITICAL
      If a rule d that is more general than the preceding rules is added after them, rules a, b, and c are never used:
      • Rule d: memUtil > 5, WARNING
      To avoid this, qualify rules with both > and < checks:
      • Rule d: memUtil > 5 && memUtil < 50, WARNING
    • Whole Numbers: Be careful with whole-number thresholds, because the fractional part of the measured value may be ignored. If you specify an integer, the software evaluates the rule using integers and truncates any floating-point value to an integer before the comparison. For example, with the rule memUtil > 60, a value of 60.3 is truncated to 60 and fails the rule, although it still satisfies memUtil > 50. If the rule should match 60.3, change it to either > 60.0 or >= 60.
    • Timestamps: Note that the time stamp of a threshold event can differ by up to 2 collection cycles, depending upon when the event is processed by IP/MPLSView.
    • If no threshold crossing alerts are displayed as expected, rerun the Scheduling Live Network Collection task. It is possible that some information regarding interface bandwidth needs to be updated.
    • Check /u/wandl/log/threshold.log.0 for any error diagnostic messages.
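The rule-ordering behavior described in the Rule ordering item above can be modeled in a few lines of Python. This is a sketch under the assumption that rules in a scope are checked from last to first and that the first matching rule is the only one used; the function name is hypothetical.

```python
def first_matching_rule(rules, mem_util):
    """Check rules from last to first; return (name, severity) of the first match."""
    for name, predicate, severity in reversed(rules):
        if predicate(mem_util):
            return name, severity
    return None

rules = [
    ("a", lambda v: v > 50, "MINOR"),
    ("b", lambda v: v > 60, "MAJOR"),
    ("c", lambda v: v > 70, "CRITICAL"),
]

# With rules a, b, and c only, a memUtil of 75 matches rule c.
expected = first_matching_rule(rules, 75)

# Appending an overly general rule d hides rules a, b, and c.
shadowed = first_matching_rule(rules + [("d", lambda v: v > 5, "WARNING")], 75)

# Bounding rule d on both sides restores the intended result.
bounded_d = ("d", lambda v: 5 < v < 50, "WARNING")
fixed = first_matching_rule(rules + [bounded_d], 75)
low = first_matching_rule(rules + [bounded_d], 30)
```

Here the bounded rule d still catches low values such as 30 without masking the more specific rules.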

    Modified: 2015-12-29