HealthBot Ingest – Concepts and Examples

 
Summary

This document explains, with examples, some of the ingest methods that HealthBot supports.

SNMP Ingest

SNMP is a widely known and accepted network management protocol that many network device manufacturers, including Juniper Networks, provide for use with their devices. It is a polling protocol: properly configured network devices make configuration, diagnostic, and event information available to collectors, which must also be properly configured and authenticated. The collectors poll devices by sending specifically structured requests, called get requests, to retrieve data.

For information about SNMP as used on Junos OS devices, see Understanding SNMP Implementation in Junos OS.

The example below contains all of the configuration needed for HealthBot to successfully ingest SNMP data from a device or devices in a device group.

Example: Creating a Rule using SNMP Ingest

To illustrate how to configure and use an SNMP sensor, consider a scenario where you want to:

  • Monitor Routing Engine CPU, CPU average, and memory utilization for a device, using SNMP data

  • Create a rule with triggers that indicate when utilization for any of the above elements goes above 80%

To implement this scenario, complete the following activities in order: configure the network devices, create the rule and apply it in a playbook, and then monitor the devices.

CONFIGURE NETWORK DEVICES

Note

This example assumes you have already added your devices into HealthBot and assigned them to a device group.

Add SNMP configuration to the network device

If not already done, configure your network device(s) to accept SNMP get requests from HealthBot. For more details on configuring your Junos device, see the Network Device Requirements section of the HealthBot Installation Guide.

CREATE RULE, APPLY PLAYBOOK

Configure a rule using an SNMP sensor

You can now create a rule using SNMP as the sensor.

This rule includes multiple elements, as shown below:

  • An SNMP sensor to ingest data

  • Five fields extracting specific SNMP data of interest:

    • CPU utilization, memory utilization

    • CPU utilization averages - 1min, 5min, 15min

  • A field representing a static value, used as a threshold

    • Value provided by a variable

  • A field representing a description

    • Value provided by a variable; extracted from the SNMP messages

  • Five triggers, indicating when CPU, CPU average, and memory utilization is higher than the threshold value

  1. In the HealthBot GUI, click Configuration > Rules in the left-nav bar.
  2. On the Rules page, click the + Add Rule button.
  3. On the page that appears, in the top row of the rule window, set the rule name. In this example, rule name is check-system-cpu-memory-snmp.
  4. Add a description and synopsis if you wish.
  5. Click the + Add sensor button and enter the following parameters to configure the sensor, system-cpu-memory:
    • Name is user-defined

    • The sensor is using the Juniper SNMP MIB table jnxOperatingTable

    • HealthBot polls the device group for table data every 60 seconds

  6. Now move to the Variables tab, click the + Add variable button and enter the following parameters to configure the first variable, comp-name:
    • Matches any string that includes “Routing Engine”

    • Referenced later in field description

  7. Click the + Add variable button once more and enter the following parameters to configure the second variable, static-threshold:
    • Represents a (default) static value of “80”; in this case, 80%

    • Referenced later in field threshold

  8. Now move to the Fields tab, click the + Add field button and enter the following parameters to configure the first field, cpu-15min-avg:
    • Field names are user-defined

    • Extracts jnxOperating15MinLoadAvg value from SNMP table configured in the sensor

    • jnxOperating15MinLoadAvg - CPU Load Average (as a % value) over the last 15 minutes

  9. Click the + Add field button again and enter the following parameters to configure the second field, cpu-1min-avg:
    • Extracts jnxOperating1MinLoadAvg value from SNMP table

    • jnxOperating1MinLoadAvg - CPU Load Average (as a % value) over the last 1 minute

  10. Click the + Add field button again and enter the following parameters to configure the third field, cpu-5min-avg:
    • Extracts jnxOperating5MinLoadAvg value from SNMP table

    • jnxOperating5MinLoadAvg - CPU Load Average (as a % value) over the last 5 minutes

  11. Click the + Add field button again and enter the following parameters to configure the fourth field, description:
    • Extracts jnxOperatingDescr value from SNMP table

    • jnxOperatingDescr - name or description; for example, “Routing Engine 0”, “FPC 0”, etc.

    • The expression references the variable comp-name; filters the data further to retain only the values that include the string “Routing Engine”

    • Matching values will act as keys; each key gets a colored block in device health view

  12. Click the + Add field button again and enter the following parameters to configure the fifth field, system-buffer-memory:
    • Extracts jnxOperatingBuffer value from SNMP table

    • jnxOperatingBuffer - buffer pool utilization (as a % value)

  13. Click the + Add field button again and enter the following parameters to configure the sixth field, system-cpu:
    • Extracts jnxOperatingCPU value from SNMP table

    • jnxOperatingCPU - CPU utilization (as a % value)

  14. Click the + Add field button once more and enter the following parameters to configure the seventh field, threshold:
    • The expression references the variable static-threshold, giving this field the (default) integer value “80”

    • Referenced later in triggers

  15. Now move to the Triggers tab, click the + Add trigger button and enter the following parameters to configure the first trigger, system-buffer:
    • Trigger names are user-defined

    • Trigger logic runs every 90 seconds

    • Evaluate terms in sequence; when a term’s conditions are met, show its color and message on the device health pages

    • When system memory buffer utilization (the value in field system-buffer-memory) is greater than 80 (the value in field threshold), set color to red and show related message

    • Otherwise, set color to green and show related message

  16. Click the + Add trigger button again and enter the following parameters to configure the second trigger, system-cpu:
    • Trigger logic runs every 90 seconds

    • When CPU utilization (the value in field system-cpu) is greater than 80 (the value in field threshold), set color to red and show related message

    • Otherwise, set color to green and show related message

  17. Click the + Add trigger button again and enter the following parameters to configure the third trigger, system-cpu-15min-average:
    • Trigger logic runs every 90 seconds

    • When CPU 15min utilization average (the value in field cpu-15min-avg) is greater than or equal to 80 (the value in field threshold), set color to red and show related message

    • Otherwise, set color to green and show related message

  18. Click the + Add trigger button again and enter the following parameters to configure the fourth trigger, system-cpu-1min-average:
    • Trigger logic runs every 90 seconds

    • When CPU 1min utilization average (the value in field cpu-1min-avg) is greater than or equal to 80 (the value in field threshold), set color to red and show related message

    • Otherwise, set color to green and show related message

  19. Click the + Add trigger button once more and enter the following parameters to configure the fifth trigger, system-cpu-5min-average:
    • Trigger logic runs every 90 seconds

    • When CPU 5min utilization average (the value in field cpu-5min-avg) is greater than or equal to 80 (the value in field threshold), set color to red and show related message

    • Otherwise, set color to green and show related message

  20. At the upper right of the window, click the + Save & Deploy button.
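
The field and trigger logic configured in steps 6 through 19 can be sketched in Python. This is a minimal illustration under stated assumptions, not HealthBot's implementation: the row dictionaries, the evaluate function, and the sample values are hypothetical. Only the MIB object names, the “Routing Engine” filter, the default threshold of 80, and the comparison operators come from the steps above.

```python
import re

# Defaults of the two rule variables (a playbook instance may override them).
COMP_NAME = "Routing Engine"   # variable comp-name
STATIC_THRESHOLD = 80          # variable static-threshold

# Trigger name -> (SNMP column behind the field, comparison used by the trigger).
# Steps 15-16 use "greater than"; steps 17-19 use "greater than or equal to".
TRIGGERS = {
    "system-buffer": ("jnxOperatingBuffer", ">"),
    "system-cpu": ("jnxOperatingCPU", ">"),
    "system-cpu-15min-average": ("jnxOperating15MinLoadAvg", ">="),
    "system-cpu-1min-average": ("jnxOperating1MinLoadAvg", ">="),
    "system-cpu-5min-average": ("jnxOperating5MinLoadAvg", ">="),
}


def evaluate(rows, threshold=STATIC_THRESHOLD, comp_name=COMP_NAME):
    """Return {description: {trigger: color}} for rows whose description matches."""
    result = {}
    for row in rows:
        descr = row["jnxOperatingDescr"]
        if not re.search(comp_name, descr):
            continue  # the description field keeps only matching components
        colors = {}
        for trigger, (column, op) in TRIGGERS.items():
            value = row[column]
            breached = value > threshold if op == ">" else value >= threshold
            colors[trigger] = "red" if breached else "green"
        result[descr] = colors  # each description acts as a key
    return result


# Hypothetical rows polled from jnxOperatingTable (values are made up).
SAMPLE_ROWS = [
    {"jnxOperatingDescr": "Routing Engine 0", "jnxOperatingBuffer": 45,
     "jnxOperatingCPU": 80, "jnxOperating15MinLoadAvg": 80,
     "jnxOperating1MinLoadAvg": 92, "jnxOperating5MinLoadAvg": 60},
    {"jnxOperatingDescr": "FPC 0", "jnxOperatingBuffer": 99,
     "jnxOperatingCPU": 99, "jnxOperating15MinLoadAvg": 99,
     "jnxOperating1MinLoadAvg": 99, "jnxOperating5MinLoadAvg": 99},
]

if __name__ == "__main__":
    for key, colors in evaluate(SAMPLE_ROWS).items():
        print(key, colors)
```

In this sketch a CPU value of exactly 80 stays green for system-cpu but turns red for the load-average triggers, which mirrors the difference between the two comparison operators described in the steps.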

Add the rule to a playbook

With the rule created, you can now add it to a playbook. For this example, create a new playbook to hold the new rule.

  1. Click Configuration > Playbooks in the left-nav bar.
  2. On the Playbooks page, click the + Create Playbook button.
  3. On the page that appears, enter the following parameters:
  4. Click Save & Deploy.

Apply the playbook to a device group

To make use of the playbook, apply it to a device group.

  1. On the Playbooks page, click the Apply (Airplane) icon for the playbook you configured above.
  2. On the page that appears:
    • Enter a playbook instance name

    • Select the desired device group

    • (Optional) If desired, you can adjust the variables for this playbook instance to use different values than the defaults configured in the rule

    • Click Run Instance

  3. On the Playbooks page, confirm that the playbook instance is running. Note that the playbook instance may take some time to activate.

MONITOR

Monitor the devices

With the playbook applied, you can begin to monitor the devices.

  1. Click Monitor > Device Group Health in the left-nav bar.
  2. Select the device group to which you applied the playbook from the Device Group pull-down menu.
  3. Select one or more of the devices to monitor.
  4. In the Tile View, hover your mouse over one of the external tiles.
    • external is the topic name under which the rule was created

    • Each colored block represents a key and its related values

    • The mouse-over window shows information related to the given key, with the triggers listed inside

  5. In the Table View, try out the various filters and sorting options.
    • Each trigger is listed as a KPI

Syslog Ingest

Overview

Starting with Release 2.1.0, HealthBot supports syslog natively as another ingest method, using a data model that aligns with other HealthBot ingest mechanisms to provide all the same feature richness.

Syslog ingest requires some setup before you can use it as a sensor in a rule:

  • Pattern - A pattern identifies some syslog event; you create a pattern for each event you want to monitor. You can configure patterns for both structured and unstructured events.

  • Pattern set - With the patterns configured, you then group them into a pattern set, which you then reference when defining the syslog sensor settings within a rule.

To see how patterns and pattern sets are used, see Example: Creating a Rule Using Syslog Ingest.

System-Generated Fields

Some fields are common in syslog messages. HealthBot extracts these fields and includes them automatically in the raw table, enabling you to make use of them directly when creating a rule, and avoiding the need to configure patterns.

To illustrate use of these values, consider the following example syslog messages:

Structured - <30>1 2019-11-22T03:17:53.605-08:00 R1 mib2d 28633 SNMP_TRAP_LINK_DOWN [junos@2636.1.1.1.2.29 snmp-interface-index="545" admin-status="up(1)" operational-status="down(2)" interface-name="ge-1/0/0.16"] ifIndex 545, ifAdminStatus up(1), ifOperStatus down(2), ifName ge-1/0/0.16

Equivalent unstructured - <30>Nov 22 03:17:53 R1 mib2d[28633]: SNMP_TRAP_LINK_DOWN: ifIndex 545, ifAdminStatus up(1), ifOperStatus down(2), ifName ge-1/0/0.16

System-generated fields:

  • "__log_priority__" - Priority of syslog message

    • In the examples, <30> denotes the priority

  • "__log_timestamp__" - Timestamp from the syslog message, in epoch format

    • In the structured example, 2019-11-22T03:17:53.605-08:00 is converted to epoch with -08:00 indicating the time zone

    • In the unstructured example, the time zone from the configuration will be used to calculate epoch

  • "__log_host__" - Host name in the syslog message

    • In the examples, R1 denotes the host name

  • "__log_application_name__" - Application name in the syslog message

    • In the examples, mib2d is the application name

  • "__log_application_process_id__" - Application process ID in the syslog message

    • In the examples, 28633 is the ID

  • "__log_message_payload__" - Payload in the message

    • Structured example - “SNMP_TRAP_LINK_DOWN [junos@2636.1.1.1.2.29 snmp-interface-index="545" admin-status="up(1)" operational-status="down(2)" interface-name="ge-1/0/0.16"] ifIndex 545, ifAdminStatus up(1), ifOperStatus down(2), ifName ge-1/0/0.16”

    • Unstructured example - “SNMP_TRAP_LINK_DOWN: ifIndex 545, ifAdminStatus up(1), ifOperStatus down(2), ifName ge-1/0/0.16”

  • "Event-id" - Denotes the event ID configured in the pattern

    • In the examples, SNMP_TRAP_LINK_DOWN is the event ID

Note

Be sure not to define any new fields using a name already defined above.
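
To make the mapping between the example messages and the system-generated fields concrete, here is a minimal Python sketch that extracts the "__log_*__" fields from the two example messages. It is an illustration only: the regular expressions, the parse_syslog function, and its year and utc_offset_hours parameters are assumptions, and the patterns handle only messages shaped like the two examples. The Event-id field is omitted because it comes from the pattern configuration.

```python
import re
from datetime import datetime, timedelta, timezone

# <PRI>1 TIMESTAMP HOST APP PROCID <rest of message>
STRUCTURED = re.compile(r"^<(\d+)>1 (\S+) (\S+) (\S+) (\S+) (.*)$")
# <PRI>Mon DD HH:MM:SS HOST APP[PID]: <rest of message>
UNSTRUCTURED = re.compile(
    r"^<(\d+)>(\w{3} +\d+ \d\d:\d\d:\d\d) (\S+) ([^\[\s:]+)(?:\[(\d+)\])?: (.*)$"
)


def parse_syslog(line, year=2019, utc_offset_hours=0):
    """Return the system-generated fields, or None if the line is not recognized.

    year and utc_offset_hours are used only for unstructured messages, whose
    timestamps carry neither a year nor a time zone.
    """
    match = STRUCTURED.match(line)
    if match:
        pri, stamp, host, app, pid, payload = match.groups()
        # The structured timestamp includes its own UTC offset.
        epoch = datetime.fromisoformat(stamp).timestamp()
    else:
        match = UNSTRUCTURED.match(line)
        if match is None:
            return None
        pri, stamp, host, app, pid, payload = match.groups()
        zone = timezone(timedelta(hours=utc_offset_hours))
        naive = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        epoch = naive.replace(tzinfo=zone).timestamp()
    return {
        "__log_priority__": int(pri),
        "__log_timestamp__": epoch,
        "__log_host__": host,
        "__log_application_name__": app,
        "__log_application_process_id__": pid,
        "__log_message_payload__": payload,
    }


STRUCTURED_MSG = (
    '<30>1 2019-11-22T03:17:53.605-08:00 R1 mib2d 28633 SNMP_TRAP_LINK_DOWN '
    '[junos@2636.1.1.1.2.29 snmp-interface-index="545" admin-status="up(1)" '
    'operational-status="down(2)" interface-name="ge-1/0/0.16"] ifIndex 545, '
    'ifAdminStatus up(1), ifOperStatus down(2), ifName ge-1/0/0.16'
)
UNSTRUCTURED_MSG = (
    '<30>Nov 22 03:17:53 R1 mib2d[28633]: SNMP_TRAP_LINK_DOWN: ifIndex 545, '
    'ifAdminStatus up(1), ifOperStatus down(2), ifName ge-1/0/0.16'
)

if __name__ == "__main__":
    print(parse_syslog(STRUCTURED_MSG))
    print(parse_syslog(UNSTRUCTURED_MSG, year=2019, utc_offset_hours=-8))
```

Passing utc_offset_hours=-8 for the unstructured message stands in for the configured time zone; with that value both messages resolve to the same second.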

Usage Notes

  • If you add a device in HealthBot using its IP address (and the address is not the actual host name of the device), you must also add the device’s host name in the Syslog Hostnames field.

  • Multiple device groups can listen for syslog messages using the same port.

  • A configured time zone is not considered when processing structured messages, as the messages themselves include the time zone.

  • Daylight saving time is not currently supported.

Optional Configuration Elements

Configure syslog ports

By default, HealthBot listens for syslog messages from all device groups on UDP port 514. You can change the system-level syslog port, and also configure one or more ports per device group. The more specific device group setting takes precedence over the system level setting.

To change the system-level syslog port:

  1. Click Settings > Ingest Settings in the left-nav bar.
  2. Select the Syslog tab on the left side of the page.
  3. On the Syslog Settings page, edit the port number.
  4. Click Save & Deploy.

To configure a syslog port for a device group:

  1. Go to the Configuration > Device Group page and click on the name of a device group.
  2. Click the Edit (Pencil) icon.
  3. In the pop-up window, enter the port(s) in the Syslog Ports field.
  4. Click Save & Deploy.

Configure time zone for a device

When a device exports structured syslog messages, time zone information is included within the message. However, unstructured syslog messages do not include time zone information, and by default HealthBot uses GMT as the time zone for a device. If GMT is not correct for a device, you can assign a time zone to the device or its device group within HealthBot.

To configure a device’s time zone at the device group level:

  1. Go to the Configuration > Device Group page and click on the name of a device group.
  2. Click the Edit (Pencil) icon.
  3. In the pop-up window, enter a value in the Timezone field, for example -05:00.

To configure a device’s time zone at the device level:

  1. Go to the Configuration > Device page and click on the name of a device.
  2. Click the Edit (Pencil) icon.
  3. In the pop-up window, enter a value in the Timezone field, for example -05:00.

Note

The more specific device setting takes precedence over the device group setting.
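
The precedence between the device-level and device group-level time zone settings can be sketched as follows. This is a hypothetical illustration: the function names are invented, and the only facts taken from the text are the "-05:00" value format, the GMT default, and the rule that the device setting wins over the device group setting.

```python
from datetime import timedelta, timezone


def parse_offset(text):
    """Convert an offset string such as '-05:00' into a tzinfo object."""
    sign = -1 if text.startswith("-") else 1
    hours, minutes = text.lstrip("+-").split(":")
    return timezone(sign * timedelta(hours=int(hours), minutes=int(minutes)))


def effective_timezone(device_tz=None, group_tz=None):
    """Pick the most specific configured time zone, falling back to GMT."""
    for candidate in (device_tz, group_tz):
        if candidate:
            return parse_offset(candidate)
    return timezone.utc


if __name__ == "__main__":
    print(effective_timezone(device_tz="-05:00", group_tz="+01:00"))
    print(effective_timezone(group_tz="+05:30"))
    print(effective_timezone())
```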

Configure multiple source IP addresses for a device

In cases where syslog messages arrive from a device using a different source IP address than the one originally configured in the HealthBot GUI, you can add additional source IP addresses.

To support additional source IP addresses:

  1. Go to the Configuration > Device page and click on the name of a device.
  2. Click the Edit (Pencil) icon.
  3. In the pop-up window, enter the IP address(es) in the Syslog Source IPs field.

Configure host name aliases for a device

When a device has more than one host name, such as a device with dual REs, syslog messages can arrive at the HealthBot server with a host name that is not the device’s main host name. In these cases, you can add host name aliases for that device.

Note

If you add a device in HealthBot using its IP address, you must also add the host name that will appear in the syslog messages.

To configure additional hostname aliases:

  1. Go to the Configuration > Device page and click on the name of a device.
  2. Click the Edit (Pencil) icon.
  3. In the pop-up window, enter the host name(s) in the Syslog Host Names field.

Example: Creating a Rule Using Syslog Ingest

To illustrate how to configure and use a syslog sensor, consider a scenario where you want to:

  • Monitor interface operational down status

  • Use two syslog events, one structured and one unstructured

  • Use a rule with a trigger to indicate when the interface goes down

To implement this scenario, complete the following activities in order: configure the network devices, set up syslog ingest, create the rule and apply it in a playbook, and then monitor the devices.

CONFIGURE NETWORK DEVICES

Note

This example assumes you have already added your devices into HealthBot and assigned them to a device group.

Add syslog configuration to the network device

If not already done, configure your network device(s) to send syslog data to HealthBot. For more details on configuring your Junos device, see the Network Device Requirements section of the HealthBot Installation Guide.

SET UP SYSLOG INGEST

Configure patterns

A pattern is a configuration to monitor some syslog event; you create a pattern for each event you want to monitor. This example uses patterns to monitor two syslog events (one structured and one unstructured).

Note

See the usage notes at the end of this section for more detail on what has been configured.

  1. In the HealthBot GUI, click Settings > Ingest in the left-nav bar.
  2. Select the Syslog tab on the left of the page.
  3. On the Syslog Settings page, click the + Add Pattern button.
  4. In the pop-up window that appears, enter the following parameters for the first pattern, named snmp-if-link-down:
  5. Click OK.
  6. Click the add pattern button (+ Add Pattern) once more and enter the following parameters for the second pattern, named fpc-offline:
    Note

    The full value entered in the Filter field is fpc%{NUMBER:fpc} Marking ports %{WORD:port-status}

  7. Click OK. On the Syslog Settings page you should see the two patterns you just created.
  8. At the upper right of the Syslog Settings window, click the save and deploy button (+ Save & Deploy).

Usage notes for the patterns

For structured syslog:

  • The event ID (SNMP_TRAP_LINK_DOWN) references the event name found within the syslog messages.

  • Fields are optional for structured syslog messages; if you don’t configure fields, the attribute names from the message will be treated as field names.

    • In this example, however, we have user-defined fields:

      • The field names (if-name, snmp-index) are user-defined.

      • The field interface-name value is an attribute from the syslog message, for example, ge-0/3/1.0; this field is renamed as if-name

      • The field snmp-interface-index value is an attribute from the syslog message, for example, ifIndex 539; this field is renamed as snmp-index

      • The field snmp-interface-index here is defined as an integer; by default the fields extracted from a syslog message are of type string, however type integer changes this to treat the value as an integer

  • The constant section is optional; in this example, we have user-defined constants.

    • The constant name ifOperStatus is user-defined; in this case it has the integer value of '2'

  • Filter configuration is optional for a structured syslog, though you can do so if desired; if used, the filter-generated fields will override the fields included in the syslog message.

  • The key fields section is optional. By default, HealthBot uses the host name and event ID as keys; add any additional key fields here. This example adds one key field, interface-name, whose name and value are extracted from the syslog message’s attribute-value pair.

For unstructured syslog:

  • The event ID is user-defined, in this case PSEUDO_FPC_DOWN

    • For example, neither the unstructured syslog Nov 22 02:27:05 R1 fpc1 Marking ports down nor its structured counterpart <166>1 2019-11-22T02:38:23.132-08:00 R1 - - - - fpc1 Marking ports down includes an event ID

  • A filter must be used to derive fields (unlike proper structured syslog); this example uses fpc%{NUMBER:fpc} Marking ports %{WORD:port-status}, where fpc becomes the field name and NUMBER denotes the syntax used to extract the characters out of that particular portion of the message, for example “2”

    • An example of a syslog message that matches the grok filter is “fpc2 Marking ports down”

  • constant fpc-status - has a string value of ‘online’

Regarding filters:

  • By default, field and constant values in a pattern are strings; to treat a value as an integer or float, define the field type in the pattern as integer or float

  • For unstructured patterns, you must configure a filter as the messages are sent essentially as plain text and don’t include field info on their own

  • Filters should always be written to match the portion of message after the event ID; this allows the filter to parse a syslog message irrespective of whether it arrives in unstructured or structured format

    • For example, the filter fpc%{NUMBER:fpc} Marking ports %{WORD:port-status} matches both versions of the following syslog message:

      • Structured: <166>1 2019-11-22T02:38:23.132-08:00 R1 - - - - fpc1 Marking ports down

      • Unstructured: Nov 22 02:27:05 R1 fpc1 Marking ports down
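
The way this filter matches both message formats can be illustrated with a small Python sketch that translates the filter into a regular expression. This is a simplified approximation and not the grok engine HealthBot uses: the GROK_PATTERNS table covers only NUMBER and WORD, with regular expressions chosen here as assumptions, and the function names are hypothetical.

```python
import re

# Simplified stand-ins for two grok patterns (assumed, not the full definitions).
GROK_PATTERNS = {
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
    "WORD": r"\w+",
}
TOKEN = re.compile(r"%\{(\w+):([\w-]+)\}")


def compile_filter(filter_text):
    """Turn a filter such as 'fpc%{NUMBER:fpc}' into (regex, field names)."""
    parts, names, pos = [], [], 0
    for token in TOKEN.finditer(filter_text):
        parts.append(re.escape(filter_text[pos:token.start()]))  # literal text
        parts.append("(" + GROK_PATTERNS[token.group(1)] + ")")  # captured value
        names.append(token.group(2))                             # field name
        pos = token.end()
    parts.append(re.escape(filter_text[pos:]))
    return re.compile("".join(parts)), names


def extract(filter_text, message):
    """Return {field: value} if the filter matches anywhere in the message."""
    regex, names = compile_filter(filter_text)
    match = regex.search(message)
    if match is None:
        return None
    return dict(zip(names, match.groups()))


FILTER = "fpc%{NUMBER:fpc} Marking ports %{WORD:port-status}"
STRUCTURED = "<166>1 2019-11-22T02:38:23.132-08:00 R1 - - - - fpc1 Marking ports down"
UNSTRUCTURED = "Nov 22 02:27:05 R1 fpc1 Marking ports down"

if __name__ == "__main__":
    print(extract(FILTER, STRUCTURED))
    print(extract(FILTER, UNSTRUCTURED))
```

Because the filter describes only the portion of the message after the header, the same expression finds the fields in both variants. The extracted values are strings, consistent with the default field type noted above.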

Add patterns to a pattern set

With the patterns configured, group them into a pattern set.

  1. On the Syslog Settings page, scroll down and click the add pattern set button (+ Add Pattern Set).
  2. In the pop-up window that appears, enter the following parameters:
  3. Click OK. On the Syslog Settings page you should see the pattern set you just created.
  4. At the upper right of the window, click the save and deploy button (+ Save & Deploy).

CREATE RULE, APPLY PLAYBOOK

Configure a rule using the syslog sensor

With the syslog ingest settings complete, you can now create a rule using syslog as the sensor.

This rule includes three elements:

  • A syslog sensor

  • Four fields capturing data of interest

  • A trigger that indicates when the interface goes down

Note

See the usage notes at the end of this section for more detail on what has been configured.

  1. Click Configuration > Rules in the left-nav bar.
  2. On the Rules page, click the + Add Rule button.
  3. On the page that appears, in the top row of the rule window, set the rule name. In this example, it is check-interface-status.
  4. Add a description and synopsis if you wish.
  5. Click the + Add sensor button and enter the following parameters in the Sensors tab:
  6. Now move to the Fields tab, click the + Add field button, and enter the following parameters to configure the first field, named event-id:
  7. Click the + Add field button again and enter the following parameters to configure the second field, named fpc-slot:
  8. Click the + Add field button again and enter the following parameters to configure the third field, named if-name:
  9. Click the + Add field button once more and enter the following parameters to configure the fourth field, named snmp-index:
  10. Now move to the Triggers tab, click the + Add trigger button, and enter the following parameters to configure a trigger named link-down:
  11. At the upper right of the window, click the + Save & Deploy button.

Usage Notes for the rule

  • Sensor tab

    • The sensor name if-status-sensor is user-defined

    • The sensor type is syslog

    • Pattern set check-interface-status - reference to the pattern set configured earlier

    • If not set, the Maximum hold period defaults to 1s

  • Fields tab

    • Four fields are defined; although the patterns are capturing more than four fields of data, this example defines four fields of interest here; these fields are used in the trigger settings

    • The field names (event-id, fpc-slot, if-name, snmp-index) are user-defined

    • path event-id - default field created by syslog ingest in the raw table; references the field from the pattern configuration

    • path fpc - references the value from the filter used in the unstructured pattern configuration

    • path if-name - references the field from the pattern configuration

      • Data if missing all interfaces - if the if-name value is not included in the syslog message, use the string value “all interfaces”

    • path snmp-index - references the field from the pattern configuration

  • Triggers tab

    • The trigger name link-down is user-defined

    • frequency 2s - HealthBot checks for link-down syslog messages every 2 seconds

    • term is-link-down - when $event-id is equal to SNMP_TRAP_LINK_DOWN, in any syslog message in the last 300 seconds, make red and show the message Link down for $if-name(snmp-id: $snmp-index)

      • $event-id - $ indicates to reference the rule field event-id

      • Link down for $if-name(snmp-id: $snmp-index) - for example, “Link down for ge-1/0/0.16(snmp-id: 545)”

      • $if-name - references the field value, i.e., the name of the interface in the syslog message

    • term is-fpc-down - when $event-id is equal to PSEUDO_FPC_DOWN, in any syslog message in the last 300 seconds, make red and show the message Link down for $if-name of FPC$fpc-slot

      • $event-id - $ indicates to reference the rule field event-id

      • $if-name - “all interfaces”

      • Link down for $if-name of FPC$fpc-slot - for example, “Link down for all interfaces of FPC 2”
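
The two trigger terms and their message templates can be sketched in Python as shown below. This is an illustration under stated assumptions, not HealthBot's trigger engine: the event dictionaries, the evaluate and render functions, and the sample timestamps are hypothetical. The event IDs, the 300-second window, the “all interfaces” default, and the message templates come from the usage notes above.

```python
import re

WINDOW_SECONDS = 300  # terms look at syslog messages from the last 300 seconds

# (term name, event ID to match, message template), evaluated in order.
TERMS = [
    ("is-link-down", "SNMP_TRAP_LINK_DOWN",
     "Link down for $if-name(snmp-id: $snmp-index)"),
    ("is-fpc-down", "PSEUDO_FPC_DOWN",
     "Link down for $if-name of FPC$fpc-slot"),
]
FIELD_REF = re.compile(r"\$([A-Za-z][\w-]*)")  # $name references a rule field


def render(template, fields):
    """Substitute each $field reference with the field's value."""
    return FIELD_REF.sub(lambda m: str(fields[m.group(1)]), template)


def evaluate(events, now):
    """Return (term, color, message) for each recent event that matches a term."""
    alerts = []
    for event in events:
        if now - event["time"] > WINDOW_SECONDS:
            continue  # outside the evaluation window
        fields = dict(event["fields"])
        fields.setdefault("if-name", "all interfaces")  # "data if missing"
        for term, event_id, template in TERMS:
            if fields.get("event-id") == event_id:
                alerts.append((term, "red", render(template, fields)))
                break
    return alerts


# Hypothetical events; times are seconds on an arbitrary clock.
EVENTS = [
    {"time": 100, "fields": {"event-id": "SNMP_TRAP_LINK_DOWN",
                             "if-name": "ge-1/0/0.16", "snmp-index": 545}},
    {"time": 150, "fields": {"event-id": "PSEUDO_FPC_DOWN", "fpc-slot": 2}},
    {"time": -1000, "fields": {"event-id": "PSEUDO_FPC_DOWN", "fpc-slot": 3}},
]

if __name__ == "__main__":
    for alert in evaluate(EVENTS, now=200):
        print(alert)
```

The sketch substitutes fields literally, so spacing in the rendered message follows the template exactly.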

Add the rule to a playbook

With the rule created, you can now add it to a playbook. For this example, create a new playbook to hold the new rule.

  1. Click Configuration > Playbooks in the left-nav bar.
  2. On the Playbooks page, click the + Create Playbook button.
  3. On the page that appears, enter the following parameters:
  4. Click Save & Deploy.

Apply the playbook to a device group

To make use of the playbook, apply it to a device group.

  1. On the Playbooks page, click the Apply (Airplane) icon for the check-interface-status playbook.
  2. On the page that appears:
    • Enter a name

    • Select the desired device group

    • Click Run Instance

  3. On the Playbooks page, confirm that the playbook instance is running. Note that the playbook may take some time to activate.

MONITOR

Monitor the devices

With the playbook applied, you can begin to monitor the devices.

  1. Click Monitor > Device Group Health in the left-nav bar, and select the device group to which you applied the playbook from the Device Group pull-down menu.
  2. Select one or more of the devices to monitor.
  3. In the Tile View, the tile labeled external contains the parameters from the rule you configured earlier.

Note

For this example, since the rule trigger does not include a ‘green’ term, the status will show red when there is an issue and otherwise will show gray (no data).

NetFlow Ingest

As mentioned in HealthBot Concepts, starting with release 3.0.0, HealthBot natively supports NetFlow v9 and v10 (IPFIX). This document describes, in detail, how HealthBot interacts with Junos devices to receive and process (ingest) NetFlow traffic data.

For the sake of completeness and to ensure correct context, the overview information from HealthBot Concepts is repeated here.

How NetFlow Works

NetFlow is a network protocol for collecting IP traffic statistics, which can then be exported to a tool for analysis. NetFlow is available in different versions, the latest being NetFlow v9 and NetFlow v10. The NetFlow v9 data export format is described in RFC 3954; NetFlow v10 is officially known as IPFIX and standardized in RFC 7011.

Junos devices support flow monitoring and aggregation using these protocols. The Junos OS samples the traffic, builds a flow table, and sends the details of the flow table over a configured UDP port to a collector, in this case HealthBot. HealthBot receives the incoming NetFlow data, auto-detects it as v9 or v10, and processes it further.

The network device pushes flow data from the Packet Forwarding Engine, that is, directly from a line card. This means flow data is sent over the forwarding plane, so the collector must have in-band connectivity to the device. To use the flow sensor option, you configure the device with settings that include where to send the flow data. When you configure HealthBot to start collecting the data, the flow data is already flowing towards the server.

Flow Templates

Where other ingest methods have established sensor formats and identification details - for example, native GPB references paths, SNMP references MIBs, etc. - flow has no equivalent mechanism. Instead, HealthBot uses templates. These flow templates provide a mechanism to identify and decode incoming flow data before sending it for further processing.

HealthBot provides predefined flow templates for NetFlow v9 and v10 (IPFIX), or you can define your own. The predefined templates match those which the Junos OS currently supports. For example, the Junos OS template, ipv4-template, aligns with the HealthBot template hb-ipfix-ipv4-template. To view the fields used in the Junos OS templates, see Understanding Inline Active Flow Monitoring.

Note

In the current ingest implementation for NetFlow, the following field types are not supported:

  • Fields for enterprise specific elements

  • Variable length fields

Flow Ingest Processing

The raw flow data that HealthBot receives is in binary format and is not human-readable. To make this data usable, HealthBot processes the incoming flow data as follows:

  • HealthBot listens for incoming flow data on a configured port

  • Since NetFlow messages don’t include a field that identifies the sending device, HealthBot uses the configured source IP address to derive a device ID

  • Templates identify and decode incoming flow data to determine which fields it contains

  • HealthBot then normalizes the fields for further use within the system

The resulting decoded and normalized data is now in a readable and usable format. Here is an example of flow data decoded using the hb-ipfix-ipv4-template template:

hb-ipfix-ipv4-template, destinationIPv4Address=192.168.48.200,destinationTransportPort=443,icmpTypeCodeIPv4=0,ingressInterface=1113, ipClassOfService=0,protocolIdentifier=6,sourceIPv4Address=172.16.235.191,sourceTransportPort=51032, bgpDestinationAsNumber=4200000000i,bgpSourceAsNumber=65000i,destinationIPv4PrefixLength=24i, dot1qCustomerVlanId=0i,dot1qVlanId=0i,egressInterface=1041i,flowEndMilliseconds=1483484591502u, flowEndReason=2i,flowStartMilliseconds=1483484577531u,ipNextHopIPv4Address="192.168.41.49", maximumTTL=120i,minimumTTL=120i,octetDeltaCount=188i,packetDeltaCount=4i,sourceIPv4PrefixLength=19i, tcpControlBits=16i,vlanId=0i 1483484642000000000

The data shows information from the NetFlow messages using naming according to IPFIX Information Elements. For example, destinationIPv4Address maps to element ID 12 in the elements table.
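
A decoded record like the one above can be split into its parts with a short Python sketch. This is an illustration only and makes several assumptions: the source does not name the record format, so the sketch assumes that the first token is the template name, that the last token is a timestamp, and that the i and u suffixes mark integer values (as they do in InfluxDB line protocol). The SAMPLE string is an abbreviated copy of the example, and parse_record and convert are hypothetical names.

```python
import re

# Abbreviated copy of the decoded example record shown above.
SAMPLE = (
    "hb-ipfix-ipv4-template, destinationIPv4Address=192.168.48.200,"
    "destinationTransportPort=443,protocolIdentifier=6, "
    "bgpSourceAsNumber=65000i,flowEndMilliseconds=1483484591502u, "
    'ipNextHopIPv4Address="192.168.41.49",octetDeltaCount=188i '
    "1483484642000000000"
)


def convert(value):
    """Interpret a raw value: quoted string, suffixed integer, or plain text."""
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        return value[1:-1]
    if re.fullmatch(r"-?\d+[iu]", value):
        return int(value[:-1])  # assumed: i/u suffixes denote integers
    return value


def parse_record(text):
    """Return (template name, {element name: value}, timestamp)."""
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    template, timestamp = tokens[0], int(tokens[-1])
    values = {}
    for token in tokens[1:-1]:
        name, _, raw = token.partition("=")
        values[name] = convert(raw)
    return template, values, timestamp


if __name__ == "__main__":
    template, values, timestamp = parse_record(SAMPLE)
    print(template, timestamp)
    for name, value in values.items():
        print(f"  {name} = {value!r}")
```

Splitting on both commas and whitespace keeps the sketch tolerant of the line breaks in the printed example; values without a suffix or quotes are left as strings.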

Warning

For NetFlow ingest, ensure that there is no source NAT in the network path(s) between the device and HealthBot. If the network path contains source NAT, then the device information received is not accurate.

Configuration - Device Configuration

To use NetFlow as an ingest method in HealthBot, you must add configuration to the device you wish to monitor, to enable it to export flow data into HealthBot.

This example includes the NetFlow v10 IPv4 template; adjust as needed for your environment. If not already done, complete device configuration to send NetFlow data to HealthBot as shown below.

## IPFIX template configuration

## Apply IPFIX template to enable traffic sampling

## 10.102.70.200 = HealthBot server

## port 2055; use this value in HealthBot GUI (device group config)

## inline-jflow = Enable inline flow monitoring for traffic from the designated address

## 198.51.100.1 = in-band interface doing the exporting; use this value in HealthBot GUI (device config)

## Associate sampling instance with the FPC

## Specify which interface traffic to sample

Add the Device In HealthBot

Now add the device in HealthBot, specifying the IP address(es) that will send the flow data.

  1. In the HealthBot GUI, click Configuration > Device in the left-nav bar.
  2. Click the + (Add Device) button.
  3. In the Add Device(s) window that appears, fill in the appropriate fields.

    Be sure to fill in the Flow IPs field with the IP address(es) from which NetFlow data will arrive.

  4. Click Save & Deploy.

For more information about adding a device, see Adding a Device in Managing Devices, Device Groups, and Network Groups.

Usage Notes:

  • Incoming NetFlow messages don’t include a device ID; HealthBot uses the message’s source IP address to derive a device ID

  • When configuring this step, use the in-band interface IP address you configured in the sampling instance configuration on the device.
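The source-IP lookup described in these notes can be pictured with the following minimal Python sketch. It is not HealthBot's implementation; the device names, the `FLOW_IPS` mapping, and the helper functions are hypothetical and only illustrate why the Flow IPs field must match the address the device exports from.

```python
import ipaddress

# Hypothetical mapping of device ID -> configured Flow IPs.
FLOW_IPS = {
    "router-1": ["198.51.100.1"],
    "router-2": ["198.51.100.2", "2001:db8::2"],
}


def build_lookup(flow_ips):
    """Index every configured flow IP by its parsed address."""
    lookup = {}
    for device_id, addresses in flow_ips.items():
        for address in addresses:
            lookup[ipaddress.ip_address(address)] = device_id
    return lookup


def device_for_packet(lookup, source_ip):
    """Return the device ID for a packet's source IP, or None if unknown."""
    return lookup.get(ipaddress.ip_address(source_ip))


lookup = build_lookup(FLOW_IPS)
print(device_for_packet(lookup, "198.51.100.1"))   # known exporter address
print(device_for_packet(lookup, "203.0.113.9"))    # e.g. an address rewritten by source NAT
```

In this model, a packet whose source address was rewritten by NAT no longer matches any configured flow IP, which is consistent with the earlier warning about source NAT.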

Add Device Group

With the device added, you now need to create a device group and define the flow ingest port for the device group.

  1. Click Configuration > Device Group
  2. Click the + (Add Device Group) button
  3. In the Add Device Group window that appears, fill in the fields as appropriate. In the Flow Ports field, enter the port(s) on which NetFlow data will arrive.

    Note

    If your HealthBot installation is a multi-node installation using Kubernetes, you must also specify which HealthBot nodes will receive the NetFlow traffic by filling in the Flow Deploy Nodes field in the Add/Edit Device Group window.

  4. Click Save & Deploy

    Usage Notes:

    • HealthBot will listen for NetFlow messages on this port for devices in this group.

    • The configured NetFlow ingest ports cannot be the same across device groups. You must configure a different port (or ports) for each group.
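Before deploying, it can help to check a planned port layout against the rule that flow ports must be unique across device groups. The following Python sketch is one way to do that offline; the group names and the `find_port_conflicts` helper are hypothetical and are not part of the HealthBot GUI or API.

```python
from collections import defaultdict


def find_port_conflicts(group_ports):
    """Return {port: [groups]} for every port claimed by more than one group."""
    owners = defaultdict(list)
    for group, ports in group_ports.items():
        for port in set(ports):          # ignore duplicates within one group
            owners[port].append(group)
    return {
        port: sorted(groups)
        for port, groups in owners.items()
        if len(groups) > 1
    }


# Hypothetical plan: two groups both claim port 2055.
planned = {
    "edge-routers": [2055, 2056],
    "core-routers": [2055],
    "lab": [9995],
}

conflicts = find_port_conflicts(planned)
print(conflicts)
```

An empty result means every port belongs to exactly one device group in the plan.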

Define NetFlow Ingest Settings - Review Predefined Templates

Flow templates provide a mechanism to identify and decode incoming flow data before sending it for further processing within HealthBot.

  1. Click Settings > Ingest Settings in the left-nav bar.
  2. Click the Netflow tab on the left side of the page.
  3. On the Netflow settings page, review the available templates for use in a rule.

Usage Notes:

  • Notice that there are default flow templates for IPv4, IPv6, MPLS, MPLS-IPv4, MPLS-IPv6, and VPLS, for each of NetFlow v9 and v10.

  • The flow templates include recognition patterns, called include fields and exclude fields, which help to recognize, identify, and categorize the incoming messages.

  • Since NetFlow messages don’t distinguish between keys and values (all fields are simply incoming data), the templates specify which fields should be treated as keys for raw data.
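To make the key-versus-value distinction concrete, here is a small Python sketch that partitions a decoded flow record according to a template's key fields. The `KEY_FIELDS` set is an illustrative assumption, not the actual key list of any predefined template.

```python
# Assumed key fields for illustration only.
KEY_FIELDS = {"sourceIPv4Address", "destinationIPv4Address", "protocolIdentifier"}


def split_record(record, key_fields):
    """Partition a flat flow record into (keys, values)."""
    keys = {name: value for name, value in record.items() if name in key_fields}
    values = {name: value for name, value in record.items() if name not in key_fields}
    return keys, values


record = {
    "sourceIPv4Address": "172.16.235.191",
    "destinationIPv4Address": "192.168.48.200",
    "protocolIdentifier": 6,
    "octetDeltaCount": 188,
    "packetDeltaCount": 4,
}

keys, values = split_record(record, KEY_FIELDS)
print(sorted(keys), sorted(values))
```

Fields treated as keys identify a flow, while the remaining fields carry the measurements recorded for it.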

Define NetFlow Ingest Settings - (Optional) Create Your Own NetFlow Template

If the existing templates do not meet your needs, you can create your own template. You can also use custom templates to support other vendors’ devices.

  1. On the Netflow settings page, click the + Template button.
  2. In the Add Template window that appears, fill in the following fields (you can leave the other settings as is):
    • Template Name - give the template a name

    • NetFlow version - select v9 or v10

    • Priority - Available values are 1 through 10

    • Include Fields - add one or more fields that you want included in the template you wish to use

    • Exclude Fields - add one or more fields that you do not want included in the template you wish to use

    • Key Fields - specify which fields in the incoming messages should be treated as keys

  3. Click Save & Deploy

    You should now see the template added to the NetFlow settings page.

  4. (Optional) Repeat the steps above to create more templates.

Usage Notes:

  • Priority - when a playbook includes multiple rules using the flow sensor, the priority value identifies which sensor and template take priority over the other(s).

  • Include/Exclude fields - include fields to help identify the template to use, or at least a ‘short list’ of templates to use; exclude fields then narrow down to the single desired template.

    • Example 1 - consider the hb-ipfix-ipv4-template template: it includes two IPv4 fields to narrow down to hb-ipfix-ipv4-template and hb-ipfix-mpls-ipv4-template, and excludes an MPLS field to eliminate hb-ipfix-mpls-ipv4-template, leaving only hb-ipfix-ipv4-template.

    • Example 2 - consider the hb-ipfix-mpls-ipv4-template template: it includes the same two IPv4 fields to narrow down to hb-ipfix-ipv4-template and hb-ipfix-mpls-ipv4-template. It also includes an MPLS field, which immediately eliminates the former template, leaving the latter as the template to use.
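The include/exclude selection in the two examples above can be sketched in Python as set operations. This is a simplified model, not HealthBot's matching code. The specific field names (sourceIPv4Address, destinationIPv4Address, mplsTopLabelStackSection) are assumptions chosen from the IPFIX Information Elements; the actual include and exclude fields of the predefined templates may differ.

```python
# Assumed include/exclude definitions for illustration only.
TEMPLATES = [
    {
        "name": "hb-ipfix-ipv4-template",
        "include": {"sourceIPv4Address", "destinationIPv4Address"},
        "exclude": {"mplsTopLabelStackSection"},
    },
    {
        "name": "hb-ipfix-mpls-ipv4-template",
        "include": {
            "sourceIPv4Address",
            "destinationIPv4Address",
            "mplsTopLabelStackSection",
        },
        "exclude": set(),
    },
]


def matching_templates(field_names, templates):
    """Return names of templates whose include fields are all present
    and whose exclude fields are all absent."""
    present = set(field_names)
    return [
        template["name"]
        for template in templates
        if template["include"] <= present and not (template["exclude"] & present)
    ]


plain_ipv4 = ["sourceIPv4Address", "destinationIPv4Address", "octetDeltaCount"]
mpls_ipv4 = plain_ipv4 + ["mplsTopLabelStackSection"]

print(matching_templates(plain_ipv4, TEMPLATES))
print(matching_templates(mpls_ipv4, TEMPLATES))
```

In this model a plain IPv4 message matches only the IPv4 template, and a message carrying the MPLS field matches only the MPLS-IPv4 template. The priority setting is left out of the sketch.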

Configure a Rule Using the Flow Sensor

With the flow ingest settings complete, you can now create a rule using flow as the sensor.

This example rule includes three elements:

  • A flow sensor that uses the NetFlow v10 IPv4 template

  • Six fields capturing data of interest

  • A trigger that indicates when traffic flow is higher or lower than expected

Note

See the usage notes at the end of this section for more detail on what has been configured.

  1. Click Configuration > Rules in the left-nav bar.
  2. On the Rules page, click the + Add Rule button.

    The Rules page refreshes to show a nearly empty rule on the right part of the page.

  3. In the top row of the rule window, leave the topic set as external and set the rule name that appears after the slash (/). In this example, it is periodic-aggregation-flow-rule.
  4. Add a description and synopsis if you wish.
  5. Click the + Add Sensor button and enter the following parameters in the Sensors tab:
  6. Now move to the Fields tab, click the + Add Field button and enter the following parameters to configure the first field, source-ipv4-address:
  7. Click the + Add Field button again and enter the following parameters to configure the second field, destination-ipv4-address:
  8. Click the + Add Field button again and enter the following parameters to configure the third field, sensor-traffic-count:
  9. Click the + Add Field button again and enter the following parameters to configure the fourth field, total-traffic-count:
  10. Click the + Add Field button again and enter the following parameters to configure the fifth field, traffic-count-maximum:
  11. Click the + Add Field button once more and enter the following parameters to configure the sixth field, traffic-count-minimum:
  12. As the last step for the fields configuration, set the field aggregation time-range value to 10s:
  13. Now move to the Variables tab, click the + ADD VARIABLE button and create the traffic-count-max and traffic-count-min variables that are the constants for the traffic-count-maximum and traffic-count-minimum fields, respectively.
    Note

    Only the definition for the traffic-count-max is shown graphically. Choose an appropriate Default Value when configuring both traffic-count-max and traffic-count-min variables. The value shown above is for testing purposes only and may not be appropriate for your network.

  14. Now move to the Triggers tab, click the + Add trigger button and enter the following parameters to configure a trigger called traffic-measurement-trigger:
  15. At the upper right of the window, click the Save & Deploy button.

Usage Notes:

  • Sensor Tab:

    • The sensor name ipv4-flow-sensor is user-defined

    • The sensor type is flow

    • The sensor uses the predefined template hb-ipfix-ipv4-template

  • Variables Tab:

    • The variables traffic-count-max and traffic-count-min are statically configured integers. In this case the values represent Bytes per second

    • These values are referenced in fields traffic-count-maximum and traffic-count-minimum and provide a reference point to compare against the total-traffic-count field

  • Fields Tab:

    • Six fields are defined; some fields are used in the trigger settings while one field is referenced within another field

    • The field names are user-defined fields (UDF)

    • Fields source-ipv4-address, destination-ipv4-address, and sensor-traffic-count are extracting information from the flow sensor input

    • Path values for these fields identify specific values from the NetFlow messages, using naming according to IPFIX Information Elements

    • Fields source-ipv4-address and destination-ipv4-address have the Add to rule key setting enabled, indicating that these fields should be shown as searchable keys for this rule on the device health pages

    • Field total-traffic-count - sums the IPv4 packet count from the sensor-traffic-count field every 10 seconds

    • The fields traffic-count-maximum and traffic-count-minimum are simply fixed values; the values are derived from the variables defined above

    • Field aggregation time-range - typically set to a value higher (longer) than individual field time range settings with the aim of reducing the frequency of information being sent to the database

  • Triggers Tab:

    • The trigger name traffic-measurement-trigger is user-defined.

    • frequency 90s - HealthBot compares traffic counts every 90 seconds

    • In the term traffic-abnormal-gr:

      • When $total-traffic-count (the periodic count of incoming IPv4 traffic) is greater than $traffic-count-maximum (2500 Bps), show red and the message: “Total traffic count is above normal. Current total traffic count is : $total-traffic-count”.

    • In the term traffic-abnormal-ls:

      • When $total-traffic-count (the periodic count of incoming IPv4 traffic) is less than $traffic-count-minimum (500 Bps), show yellow and the message: “Total traffic count is below normal. Current total traffic count is : $total-traffic-count”.

    • In the term default-term:

      • Otherwise, show green and the message: “Total traffic count is normal. Current total traffic count is : $total-traffic-count”.
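The trigger terms described above amount to a three-way comparison. The following Python sketch models that logic under the example's thresholds (2500 and 500). The function names are hypothetical, and the sketch is not how HealthBot evaluates rules internally.

```python
def total_traffic_count(samples):
    """Sum the per-flow counts seen in one aggregation window."""
    return sum(samples)


def evaluate_trigger(total, maximum=2500, minimum=500):
    """Return (color, message) following the three terms of the trigger."""
    if total > maximum:      # term traffic-abnormal-gr
        status = "red"
        text = "above normal"
    elif total < minimum:    # term traffic-abnormal-ls
        status = "yellow"
        text = "below normal"
    else:                    # term default-term
        status = "green"
        text = "normal"
    message = (
        f"Total traffic count is {text}. "
        f"Current total traffic count is : {total}"
    )
    return status, message


# Hypothetical counts collected during one window.
window = [188, 400, 912]
total = total_traffic_count(window)
print(evaluate_trigger(total))
print(evaluate_trigger(3000))
print(evaluate_trigger(100))
```

Because the comparisons are strict, a total exactly equal to either threshold falls through to the default term in this sketch.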

Add the Rule to a Playbook

With the rule created, you can now add it to a playbook. For this example, create a new playbook to hold the new rule.

  1. Click Configuration > Playbooks in the left-nav bar.
  2. On the Playbooks page, click the + Create Playbook button.
  3. On the page that appears, enter the following parameters:
  4. Click Save & Deploy

Apply the Playbook to a Device Group

  1. On the Playbooks page, click the Apply (Airplane) icon for the playbook you configured above.
  2. On the Run Playbook page that appears:
    • Enter a name for the playbook instance.

    • Select the desired device group from the Apply Group pull-down menu.

    • Click Run Instance.

  3. On the Playbooks page, confirm that the playbook instance is running. Note that the playbook may take some time to activate.

Monitor the Devices

With the playbook applied, you can begin to monitor the devices.

  1. Click Monitor > Device Group Health in the left-nav bar and select the device group to which you applied the playbook from the Device Group pull-down menu.
  2. Select one or more of the devices to monitor.
  3. In the Tile View, the external tile contains the parameters from the rule you configured earlier.