Understanding How to Use sFlow Technology for Network Monitoring

 

The sFlow technology is a monitoring technology for high-speed switched or routed networks. sFlow monitoring technology randomly samples network packets and sends the samples to a monitoring station called a collector. You can configure sFlow technology on a Juniper Networks router to continuously monitor traffic at wire speed on all interfaces simultaneously.

Benefits of sFlow Technology

  • sFlow can be used by software tools like a network analyzer to continuously monitor tens of thousands of switch or router ports simultaneously.

  • Because sFlow analyzes a sample of the traffic (forwarding one packet out of every n packets) rather than every packet, it is not resource intensive in terms of processing, memory, and so on. Sampling is done in the hardware application-specific integrated circuits (ASICs), which keeps it simple and more accurate.

Sampling Mechanism and Architecture of sFlow Technology

sFlow technology uses the following two sampling mechanisms:

  • Packet-based sampling—Samples one packet out of a specified number of packets from an interface enabled for sFlow technology. Only the first 128 bytes of each packet are sent to the collector. Data collected include the Ethernet, IP, and TCP headers, along with other application-level headers (if present). Although this type of sampling might not capture infrequent packet flows, the majority of flows are reported over time, allowing the collector to generate a reasonably accurate representation of network activity. To configure packet-based sampling, you must specify a sample rate.

  • Time-based sampling—Samples interface statistics at a specified interval from an interface enabled for sFlow technology. Statistics such as Ethernet interface errors are captured. To configure time-based sampling, you must specify a polling interval.
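As a rough illustration of packet-based sampling, the following Python sketch selects packets at random, on average one out of every sample_rate, and keeps only the first 128 bytes of each. It is a software model for intuition only; on the device the sampling is done in hardware, and the function and variable names here are hypothetical.

```python
import random

HEADER_BYTES = 128  # only the first 128 bytes of a sampled packet are exported


def sample_packets(packets, sample_rate, rng):
    """Return truncated copies of randomly chosen packets.

    On average, 1 out of every `sample_rate` packets is selected.
    """
    samples = []
    for packet in packets:
        # Each packet is selected independently with probability 1/sample_rate.
        if rng.random() < 1.0 / sample_rate:
            samples.append(packet[:HEADER_BYTES])
    return samples


if __name__ == "__main__":
    rng = random.Random(7)          # fixed seed so the run is repeatable
    packet = bytes(1500)            # stand-in for a full-size Ethernet frame
    traffic = [packet] * 10_000
    sampled = sample_packets(traffic, sample_rate=100, rng=rng)
    print(len(sampled), "samples kept out of", len(traffic), "packets")
```

Because selection is random, the number of samples varies from run to run around the expected value (packets divided by sample rate).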

The sampling information is used to create a network traffic visibility picture. The Juniper Networks Junos operating system (Junos OS) fully supports the sFlow standard described in RFC 3176, InMon Corporation's sFlow: A Method for Monitoring Traffic in Switched and Routed Networks (see http://faqs.org/rfcs/rfc3176.html).

Note

On switches, sFlow technology samples only raw packet headers, that is, the complete Layer 2 network frame.

An sFlow monitoring system consists of an sFlow agent embedded in the router and a centralized collector. The sFlow agent’s two main activities are random sampling and statistics gathering. It combines interface counters and flow samples and sends them across the network to the sFlow collector as UDP datagrams, directing those datagrams to the IP address and UDP destination port of the collector. Each datagram contains the following information:

  • The IP address of the sFlow agent

  • The number of samples

  • The interface through which the packets entered the agent

  • The interface through which the packets exited the agent

  • The source and destination interface for the packets

  • The source and destination VLAN for the packets

Caution

In the case of dual VLAN, some fields might not be reported.

Routers and switches can adopt the distributed sFlow architecture. The sFlow agent has subagents. Each subagent is responsible for monitoring a set of network ports and has a unique ID that is used by the collector to identify the data source. A subagent has its own independent state and forwards its own sample messages to the sFlow agent. The sFlow agent is responsible for packaging the samples into datagrams and sending them to the sFlow collector. Because sampling is distributed across subagents, the protocol overhead associated with sFlow technology is significantly reduced at the collector.

Note

On the QFabric system, an sFlow collector must be reachable through the network. Because each Node device has all routes stored in the default routing instance, the collector IP address should be included in the default routing instance to ensure the collector’s reachability from the Node device.

Note

You cannot configure sFlow monitoring on a link aggregation group (LAG), but you can configure it individually on a LAG member interface.

Infrequent flows might not be reported in the sFlow information, but over time the majority of flows are reported. Based on a configured sampling rate N, 1 out of N packets is captured and sent to the collector. This type of sampling does not provide a 100 percent accurate result in the analysis, but it does provide a result with quantifiable accuracy. A user-configured polling interval defines how often sFlow data for a specific interface are sent to the collector, but an sFlow agent can also schedule polling.
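To make "quantifiable accuracy" more concrete, the following Python sketch shows how a collector might scale sample counts back up to traffic estimates. The error bound uses a commonly cited rule of thumb from sFlow sampling theory (about 196 × sqrt(1/c) percent at 95 percent confidence, where c is the number of samples in the class of interest). That formula is not taken from this topic, so treat it as an assumption; the function names are hypothetical.

```python
import math


def estimate_total(sample_count, sample_rate):
    """Scale a sample count up to an estimated packet count.

    With 1-out-of-N sampling, each sample stands for roughly N packets.
    """
    return sample_count * sample_rate


def percent_error_bound(sample_count):
    """Approximate 95% confidence bound on the relative error, in percent.

    Rule of thumb (assumption, not from this topic): 196 * sqrt(1 / c).
    """
    return 196.0 * math.sqrt(1.0 / sample_count)


if __name__ == "__main__":
    samples, rate = 400, 1000
    print("estimated packets:", estimate_total(samples, rate))
    print("error bound (%):", round(percent_error_bound(samples), 2))
```

The bound depends on the number of samples collected, not on the sampling rate itself, which is why estimates for busy flows converge faster than estimates for infrequent ones.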

Note

We recommend that you configure the same sample rate for all the ports in a line card. If you configure different sample rates, the lowest value is used for all ports on the line card.

Note

If the mastership assignment changes in a Virtual Chassis setup, sFlow technology continues to function.

Adaptive Sampling for Routers

To ensure sampling accuracy and efficiency, routers use adaptive sFlow sampling. Adaptive sampling monitors the overall incoming traffic rate on the device and provides feedback to the interfaces to dynamically adapt their sampling rate to traffic conditions. The sFlow agent reads the statistics on the interfaces every few seconds (once every 12 seconds for MX Series routers and once every 20 seconds for PTX Series routers).

On a Flexible PIC Concentrator (FPC), when the flow sample limit is reached because of sFlow sample processing, a binary backoff algorithm is initiated. The algorithm halves the sampling load on that FPC by doubling the sampling rate on the FPC (the N in 1-out-of-N sampling), so that half as many packets are sampled. This process is repeated until the CPU load due to sFlow on the given FPC comes down to an acceptable level.
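The binary backoff can be pictured with the following Python sketch. It assumes the load is measured as expected samples per second and compares it against a hypothetical sample_limit; the real agent reacts to its own measurements over successive polling cycles, so the loop here simply compresses those cycles into one call.

```python
def backoff_sample_rate(configured_rate, packets_per_second, sample_limit):
    """Double the 1-out-of-N rate until the expected samples/sec fit the limit.

    All parameter names are illustrative; they are not Junos OS settings.
    """
    rate = configured_rate
    while packets_per_second / rate > sample_limit:
        rate *= 2  # doubling N halves the number of samples produced
    return rate


if __name__ == "__main__":
    # 1,000,000 pps at 1-in-100 would produce 10,000 samples/sec.
    print(backoff_sample_rate(100, 1_000_000, 2000))
```

In this example the rate is doubled three times (100, 200, 400, 800) before the expected sample load of 1250 per second falls under the limit.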

Note

On the standalone routers, if you configure sFlow technology monitoring on multiple interfaces and with a high sampling rate, we recommend that you specify a collector that is on the data network instead of on the management network. Having a high volume of sFlow technology monitoring traffic on the management network might interfere with other management interface traffic.

Using adaptive sampling prevents overloading of the CPU and keeps the device operating at its optimum level even when there is a change in traffic patterns on the interfaces. The reduced sampling rate is used until the device is rebooted or a new sampling rate is configured.

Note

sFlow technology on routers does not support graceful restart. When a graceful restart occurs, the adaptive sampling rate is set to the user-configured sampling rate.

Adaptive Sampling for Switches

To ensure sampling accuracy and efficiency, switches use adaptive sFlow sampling. Adaptive sampling monitors the overall incoming traffic rate on the device and provides feedback to the interfaces to dynamically adapt their sampling rate to traffic conditions. The sFlow agent reads the statistics on the interfaces every few seconds (12 seconds for EX Series switches and 5 seconds for QFX Series devices) and identifies five interfaces with the highest number of samples.

On a Flexible PIC Concentrator (FPC), when the CPU processing limit is reached because of sFlow sample processing, a binary backoff algorithm is initiated. The algorithm halves the sampling load arriving through the top five sample-producing interfaces on that FPC by doubling the sampling rate on these five earmarked interfaces.
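A minimal Python sketch of the per-interface variant follows. It assumes the agent has a per-interface sample count from the last polling cycle and a current rate for each interface; the dictionaries and the function name are hypothetical, and only the "top five, double the rate" behavior comes from the description above.

```python
def apply_switch_backoff(sample_counts, sample_rates, top_n=5):
    """Return new rates with the top_n sample-producing interfaces doubled.

    sample_counts: interface name -> samples seen in the last polling cycle
    sample_rates:  interface name -> current 1-out-of-N sampling rate
    """
    # Rank interfaces by how many samples they produced, busiest first.
    busiest = sorted(sample_counts, key=sample_counts.get, reverse=True)[:top_n]
    new_rates = dict(sample_rates)
    for name in busiest:
        new_rates[name] = sample_rates[name] * 2
    return new_rates


if __name__ == "__main__":
    counts = {f"xe-0/0/{i}": 100 * (i + 1) for i in range(7)}
    rates = {name: 1000 for name in counts}
    for name, rate in apply_switch_backoff(counts, rates).items():
        print(name, rate)
```

Interfaces outside the top five keep their current rate, so quiet ports continue to be sampled at the configured granularity.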

Using adaptive sampling prevents overloading of the CPU and keeps the device operating at its optimum level even when there is a change in traffic patterns on the interfaces. The reduced sampling rate is used until the device is rebooted or a new sampling rate is configured.

However, when the backoff mechanism is in use and the number of samples increases because of a traffic spike, the adaptive sample rate does not revert to the previously configured adaptive sample rate even after traffic stabilizes. To address this issue, starting in Release 18.3R1, Junos OS supports the adaptive sampling fallback feature, which uses a binary backup algorithm to back up and decrease the sampling rate without affecting normal traffic. With this feature, you can configure the switch to back up the sampling rate on an interface to the previously configured adaptive sample rate. The sampling rate is backed up when the number of samples generated is less than the sample-limit-threshold value.

Note

On a rate-selectable FPC (which supports multiple speeds), interfaces with the highest sample count are selected for adaptive sampling fallback. The backup algorithm selects those interfaces on which the adaptive sampling rate is increased the maximum number of times and then decreases the sampling rate on each of those interfaces every 5 seconds. However, on a single-rate FPC, only one sample rate is supported per FPC, and the adaptive sampling fallback mechanism backs up the sampling rate on all the interfaces of the FPC.

To enable this feature, include the fallback option in the adaptive-sample-rate statement and explicitly configure the threshold value as follows:

set protocols sflow adaptive-sample-rate fallback

set protocols sflow adaptive-sample-rate sample-limit-threshold (0..2400)

You configure adaptive sampling fallback at the [edit protocols sflow] hierarchy level. This feature is disabled by default.
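The fallback behavior can be sketched in Python as follows. This is an interpretation, not the documented algorithm: it assumes that "binary backup" means halving the 1-out-of-N rate in each step, and that the rate never drops below the originally configured value. The function and parameter names are hypothetical, apart from sample_limit_threshold, which mirrors the sample-limit-threshold option.

```python
def fallback_step(current_rate, configured_rate, samples_seen,
                  sample_limit_threshold):
    """Perform one fallback step for a single interface.

    If fewer samples than the threshold were generated, halve the rate
    (assumed meaning of "binary backup"), but never go below the
    configured rate. Otherwise leave the rate unchanged.
    """
    if samples_seen < sample_limit_threshold and current_rate > configured_rate:
        return max(configured_rate, current_rate // 2)
    return current_rate


if __name__ == "__main__":
    rate = 800  # rate after an earlier backoff; configured rate is 100
    for _ in range(4):
        rate = fallback_step(rate, 100, samples_seen=50,
                             sample_limit_threshold=200)
        print(rate)
```

Calling the step repeatedly while traffic stays quiet walks the rate back toward the configured value and then holds it there.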

On a QFabric system, sFlow technology monitors the interfaces on each node device as a group, and implements the binary backoff algorithm based on the traffic on that group of interfaces.

Note

On the QFX Series standalone switches, if you configure sFlow technology monitoring on multiple interfaces and with a high sampling rate, we recommend that you specify a collector that is on the data network instead of on the management network. Having a high volume of sFlow technology monitoring traffic on the management network might interfere with other management interface traffic.

Note

sFlow technology on EX Series switches does not support graceful restart. When a graceful restart occurs, the adaptive sampling rate is set to the user-configured sampling rate.

sFlow Agent Address Assignment

The sFlow collector uses the sFlow agent’s IP address to determine the source of the sFlow data. You can configure the IP address of the sFlow agent to ensure that the agent ID of the sFlow agent remains constant. If you do not specify the IP address to be assigned to the agent, an IP address is automatically assigned to the agent based on the following order of priority of interfaces configured on the device:

Routers and EX Series Switches

  1. Virtual Management Ethernet (VME) interface

  2. Management Ethernet interface

QFX Series Devices

  1. Management Ethernet interface me0 IP address

  2. Any Layer 3 interface if the me0 IP address is not available

If a particular interface is not configured, the IP address of the next interface in the priority list is used as the IP address for the agent. Once an IP address is assigned to the agent, the agent ID is not modified until the sFlow service is restarted. At least one interface has to be configured for an IP address to be assigned to the agent. When the agent’s IP address is assigned automatically, the IP address is dynamic and changes when the device reboots.
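The priority-based selection can be sketched in Python as a simple first-match lookup. The keys used for the interface kinds ("vme", "management", "me0", "layer3") are hypothetical labels chosen for this example, and the addresses come from the documentation address range.

```python
# Priority orders as described above; the labels are illustrative only.
ROUTER_AND_EX_PRIORITY = ["vme", "management"]
QFX_PRIORITY = ["me0", "layer3"]


def pick_agent_address(priority, configured):
    """Return the address of the first interface kind that is configured.

    priority:   interface kinds, highest priority first
    configured: interface kind -> IP address (missing or empty if unset)
    """
    for kind in priority:
        address = configured.get(kind)
        if address:
            return address
    return None  # no interface configured, so no agent address is assigned


if __name__ == "__main__":
    print(pick_agent_address(QFX_PRIORITY, {"layer3": "192.0.2.10"}))
```

Returning None when nothing is configured reflects the requirement that at least one interface must be configured before an agent address can be assigned.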

On the QFabric system, the following default values are used if the optional parameters are not configured:

  • Agent ID is the management IP address of the default partition.

  • Source IP is the management IP address of the default partition.

In addition, the QFabric system subagent ID (which is included in the sFlow datagrams) is the ID of the node group from which the datagram is sent to the collector.

sFlow data can be used to provide network traffic visibility information. You can explicitly configure the source IP address to be assigned to the sFlow datagrams. If you do not explicitly configure the IP address, the IP address of any of the configured Layer 3 network interfaces is used as the source IP address. If a Layer 3 IP address is not configured, then the agent IP address is used as the source IP address.

sFlow Limitations on Routers

On routers, limitations of sFlow traffic sampling include the following:

  • The Trio chipset cannot support a different sampling rate for each family. Hence, only one sampling rate can be supported per FPC.

  • Adaptive load balancing is applied per FPC and not per interface under the FPC.

  • True OIF (outgoing interface) is not supported with sFlow.

Routers support configuration of only one sampling rate (inclusive of ingress and egress rates) on an FPC. To maintain compatibility with the sFlow configuration of other Juniper Networks products, the routers still accept multiple rate configurations on different interfaces of the same FPC. However, the router programs the lowest rate as the sampling rate for all the interfaces of that FPC. The show sflow interfaces command displays the configured rate and the actual (effective) rate. Different rates on different FPCs are still supported on Juniper Networks routers.
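The relationship between configured and effective rates can be illustrated with the Python sketch below. It assumes that "lowest rate" means the numerically lowest configured value on the FPC, and that the FPC slot is the first number in an interface name of the form type-fpc/pic/port. The function names are hypothetical, and the output is not meant to reproduce the format of the show sflow interfaces command.

```python
def fpc_of(interface_name):
    """Extract the FPC slot from a name such as 'ge-1/0/3' (type-fpc/pic/port)."""
    return int(interface_name.split("-", 1)[1].split("/")[0])


def effective_rates(configured):
    """Map each interface to the lowest rate configured on its FPC.

    configured: interface name -> configured 1-out-of-N sampling rate
    """
    lowest = {}
    for name, rate in configured.items():
        slot = fpc_of(name)
        lowest[slot] = min(rate, lowest.get(slot, rate))
    return {name: lowest[fpc_of(name)] for name in configured}


if __name__ == "__main__":
    configured = {"ge-0/0/0": 1000, "ge-0/0/1": 500, "xe-1/0/0": 2000}
    for name, rate in effective_rates(configured).items():
        print(name, "configured", configured[name], "effective", rate)
```

Interfaces on FPC 0 share the rate 500 in this example, while the interface on FPC 1 keeps its own rate because rates can still differ between FPCs.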

sFlow Limitations on Switches

On the QFX Series, limitations of sFlow traffic sampling include the following:

  • sFlow sampling on ingress interfaces does not capture CPU-bound traffic.

  • sFlow sampling on egress interfaces does not support broadcast and multicast packets.

  • Egress samples do not contain modifications made to the packet in the egress pipeline.

  • If a packet is discarded because of a firewall filter, the reason code for discarding the packet is not sent to the collector.

  • The out-priority field for a VLAN is always set to 0 (zero) on ingress and egress samples.

  • On QFX5100 standalone switches and the QFX Series Virtual Chassis (including mixed QFX Series Virtual Chassis), egress firewall filters are not applied to sFlow sampling packets. On these platforms, the software architecture is different from that on other QFX Series devices—sFlow packets are sent by the Routing Engine (not the line card on the host) and do not transit the switch. Egress firewall filters affect data packets that are transiting a switch, but do not affect packets sent by the Routing Engine. As a result, sFlow sampling packets are always sent to the sFlow collector.

EX9200 switches support configuration of only one sampling rate (inclusive of ingress and egress rates) on an FPC. To maintain compatibility with the sFlow configuration of other Juniper Networks products, EX9200 switches still accept multiple rate configurations on different interfaces of the same FPC. However, the switch programs the lowest rate as the sampling rate for all the interfaces of that FPC. The show sflow interfaces command displays the configured rate and the actual (effective) rate. Different rates on different FPCs are still supported on EX9200 switches.

Release History Table
Release   Description
18.3R1    Starting in 18.3R1, Junos OS supports the adaptive sampling fallback feature, which uses a binary backup algorithm to back up and decrease the sampling rate without affecting normal traffic.