Understanding How to Use sFlow Technology for Network Monitoring
The sFlow technology is a monitoring technology for high-speed switched or routed networks. sFlow randomly samples network packets and sends the samples to a monitoring station called a collector.
Benefits of sFlow Technology
sFlow can be used by software tools like a network analyzer to continuously monitor tens of thousands of switch or router ports simultaneously.
Because sFlow uses network sampling (forwarding one packet out of every n packets) for analysis, it does not consume significant resources such as processing power and memory. Sampling is performed in the hardware application-specific integrated circuits (ASICs), which keeps it simple and more accurate.
Sampling Mechanism and Architecture of sFlow Technology
sFlow technology uses the following two sampling mechanisms:
Packet-based sampling—Samples one packet out of a specified number of packets from an interface enabled for sFlow technology. Only the first 128 bytes of each packet are sent to the collector. Data collected include the Ethernet, IP, and TCP headers, along with other application-level headers (if present). Although this type of sampling might not capture infrequent packet flows, the majority of flows are reported over time, allowing the collector to generate a reasonably accurate representation of network activity. To configure packet-based sampling, you must specify a sample rate.
Time-based sampling—Samples interface statistics at a specified interval from an interface enabled for sFlow technology. Statistics such as Ethernet interface errors are captured. To configure time-based sampling, you must specify a polling interval.
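The packet-based mechanism described above can be illustrated with a minimal Python sketch. It is not the hardware implementation; it only assumes that each packet is selected at random with probability 1/N and that the sample is truncated to the first 128 bytes. The names sample_packets and HEADER_BYTES are hypothetical.

```python
import random

HEADER_BYTES = 128  # only the first 128 bytes of a sampled packet are exported


def sample_packets(packets, sample_rate, rng=None):
    """Return truncated copies of packets chosen with probability 1/sample_rate."""
    if sample_rate < 1:
        raise ValueError("sample_rate must be >= 1")
    rng = rng or random.Random()
    samples = []
    for packet in packets:
        # Random 1-in-N selection: on average one packet per sample_rate packets.
        if rng.randrange(sample_rate) == 0:
            samples.append(packet[:HEADER_BYTES])
    return samples


if __name__ == "__main__":
    traffic = [bytes(1500)] * 10_000
    picked = sample_packets(traffic, sample_rate=100)
    print(len(picked), "samples,", len(picked[0]) if picked else 0, "bytes each")
```

Because selection is random, the number of samples varies from run to run around the expected value of one per N packets.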
The sampling information is used to create a network traffic visibility picture. The Juniper Networks Junos operating system (Junos OS) fully supports the sFlow standard described in RFC 3176, InMon Corporation's sFlow: A Method for Monitoring Traffic in Switched and Routed Networks (see http://faqs.org/rfcs/rfc3176.html).
On switches, sFlow technology samples only raw packet headers, that is, the complete Layer 2 network frame.
An sFlow monitoring system consists of an sFlow agent embedded in the router or switch and a centralized collector. The sFlow agent’s two main activities are random sampling and statistics gathering. It combines interface counters and flow samples and sends them across the network to the sFlow collector as UDP datagrams, directing those datagrams to the IP address and UDP destination port of the collector. Each datagram contains the following information:
The IP address of the sFlow agent
The number of samples
The interface through which the packets entered the agent
The interface through which the packets exited the agent
The source and destination interface for the packets
The source and destination VLAN for the packets
With dual VLANs, some of these fields might not be reported.
Routers and switches can adopt the distributed sFlow architecture. The sFlow agent has subagents. Each subagent is responsible for monitoring a set of network ports and has a unique ID that is used by the collector to identify the data source. A subagent has its own independent state and forwards its own sample messages to the sFlow agent. The sFlow agent is responsible for packaging the samples into datagrams and sending them to the sFlow collector. Because sampling is distributed across subagents, the protocol overhead associated with sFlow technology is significantly reduced at the collector.
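The division of work between subagents and the agent can be sketched in Python as follows. This is an illustrative model only: the class names, the dictionary layout, and the max_samples parameter are assumptions, and the dictionaries stand in for real sFlow datagrams rather than reproducing their wire format.

```python
from dataclasses import dataclass, field


@dataclass
class Subagent:
    subagent_id: int  # unique ID the collector uses to identify the data source
    ports: tuple      # the set of ports this subagent monitors
    pending: list = field(default_factory=list)

    def record(self, port, header):
        """Store one sample taken on a port owned by this subagent."""
        if port not in self.ports:
            raise ValueError(f"port {port!r} is not monitored by this subagent")
        self.pending.append(
            {"subagent_id": self.subagent_id, "port": port, "header": header}
        )

    def flush(self):
        """Hand the accumulated samples to the agent and reset local state."""
        out, self.pending = self.pending, []
        return out


@dataclass
class Agent:
    address: str
    subagents: list

    def build_datagrams(self, max_samples=2):
        """Package samples from all subagents into datagram-like dictionaries."""
        samples = []
        for subagent in self.subagents:
            samples.extend(subagent.flush())
        datagrams = []
        for start in range(0, len(samples), max_samples):
            chunk = samples[start:start + max_samples]
            datagrams.append(
                {"agent_address": self.address,
                 "sample_count": len(chunk),
                 "samples": chunk}
            )
        return datagrams
```

Each subagent keeps its own state, and only the agent produces the datagrams that would be sent to the collector.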
On the QFabric system, an sFlow collector must be reachable through the network. Because each Node device has all routes stored in the default routing instance, the collector IP address should be included in the default routing instance to ensure the collector’s reachability from the Node device.
You cannot configure sFlow monitoring on a link aggregation group (LAG), but you can configure it individually on a LAG member interface.
Infrequent flows might not be reported in the sFlow information, but over time the majority of flows are reported. Based on a configured sampling rate N, 1 out of N packets is captured and sent to the collector. This type of sampling does not provide a 100 percent accurate result in the analysis, but it does provide a result with quantifiable accuracy. A user-configured polling interval defines how often sFlow data for a specific interface are sent to the collector, but an sFlow agent can also schedule polling.
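To make "quantifiable accuracy" concrete, the following Python sketch scales a sample count back up to an estimated packet count and computes an approximate error bound. The bound used here (percent error no greater than 196 × sqrt(1/c) for c samples, at roughly 95 percent confidence) comes from the packet sampling theory published by sFlow.org; it is not stated in this topic and is included as an assumption. The function names are hypothetical.

```python
import math


def estimate_total(samples_seen, sample_rate):
    """Scale the sample count back up to an estimated packet count."""
    return samples_seen * sample_rate


def percent_error_bound(samples_seen):
    """Approximate 95-percent-confidence relative error, in percent."""
    if samples_seen <= 0:
        raise ValueError("at least one sample is required")
    return 196.0 * math.sqrt(1.0 / samples_seen)


if __name__ == "__main__":
    for c in (100, 1_000, 10_000):
        print(c, "samples at rate 1000 ->",
              estimate_total(c, 1000), "packets,",
              f"error <= {percent_error_bound(c):.2f}%")
```

Under this bound, accuracy depends on the number of samples collected for a class of traffic rather than on the sampling rate itself, which is why infrequent flows are estimated less precisely.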
For the EX9200 switch and MX Series routers, we recommend that you configure the same sample rate for all the ports in a line card. If you configure different sample rates, the lowest value is used for all ports on the line card.
If the mastership assignment changes in a Virtual Chassis setup, sFlow technology continues to function.
Adaptive sampling is the process of monitoring the overall incoming traffic rate on the network device and providing intelligent feedback to interfaces to dynamically adapt the sampling rates on interfaces on the basis of traffic conditions. Adaptive sampling prevents the CPU from overloading and maintains the system at an optimum level, even when traffic patterns change on the interfaces. Whereas the sample rate is the configured number of egress or ingress packets out of which one packet is sampled, the adaptive sample rate is the maximum number of samples that should be generated per line card; that is, it is the limit given to adaptive sampling. Sample load is the amount of data (or number of packets) moving across a network at a given point of time that is sampled. As you increase the sample rate, you decrease the sample load and vice versa. For example, suppose the configured sample rate is 2 (meaning 1 packet out of 2 packets is sampled). If that rate is doubled to 4, only 1 packet out of 4 packets is sampled, which halves the sample load.
You configure the adaptive sample rate, which is the maximum number of samples that should be generated per line card, at the [edit protocols sflow adaptive-sample-rate] hierarchy level.
How Adaptive Sampling Works
Every few seconds, or cycle, the sFlow agent collects the interface statistics. From these aggregated statistics, an average number of samples per second is calculated for the cycle. The cycle length depends on the platform:
Every 12 seconds for EX Series and QFX5K switches and MX Series and PTX Series routers
Every 5 seconds for QFX Series switches other than QFX5K
If the combined sample rate of all the interfaces on a line card exceeds the adaptive sample rate, a binary backoff algorithm is initiated, which reduces the sample load on the interfaces. Adaptive sampling doubles the sample rate on the affected interfaces, which reduces the sampling load by half. This process is repeated until the CPU load due to sFlow on a given line card comes down to an acceptable level.
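The binary backoff described above can be sketched in Python. This is a simplified model, not the Junos OS implementation: it assumes that the "combined sample rate" is the sum of samples per second across the interfaces, that every interface on the line card is adapted, and that all doublings happen in one call rather than once per cycle. The function and parameter names are hypothetical.

```python
def adapt_sample_rates(traffic_pps, sample_rates, adaptive_sample_rate):
    """Double sample rates until combined samples/sec fits the per-card limit.

    traffic_pps:          packets per second seen on each interface
    sample_rates:         configured 1-in-N rate for each interface
    adaptive_sample_rate: maximum samples per second for the line card
    """
    if adaptive_sample_rate <= 0:
        raise ValueError("adaptive_sample_rate must be positive")
    rates = dict(sample_rates)

    def combined_samples_per_second():
        return sum(traffic_pps[name] / rates[name] for name in rates)

    while combined_samples_per_second() > adaptive_sample_rate:
        # Binary backoff: doubling N halves the sample load on each interface.
        for name in rates:
            rates[name] *= 2
    return rates


if __name__ == "__main__":
    traffic = {"if-a": 10_000, "if-b": 6_000}
    configured = {"if-a": 100, "if-b": 100}
    print(adapt_sample_rates(traffic, configured, adaptive_sample_rate=50))
```

In this example the configured rates produce 160 samples per second, so the rates are doubled twice (to 400) before the load of 40 samples per second fits under the limit of 50.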
Which interfaces on a line card participate in adaptive sampling depends on the platform:
For MX Series routers and EX Series switches, the sample rates on all the interfaces on the line card are adapted.
For PTX Series routers and QFX Series switches, only the five interfaces with the highest sample rates on the line card are adapted.
On a QFabric system, sFlow technology monitors the interfaces on each node device as a group, and implements the binary backoff algorithm based on the traffic on that group of interfaces.
For all platforms, the increased sampling rates remain in effect until one of the following conditions is achieved:
The device is rebooted.
A new sample rate is configured.
If you have enabled the adaptive sampling fallback feature and, because of a traffic spike, the number of samples increases to the configured sample-limit-threshold, then the adaptive sampling rate is reversed. See Adaptive Sampling Fallback.
Adaptive Sampling Fallback
The adaptive sampling fallback feature, when configured and after adaptive sampling has taken place, uses a binary backup algorithm to decrease the sampling rate (thus, increasing the sampling load) when the number of samples generated is less than the configured sample-limit-threshold value, without affecting normal traffic.
Starting in Junos OS Release 18.3R1, for EX Series switches, Junos OS supports the adaptive sampling fallback feature. Starting in Junos OS Release 19.1R1, for MX Series, PTX Series, and QFX Series devices, Junos OS supports the adaptive sampling fallback feature.
Adaptive sampling fallback is disabled by default. To enable this feature, include the fallback and adaptive-sample-rate sample-limit-threshold options at the [edit protocols sflow adaptive-sample-rate] hierarchy level.
After adaptive sampling has taken place, if the line card is underperforming (that is, the number of samples generated in a cycle is less than the configured value for the sample-limit-threshold statement) for five continuous cycles of adaptive sampling, the adapted rate is reversed. If reverse adaptation has happened and the number of samples generated in a cycle is again less than half of the current adapted rate for five continuous cycles, another reverse adaptation can happen.
Reverse adaptation does not occur if the interfaces are already at the configured rate.
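The fallback behavior can be modeled with a small state tracker, shown here as a Python sketch. It assumes that "reversing" an adaptation halves the current sample rate, that a cycle meeting the threshold resets the five-cycle count, and that the rate never drops below the configured rate. The class and attribute names are hypothetical and do not correspond to Junos OS internals.

```python
FALLBACK_CYCLES = 5  # consecutive low cycles required before reversing


class FallbackTracker:
    """Track one line card's adapted sample rate across cycles."""

    def __init__(self, configured_rate, adapted_rate, sample_limit_threshold):
        self.configured_rate = configured_rate
        self.current_rate = adapted_rate
        self.threshold = sample_limit_threshold
        self.low_cycles = 0

    def end_of_cycle(self, samples_in_cycle):
        """Process one cycle's sample count and return the resulting rate."""
        if self.current_rate <= self.configured_rate:
            # Already at the configured rate: no reverse adaptation occurs.
            self.low_cycles = 0
            return self.current_rate
        if samples_in_cycle < self.threshold:
            self.low_cycles += 1
        else:
            self.low_cycles = 0
        if self.low_cycles >= FALLBACK_CYCLES:
            # Reverse one step of binary backoff, but not past the configured rate.
            self.current_rate = max(self.configured_rate, self.current_rate // 2)
            self.low_cycles = 0
        return self.current_rate
```

With a configured rate of 100 and an adapted rate of 400, five consecutive low cycles return the rate to 200, and five more return it to 100, where it stays.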
Adaptive Sampling Limitations
The following are limitations of the adaptive sample feature:
On standalone routers or standalone QFX Series switches, if you configure sFlow on multiple interfaces and with a high sampling rate, we recommend that you specify a collector that is on the data network instead of on the management network. Having a high volume of sFlow traffic on the management network might interfere with other management interface traffic.
On routers, sFlow does not support graceful restart. When a graceful restart occurs, the adaptive sampling rate is set to the user-configured sampling rate.
On a rate-selectable line card (which supports multiple speeds), interfaces with the highest sample count are selected for adaptive sampling fallback. The backup algorithm selects those interfaces on which the adaptive sampling rate is increased the maximum number of times and then decreases the sampling rate on each of those interfaces every five seconds. However, on a single-rate line card, only one sample rate is supported per line card, and the adaptive sampling fallback mechanism backs up the sampling rate on all the interfaces of the line card.
sFlow Agent Address Assignment
The sFlow collector uses the sFlow agent’s IP address to determine the source of the sFlow data. You can configure the IP address of the sFlow agent to ensure that the agent ID of the sFlow agent remains constant. If you do not specify the IP address to be assigned to the agent, an IP address is automatically assigned to the agent based on the following order of priority of interfaces configured on the device:
Routers and EX Series Switches
QFX Series Devices
If a particular interface is not configured, the IP address of the next interface in the priority list is used as the IP address for the agent. Once an IP address is assigned to the agent, the agent ID is not modified until the sFlow service is restarted. At least one interface has to be configured for an IP address to be assigned to the agent. When the agent’s IP address is assigned automatically, the IP address is dynamic and changes when the device reboots.
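The automatic selection described above amounts to walking a priority list and taking the first interface that has an address. The Python sketch below illustrates that logic only. The priority order is platform-specific and is not reproduced here, so the interface labels in the example are placeholders, and the function name is hypothetical.

```python
def pick_agent_address(priority_order, configured_addresses):
    """Return the address of the highest-priority configured interface.

    priority_order:       interface names, highest priority first
    configured_addresses: mapping of interface name to IP address
    Returns None when no interface in the list is configured.
    """
    for name in priority_order:
        address = configured_addresses.get(name)
        if address:
            return address
    return None


if __name__ == "__main__":
    # Placeholder labels; substitute the platform's documented priority order.
    order = ["first-choice", "second-choice", "third-choice"]
    configured = {"second-choice": "192.0.2.10", "third-choice": "192.0.2.20"}
    print(pick_agent_address(order, configured))
```

The sketch does not model the persistence rule: on the device, the chosen address is kept until the sFlow service is restarted.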
On the QFabric system, the following default values are used if the optional parameters are not configured:
Agent ID is the management IP address of the default partition.
Source IP is the management IP address of the default partition.
In addition, the QFabric system subagent ID (which is included in the sFlow datagrams) is the ID of the node group from which the datagram is sent to the collector.
sFlow data can be used to provide network traffic visibility information. You can explicitly configure the source IP address to be assigned to the sFlow datagrams. If you do not explicitly configure the IP address, the IP address of any of the configured Layer 3 network interfaces is used as the source IP address. If a Layer 3 IP address is not configured, then the agent IP address is used as the source IP address.
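The source IP fallback chain in the preceding paragraph can be expressed as a short Python sketch. The function name and parameters are hypothetical. Because the text says only that the address of "any" configured Layer 3 interface is used, taking the first entry in the list is an arbitrary choice made for illustration.

```python
def pick_source_ip(configured_source_ip, layer3_interface_ips, agent_ip):
    """Choose the source IP address for sFlow datagrams."""
    if configured_source_ip:
        return configured_source_ip       # explicit configuration wins
    if layer3_interface_ips:
        return layer3_interface_ips[0]    # any configured Layer 3 address
    return agent_ip                       # last resort: the agent IP address


if __name__ == "__main__":
    print(pick_source_ip(None, ["198.51.100.1"], "192.0.2.1"))
```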
sFlow Limitations on Routers
On routers, limitations of sFlow traffic sampling include the following:
The Trio chipset cannot support a different sampling rate for each family. Hence, only one sampling rate can be supported per line card.
Adaptive load balancing is applied per line card and not per interface under the line card.
Routers support configuration of only one sampling rate (inclusive of ingress and egress rates) on a line card. For compatibility with the sFlow configuration of other Juniper Networks products, the routers still accept multiple rate configurations on different interfaces of the same line card. However, the router programs the lowest rate as the sampling rate for all the interfaces of that line card. The show sflow interfaces command displays the configured rate and the actual (effective) rate. Different rates on different line cards are still supported on Juniper Networks routers.
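The relationship between configured and effective rates can be sketched in Python as follows. The sketch assumes that "lowest rate" means the numerically lowest configured value and that interface names follow the type-fpc/pic/port convention, so that the first number identifies the line card. The function names are hypothetical.

```python
def fpc_slot(interface_name):
    """Extract the line card slot, for example 'ge-1/0/3' -> 1."""
    _, _, location = interface_name.partition("-")
    return int(location.split("/")[0])


def effective_rates(configured_rates):
    """Map each interface to the lowest rate configured on its line card."""
    lowest = {}
    for name, rate in configured_rates.items():
        slot = fpc_slot(name)
        lowest[slot] = min(rate, lowest.get(slot, rate))
    return {name: lowest[fpc_slot(name)] for name in configured_rates}


if __name__ == "__main__":
    configured = {"ge-1/0/0": 1000, "ge-1/0/1": 500, "xe-2/0/0": 2000}
    for name, rate in effective_rates(configured).items():
        print(name, "configured", configured[name], "effective", rate)
```

In this example both interfaces in slot 1 end up with an effective rate of 500, while the interface in slot 2 keeps its own rate of 2000.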
sFlow Limitations on Switches
On the QFX Series, limitations of sFlow traffic sampling include the following:
sFlow sampling on ingress interfaces does not capture CPU-bound traffic.
sFlow sampling on egress interfaces does not support broadcast and multicast packets.
Egress samples do not contain modifications made to the packet in the egress pipeline.
If a packet is discarded because of a firewall filter, the reason code for discarding the packet is not sent to the collector.
On EX9200 switches and QFX Series switches except the QFX10K switches, true OIF (outgoing interface) is not supported with sFlow.
The out-priority field for a VLAN is always set to 0 (zero) on ingress and egress samples.
On QFX5100 standalone switches and the QFX Series Virtual Chassis (including mixed QFX Series Virtual Chassis), egress firewall filters are not applied to sFlow sampling packets. On these platforms, the software architecture is different from that on other QFX Series devices—sFlow packets are sent by the Routing Engine (not the line card on the host) and do not transit the switch. Egress firewall filters affect data packets that are transiting a switch, but do not affect packets sent by the Routing Engine. As a result, sFlow sampling packets are always sent to the sFlow collector.
EX9200 switches support configuration of only one sampling rate (inclusive of ingress and egress rates) on an FPC (or line card). For compatibility with the sFlow configuration of other Juniper Networks products, EX9200 switches still accept multiple rate configurations on different interfaces of the same FPC. However, the switch programs the lowest rate as the sampling rate for all the interfaces of that FPC. The show sflow interfaces command displays the configured rate and the actual (effective) rate. Different rates on different FPCs are still supported on EX9200 switches.