Adaptive Sampling Overview

Adaptive sampling monitors the overall incoming traffic rate on the network device and gives feedback to the interfaces so that they adapt their sampling rates dynamically to traffic conditions. This prevents the CPU from overloading and keeps the system at an optimum level, even when traffic patterns change on the interfaces. The sample rate is the configured number of egress or ingress packets out of which one packet is sampled. The adaptive sample rate, by contrast, is the maximum number of samples that should be generated per line card; it is the limit that adaptive sampling enforces. Sample load is the amount of data (or number of packets) moving across the network at a given point in time that is sampled. As you increase the sample rate, you decrease the sample load, and vice versa. For example, suppose the configured sample rate is 2, meaning that 1 packet out of every 2 is sampled. If that rate is doubled to 4, only 1 packet out of every 4 is sampled, so the sample load is halved.
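The inverse relationship between sample rate and sample load can be checked with a few lines of Python. This is only an illustrative sketch: the function name expected_samples and the packet count of 1000 are assumptions made for the example, not part of Junos OS.

```python
def expected_samples(packets: int, sample_rate: int) -> float:
    """Expected number of sampled packets when 1 packet out of every
    `sample_rate` packets is sampled (hypothetical helper)."""
    if sample_rate < 1:
        raise ValueError("sample_rate must be at least 1")
    return packets / sample_rate


packets = 1000  # assumed packet count, for illustration only
for rate in (2, 4):
    samples = expected_samples(packets, rate)
    print(f"sample rate 1-in-{rate}: about {samples:.0f} samples")
```

Doubling the sample rate from 2 to 4 halves the expected number of samples for the same traffic.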

You configure the adaptive sample rate, which is the maximum number of samples that should be generated per line card, at the [edit protocols sflow adaptive-sample-rate] hierarchy level.

To ensure sampling accuracy and efficiency, QFX Series devices use adaptive sFlow sampling. Adaptive sampling monitors the overall incoming traffic rate on the device and provides feedback to the interfaces to dynamically adapt their sampling rate to traffic conditions. The sFlow agent reads the statistics on the interfaces every 5 seconds and identifies the five interfaces with the highest number of samples. On a standalone switch, when the CPU processing limit is reached, a binary backoff algorithm is implemented to reduce the sampling load of those top five interfaces by half. The adapted sampling rate is then applied to those five interfaces.

Using adaptive sampling prevents overloading of the CPU and keeps the device operating at its optimum level even when there is a change in traffic patterns on the interfaces. The reduced sampling load is used until:

  • You reboot the device.

  • You configure a new sampling rate.

  • The adaptive sampling fallback feature, if configured, increases the sampling load because the number of samples generated is less than the configured threshold.

If a particular interface is not configured, the IP address of the next interface in the priority list is used as the IP address for the agent. Once an IP address is assigned to the agent, the agent ID is not modified until the sFlow service is restarted. At least one interface has to be configured for an IP address to be assigned to the agent.

Considerations

On the QFX Series, limitations of sFlow traffic sampling include:

  • sFlow sampling on ingress interfaces does not capture CPU-bound traffic.

  • sFlow sampling on egress interfaces does not support broadcast and multicast packets.

  • Egress samples do not contain modifications made to the packet in the egress pipeline.

  • If a packet is discarded because of a firewall filter, the reason code for discarding the packet is not sent to the collector.

  • The out-priority field for a VLAN is always set to 0 (zero) on ingress and egress samples.

  • If Aggregated Ethernet (AE) interfaces are VLAN tagged and egress sampling is enabled, you receive inaccurate next-hop details.

  • You cannot configure sFlow monitoring on a link aggregation group (LAG), but you can configure it individually on a LAG member interface.

  • On QFX10000 Series switches, for a set of ports in a multicast group, since the actual sampling happens in the ingress pipeline for egress packets, the minimum of the configured sFlow rate or the most aggressive sample rate among those ports is used for sampling across all ports in that group.

  • Starting in Junos OS Release 19.4, on QFX10000 Series switches, if the destination port of a sampled UDP packet is 6635 and the packet does not include a valid MPLS header, the sFlow sample of that packet is corrupted or truncated. The actual packet is still forwarded.

  • On QFX10000 Series standalone switches and the QFX Series Virtual Chassis (with QFX3500 and QFX3600 switches), egress firewall filters are not applied to sFlow sampling packets. On these platforms, the software architecture is different from that on other QFX Series devices, and sFlow packets are sent by the Routing Engine (not the line card on the host) and are not transiting the switch. Egress firewall filters affect data packets that are transiting a switch but do not affect packets sent by the Routing Engine. As a result, sFlow sampling packets are always sent to the sFlow collector.

How Adaptive Sampling Works

Every few seconds, or cycle, the sFlow agent collects the interface statistics. From these aggregated statistics, an average number of samples per second is calculated for the cycle. The cycle length depends on the platform:

  • Every 12 seconds for EX Series and QFX5K switches and MX Series and PTX Series routers

  • Every 5 seconds for QFX Series switches other than QFX5K
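As a rough sketch of the per-cycle averaging described above, the calculation might look like the following. The platform keys, the dictionary CYCLE_SECONDS, and the function name are hypothetical labels chosen for this example; only the cycle lengths (12 and 5 seconds) come from the text.

```python
# Cycle length in seconds per platform family, as listed above.
# The keys are illustrative labels, not Junos OS identifiers.
CYCLE_SECONDS = {
    "EX": 12,
    "QFX5K": 12,
    "MX": 12,
    "PTX": 12,
    "QFX": 5,  # QFX Series switches other than QFX5K
}


def average_samples_per_second(platform: str, samples_in_cycle: int) -> float:
    """Average number of samples per second over one statistics cycle."""
    return samples_in_cycle / CYCLE_SECONDS[platform]


print(average_samples_per_second("MX", 3600))
print(average_samples_per_second("QFX", 1500))
```

With these assumed counts, both devices average 300 samples per second, even though the raw per-cycle totals differ because the cycles have different lengths.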

If the combined sample rate of all the interfaces on a line card exceeds the adaptive sample rate, a binary backoff algorithm is initiated, which reduces the sample load on the interfaces. Adaptive sampling doubles the sample rate on the affected interfaces, which reduces the sampling load by half. This process is repeated until the CPU load due to sFlow on a given line card comes down to an acceptable level.
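The binary backoff described above can be approximated with a short simulation. This is a simplified sketch, not the Junos OS implementation: it assumes steady traffic, treats "acceptable level" as "combined samples per second at or below the adaptive sample rate", and collapses the repeated cycles into a single loop. The function name, interface names, and traffic figures are all hypothetical.

```python
def binary_backoff(sample_rates, packets_per_second, adaptive_sample_rate):
    """Double the sample rate (1-in-N) on every affected interface until the
    combined samples per second no longer exceed adaptive_sample_rate.

    sample_rates:        interface name -> configured sample rate (1-in-N)
    packets_per_second:  interface name -> observed traffic (assumed steady)
    """
    if adaptive_sample_rate <= 0:
        raise ValueError("adaptive_sample_rate must be positive")

    rates = dict(sample_rates)  # do not modify the caller's configuration

    def combined_samples(current):
        return sum(packets_per_second[name] / current[name] for name in current)

    # Each pass stands in for one adaptation cycle.
    while combined_samples(rates) > adaptive_sample_rate:
        rates = {name: rate * 2 for name, rate in rates.items()}
    return rates


configured = {"xe-0/0/1": 100, "xe-0/0/2": 100}
traffic = {"xe-0/0/1": 50_000, "xe-0/0/2": 30_000}
print(binary_backoff(configured, traffic, adaptive_sample_rate=300))
```

In this example the combined load starts at 800 samples per second, drops to 400 after the first doubling, and reaches 200 after the second, so both interfaces end at a sample rate of 400.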

Which interfaces on a line card participate in adaptive sampling depends on the platform:

  • For MX Series routers and EX Series switches, the sample rates on all the interfaces on the line card are adapted.

  • For PTX Series routers and QFX Series switches, only the five interfaces with the highest sample rates on the line card are adapted.
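The platform-dependent selection can be sketched as follows. The platform labels and function name are hypothetical, and the ranking criterion is an assumption: this sketch ranks interfaces by the number of samples each one generated in the cycle.

```python
import heapq

ADAPT_ALL = {"MX", "EX"}         # every interface on the line card is adapted
ADAPT_TOP_FIVE = {"PTX", "QFX"}  # only the five busiest interfaces are adapted


def interfaces_to_adapt(platform, samples_per_interface):
    """Return the names of the interfaces whose sample rates would be adapted.

    samples_per_interface: interface name -> samples generated in the cycle
    (assumed ranking metric for the top-five case).
    """
    if platform in ADAPT_ALL:
        return sorted(samples_per_interface)
    if platform in ADAPT_TOP_FIVE:
        return heapq.nlargest(
            5, samples_per_interface, key=samples_per_interface.get
        )
    raise ValueError(f"unknown platform label: {platform}")


samples = {f"et-0/0/{n}": count
           for n, count in enumerate([40, 900, 15, 300, 720, 60, 510])}
print(interfaces_to_adapt("QFX", samples))
print(interfaces_to_adapt("MX", samples))
```

For the top-five case the result is ordered from the highest sample count to the lowest; for the adapt-all case the names are simply sorted.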

For all platforms, the increased sampling rates remain in effect until one of the following conditions is met:

  • The device is rebooted.

  • A new sample rate is configured.

If you have enabled the adaptive sampling fallback feature and, because of a traffic spike, the number of samples increases to the configured sample-limit-threshold, then the adaptive sampling rate is reversed.

Adaptive Sampling Fallback

The adaptive sampling fallback feature, when configured and after adaptive sampling has taken place, uses a binary backup algorithm to decrease the sampling rate (thus, increasing the sampling load) when the number of samples generated is less than the configured sample-limit-threshold value, without affecting normal traffic.

Starting in Junos OS Release 18.3R1, for EX Series switches, Junos OS supports the adaptive sampling fallback feature. Starting in Junos OS Release 19.1R1, for MX Series, PTX Series, and QFX Series devices, Junos OS supports the adaptive sampling fallback feature.

Adaptive sampling fallback is disabled by default. To enable this feature, include the fallback and adaptive-sample-rate sample-limit-threshold options at the [edit protocols sflow adaptive-sample-rate] hierarchy level.

After adaptive sampling has taken place, the adapted rate is reversed if the line card is underperforming for five continuous cycles of adaptive sampling. A line card is underperforming when the number of samples generated in a cycle is less than the configured value for the sample-limit-threshold statement. After a reverse adaptation has happened, if the number of samples generated in a cycle is again less than half of the current adapted rate for five continuous cycles, another reverse adaptation can happen.

Reverse adaptation does not occur if the interfaces are already at the configured rate.
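The fallback behavior can be modeled as a small state machine. This is a hedged sketch under several assumptions: each reverse adaptation halves the current sample rate, the rate never drops below the configured rate, and the same sample-limit-threshold comparison is used for every reversal. The class and function names are hypothetical.

```python
from dataclasses import dataclass

FALLBACK_CYCLES = 5  # continuous underperforming cycles required, per the text


@dataclass
class FallbackState:
    configured_rate: int  # sample rate from the configuration (1-in-N)
    current_rate: int     # sample rate after adaptive sampling (1-in-N)
    low_cycles: int = 0   # continuous cycles below the threshold so far


def observe_cycle(state, samples_in_cycle, sample_limit_threshold):
    """Process one cycle; return True if a reverse adaptation happened."""
    # No reverse adaptation once the interfaces are back at the configured rate.
    if state.current_rate <= state.configured_rate:
        state.low_cycles = 0
        return False

    if samples_in_cycle < sample_limit_threshold:
        state.low_cycles += 1
    else:
        state.low_cycles = 0  # the five cycles must be continuous

    if state.low_cycles >= FALLBACK_CYCLES:
        # Assumption: one reversal halves the rate, doubling the sample load.
        state.current_rate = max(state.configured_rate, state.current_rate // 2)
        state.low_cycles = 0
        return True
    return False


state = FallbackState(configured_rate=100, current_rate=400)
for cycle in range(1, 16):
    if observe_cycle(state, samples_in_cycle=10, sample_limit_threshold=50):
        print(f"cycle {cycle}: reverse adaptation, rate is now 1-in-{state.current_rate}")
```

With these assumed values, the rate falls back from 400 to 200 at cycle 5 and from 200 to 100 at cycle 10. No further reversal occurs after that because the interface is already at its configured rate.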

Adaptive Sampling Limitations

The following are limitations of the adaptive sampling feature:

  • On standalone routers or standalone QFX Series switches, if you configure sFlow on multiple interfaces and with a high sampling rate, we recommend that you specify a collector that is on the data network instead of on the management network. Having a high volume of sFlow traffic on the management network might interfere with other management interface traffic.

  • On QFX Series switches, if you override the adaptive sampling rate from the default 300 pps, you might observe high CPU utilization.

  • On routers, sFlow does not support graceful restart. When a graceful restart occurs, the adaptive sampling rate is set to the user-configured sampling rate.

  • On a rate-selectable line card (which supports multiple speeds), interfaces with the highest sample count are selected for adaptive sampling fallback. The backup algorithm selects those interfaces on which the adaptive sampling rate is increased the maximum number of times and then decreases the sampling rate on each of those interfaces every five seconds. However, on a single-rate line card, only one sample rate is supported per line card, and the adaptive sampling fallback mechanism backs up the sampling rate on all the interfaces of the line card.