ECMP Imbalance (Fabric Interfaces) Probe

_images/ecmp_imbalance.png

Found at /predefined_probes/fabric_ecmp_imbalance

The probe first identifies all fabric interfaces on all deployed and operational leafs. Fabric interfaces are those that face spines and are themselves deployed and operational. For each such interface, the probe collects traffic samples, generates a time series, and calculates average traffic over a configurable time interval. It then measures imbalance on each leaf by calculating the standard deviation of the traffic averages of the fabric interfaces on that leaf. If this imbalance is outside the acceptable range (which is also configurable), the probe tracks how long it has been outside that range and raises an anomaly once this time exceeds a configurable interval. It also creates a time series for this anomaly so that recent history can be observed. Finally, the probe counts how many systems are imbalanced in total; if this number is outside a configurable range, it raises a system-level anomaly and keeps a history of that anomaly as well.

“leaf fabric interface traffic” processor

Purpose: wires in interface traffic samples (measured in bytes per second) from each spine facing interface on each leaf.

Outputs of “leaf fabric interface traffic” processor

‘leaf_fabric_int_traffic’: set of traffic samples (for each spine facing interface on each leaf). Each set member has the following keys to identify it: label (human readable name of the leaf), system_id (id of the leaf system, usually serial number), interface (name of the interface).

“leaf fabric interface traffic average” processor

Purpose: Calculate the average traffic over the period specified by the average_period facade parameter. The unit is bytes per second.

Outputs of “leaf fabric interface traffic average” processor

‘leaf_fab_int_tx_avg’: set of traffic average values (for each spine facing interface on each leaf). Each set member has the following keys to identify it: label (human readable name of the leaf), system_id (id of the leaf system, usually serial number), interface (name of the interface).
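The averaging step can be illustrated with a minimal sketch. It assumes an unweighted mean over the samples whose timestamps fall inside the last average_period seconds; the text does not say whether the product weights samples by time, so treat that choice, and the helper name period_average, as hypothetical.

```python
from typing import List, Tuple

# (timestamp in seconds, traffic in bytes per second)
Sample = Tuple[float, float]


def period_average(samples: List[Sample], now: float, average_period: float) -> float:
    """Unweighted mean of the samples taken during the last average_period seconds.

    Assumption: a plain arithmetic mean; the real processor may weight by time.
    """
    window = [value for ts, value in samples if now - average_period <= ts <= now]
    if not window:
        raise ValueError("no samples in the averaging window")
    return sum(window) / len(window)


# Hypothetical samples for one spine-facing interface.
samples = [(0.0, 100.0), (10.0, 200.0), (20.0, 400.0), (30.0, 600.0)]
avg = period_average(samples, now=30.0, average_period=20.0)
print(avg)  # mean of the three samples taken between t=10 and t=30
```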

“leaf fabric interface history” processor

Purpose: create a recent-history time series from the traffic samples in the leaf_fabric_int_traffic output. The time series holds whichever is smaller: 1024 samples, or the samples collected during the last ‘total_duration’ seconds (a facade parameter). The sample unit is bytes per second.

Outputs of “leaf fabric interface history” processor

‘leaf_fab_int_time_series’: set of traffic sample time series (for each spine facing interface on each leaf). Each set member has the following keys to identify it: label (human readable name of the leaf), system_id (id of the leaf system, usually serial number), interface (name of the interface). The sample unit is bytes per second.
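The "smaller of 1024 samples or total_duration seconds" rule can be sketched as a buffer bounded in both count and age. The class below is a hypothetical illustration, not the product's implementation; it assumes that age is measured relative to the newest sample.

```python
from collections import deque
from typing import Deque, Tuple

MAX_SAMPLES = 1024  # sample cap stated in the text


class BoundedHistory:
    """Keeps at most max_samples samples, none older than total_duration seconds."""

    def __init__(self, total_duration: float, max_samples: int = MAX_SAMPLES) -> None:
        self.total_duration = total_duration
        # deque(maxlen=...) silently discards the oldest entry when full.
        self.samples: Deque[Tuple[float, float]] = deque(maxlen=max_samples)

    def add(self, timestamp: float, value: float) -> None:
        self.samples.append((timestamp, value))
        # Assumption: age is measured against the newest sample's timestamp.
        while self.samples and timestamp - self.samples[0][0] > self.total_duration:
            self.samples.popleft()


# A small cap makes the count limit visible; samples arrive every 10 seconds.
history = BoundedHistory(total_duration=60.0, max_samples=3)
for t in range(0, 50, 10):
    history.add(float(t), 1000.0 + t)
print(list(history.samples))
```

With the default cap of 1024 and a short total_duration, the age limit becomes the binding constraint instead of the count limit.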

“leaf fabric interface std-dev” processor

Purpose: calculate standard deviation for a set consisting of traffic averages for each spine facing interface on a given leaf. Grouping per leaf is achieved using ‘group_by’ property set to ‘system_id’.

Outputs of “leaf fabric interface std-dev” processor

‘leaf_fab_int_std_dev’: set of values, each indicating standard deviation (as a measure of ECMP imbalance) for traffic averages for each spine facing interface on a given leaf. Each set member has system_id key to identify leaf whose ECMP imbalance the value represents.
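Grouping by system_id and taking a standard deviation per group can be sketched as follows. The sketch assumes the population standard deviation (statistics.pstdev); the text does not say whether the product uses the population or the sample formula. The interface names and values are made up.

```python
from collections import defaultdict
from statistics import pstdev
from typing import Dict, List


def std_dev_per_leaf(averages: List[dict]) -> Dict[str, float]:
    """Group interface averages by system_id and return one std-dev per leaf."""
    grouped = defaultdict(list)
    for item in averages:
        grouped[item["system_id"]].append(item["value"])
    # Assumption: population standard deviation; the product may differ.
    return {system_id: pstdev(values) for system_id, values in grouped.items()}


# Hypothetical traffic averages in bytes per second.
averages = [
    {"system_id": "leaf1", "interface": "eth48", "value": 100.0},
    {"system_id": "leaf1", "interface": "eth49", "value": 300.0},
    {"system_id": "leaf2", "interface": "eth48", "value": 200.0},
    {"system_id": "leaf2", "interface": "eth49", "value": 200.0},
]
std_devs = std_dev_per_leaf(averages)
print(std_devs)  # leaf1 is unevenly loaded, leaf2 is perfectly balanced
```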

“live ecmp imbalance” processor (spine)

Purpose: Evaluate if the standard deviation across the spine facing interfaces on each leaf is within the acceptable range. In this case the acceptable range is between 0 and the std_max facade parameter (in bytes per second).

Outputs of “live ecmp imbalance” processor (spine)

‘live_ecmp_imbalance’: set of true/false values, each indicating if standard deviation (as a measure of ECMP imbalance) for traffic averages for each spine facing interface on a given leaf is within acceptable range. Each set member has system_id key to identify leaf whose ECMP imbalance the value represents.
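As a small illustration of the range check, the sketch below follows the wording above: a value is acceptable when it lies between 0 and std_max. Treating both bounds as inclusive is an assumption, and the function names are hypothetical.

```python
from typing import Dict


def within_acceptable_range(std_dev: float, std_max: float) -> bool:
    """True when std_dev lies in [0, std_max]; inclusive bounds are an assumption."""
    return 0.0 <= std_dev <= std_max


def evaluate_leafs(std_devs: Dict[str, float], std_max: float) -> Dict[str, bool]:
    """Apply the range check to every leaf, keyed by system_id."""
    return {sid: within_acceptable_range(value, std_max) for sid, value in std_devs.items()}


# Hypothetical per-leaf standard deviations in bytes per second.
live = evaluate_leafs({"leaf1": 100.0, "leaf2": 0.0}, std_max=50.0)
print(live)
```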

“sustained ecmp imbalance” processor (spine)

Purpose: Evaluate if the standard deviation across the spine facing interfaces on each leaf has been outside the acceptable range (as defined by the ‘live ecmp imbalance’ processor) for more than ‘threshold_duration’ seconds during the last ‘total_duration’ seconds. These two parameters are part of the facade specification.

Outputs of “sustained ecmp imbalance” processor (spine)

‘system_tx_imbalance’: set of true/false values, each indicating if standard deviation (as a measure of ECMP imbalance) for traffic averages for each spine facing interface on a given leaf has been outside acceptable range for more than specified period of time. Each set member has system_id key to identify leaf whose ECMP imbalance the value represents.
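The sustained check can be sketched as "time spent out of range within a sliding window". The sketch assumes each observed state holds until the next observation (a step function) and that the comparison against threshold_duration is strict, as the phrase "more than" suggests. The function names are hypothetical.

```python
from typing import List, Tuple


def time_out_of_range(states: List[Tuple[float, bool]], now: float,
                      total_duration: float) -> float:
    """Seconds spent out of range within [now - total_duration, now].

    states: chronologically sorted (timestamp, out_of_range) pairs.
    Assumption: each state holds until the next one (or until now).
    """
    window_start = now - total_duration
    total = 0.0
    for i, (ts, out_of_range) in enumerate(states):
        end = states[i + 1][0] if i + 1 < len(states) else now
        start = max(ts, window_start)  # clip the interval to the window
        end = min(end, now)
        if out_of_range and end > start:
            total += end - start
    return total


def sustained_imbalance(states: List[Tuple[float, bool]], now: float,
                        total_duration: float, threshold_duration: float) -> bool:
    """True when out-of-range time in the window exceeds threshold_duration."""
    return time_out_of_range(states, now, total_duration) > threshold_duration


# Hypothetical state changes for one leaf; the window covers t=40..100.
states = [(0.0, False), (20.0, True), (50.0, False), (70.0, True)]
seconds = time_out_of_range(states, now=100.0, total_duration=60.0)
print(seconds, sustained_imbalance(states, 100.0, 60.0, threshold_duration=30.0))
```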

“ecmp imbalance anomaly” processor

Purpose: Export sustained ECMP imbalance, when true, as an anomaly for each system.

Outputs of “ecmp imbalance anomaly” processor

‘ecmp_imbalance_anomaly’: set of true/false values, each indicating if standard deviation (as a measure of ECMP imbalance) for traffic averages for each spine facing interface on a given leaf has been outside acceptable range for more than specified period of time. Each set member has system_id key to identify leaf whose ECMP imbalance the value represents.

“anomaly acumulate” processor

Purpose: Create time series showing ecmp anomaly being raised and cleared for each system under consideration. This time series may contain up to ‘anomaly_history_count’ anomaly state changes.

Outputs of “anomaly acumulate” processor

‘anomaly accumulate’: Time series showing ecmp anomaly being raised and cleared for each system under consideration. This time series may contain up to ‘anomaly_history_count’ anomaly state changes.
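A bounded history of anomaly state changes can be sketched as below. The sketch records an entry only when the state differs from the previous one and keeps at most anomaly_history_count entries. Starting in the cleared state is an assumption, and the class name is hypothetical.

```python
from collections import deque
from typing import Deque, Tuple


class AnomalyHistory:
    """Keeps the most recent raise/clear transitions for one system."""

    def __init__(self, anomaly_history_count: int) -> None:
        # Oldest transitions fall off once the cap is reached.
        self.changes: Deque[Tuple[float, bool]] = deque(maxlen=anomaly_history_count)
        self._last = False  # assumption: the anomaly starts out cleared

    def observe(self, timestamp: float, raised: bool) -> None:
        """Record the observation only if it is a state change."""
        if raised != self._last:
            self.changes.append((timestamp, raised))
            self._last = raised


history = AnomalyHistory(anomaly_history_count=2)
for ts, raised in [(0.0, False), (10.0, True), (20.0, True), (30.0, False), (40.0, True)]:
    history.observe(ts, raised)
print(list(history.changes))  # only the two most recent transitions remain
```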

“systems imbalanaced count” processor

Purpose: Count how many systems have ecmp imbalance anomaly true at any instant in time.

Outputs of “systems imbalanaced count” processor

‘systems_imbalance_count’: Number of systems with ecmp imbalance.

“imbalanced system count out of range” processor

Purpose: Evaluate if the number of imbalanced systems is within the acceptable range, which in this instance means less than the ‘max_systems_imbalanced’ value, a facade parameter.

Outputs of “imbalanced system count out of range” processor

‘imbalanced_system_count_out_of_range’: Boolean indicating if the number of imbalanced systems is within the accepted range, i.e. less than ‘max_systems_imbalanced’, which is a facade parameter.
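The system-level stage, counting imbalanced systems and comparing the count with max_systems_imbalanced, can be sketched as follows. The strict "less than" comparison follows the wording above; the function names and sample data are hypothetical.

```python
from typing import Dict


def count_imbalanced(anomalies: Dict[str, bool]) -> int:
    """Number of systems whose ECMP imbalance anomaly is currently true."""
    return sum(1 for raised in anomalies.values() if raised)


def count_within_range(count: int, max_systems_imbalanced: int) -> bool:
    """Acceptable means strictly less than max_systems_imbalanced."""
    return count < max_systems_imbalanced


# Hypothetical per-system anomaly states keyed by system_id.
anomalies = {"leaf1": True, "leaf2": False, "leaf3": True}
count = count_imbalanced(anomalies)
print(count, count_within_range(count, max_systems_imbalanced=2))
```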

“imbalanced system count out of range anomaly” processor

Purpose: Export as anomaly when the number of imbalanced systems is not within acceptable range, where acceptable range is defined as less than ‘max_systems_imbalanced’ value.

Outputs of “imbalanced system count out of range anomaly” processor

‘imbalanced_system_count_out_of_range_anomaly’: Boolean indicating if the number of imbalanced systems is within the accepted range, i.e. less than ‘max_systems_imbalanced’, which is a facade parameter.

“imbalanced system count anomaly history” processor

Purpose: Create time series showing imbalanced system count out of range anomaly being raised and cleared. This time series may contain up to ‘system_imbalance_history_count’ anomaly state changes.

Outputs of “imbalanced system count anomaly history” processor

‘imbalanced_system_count_anomaly_history’: time series showing imbalanced system count out of range anomaly being raised and cleared. This time series may contain up to ‘system_imbalance_history_count’ anomaly state changes.