IGP Anomaly Detection Overview

This topic provides an overview of IGP Anomaly Detection and describes its functionality, scaling behavior and limitations.

Overview

IGP Anomaly Detection identifies anomalies in IGP operation in the network based on data collected from onboarded devices. Routing Director periodically collects IGP Link State Database (LSDB) and IGP state information from onboarded devices and detects deviations related to prefixes, adjacencies, and nodes. The collected data is used to identify abnormal patterns or inconsistencies in the IGP topology, such as flaps, duplicate entries, and sudden changes in configured parameters. The detected anomalies are displayed on the IGP heatmap (Observability > Routing > Route Topology) for easy visualization and troubleshooting.

Anomaly detection primarily focuses on identifying inconsistencies or deviations in IGP data such as link states, adjacency information, and prefix advertisements. Each anomaly is associated with a predefined detection rule that monitors changes in LSDB objects (prefixes, adjacencies, and nodes). These rules analyze the state data, and when a rule is violated, Routing Director raises an event as an anomaly message.

The detected anomalies are represented on the IGP heatmap, where each node and link is color-coded according to the severity of the anomaly. This color classification makes it easy to identify issues that require attention, and the heatmap helps you analyze the operational state of the network. Routing Director periodically collects IGP LSDB and IGP state information from the subscribed devices. The collected data is used to analyze IGP issues such as flaps, duplicate entries, and sudden changes in configured parameters.

The alert severity levels and their corresponding colors are:

  • Critical—Red

  • Major—Orange

  • Minor—Yellow

  • Info—White
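
The severity-to-color mapping above can be sketched as a small lookup. This is an illustrative sketch only, not Routing Director code. The names `Severity`, `SEVERITY_COLORS`, and `heatmap_color` are hypothetical, and the rule that a node or link with several anomalies takes the color of the most severe one is an assumption that this topic does not state.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "Critical"
    MAJOR = "Major"
    MINOR = "Minor"
    INFO = "Info"


# Colors exactly as listed in this topic.
SEVERITY_COLORS = {
    Severity.CRITICAL: "red",
    Severity.MAJOR: "orange",
    Severity.MINOR: "yellow",
    Severity.INFO: "white",
}

# Assumed ranking, most severe first (hypothetical; used only to pick one color).
_RANK = [Severity.CRITICAL, Severity.MAJOR, Severity.MINOR, Severity.INFO]


def heatmap_color(severities):
    """Return the color of the most severe anomaly, or None if there are none."""
    if not severities:
        return None
    worst = min(severities, key=_RANK.index)
    return SEVERITY_COLORS[worst]
```

For example, `heatmap_color([Severity.INFO, Severity.MAJOR])` returns `"orange"` under the assumed ranking.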

Scale and Limitations

IGP Anomaly Detection supports monitoring networks of up to 1000 nodes (devices) within a single IGP domain. LSDB data can be collected from a maximum of two devices, because each device's LSDB contains the complete topology information.

In addition to LSDB data, Routing Director collects IGP state data to analyze real-time, protocol-level behavior. You can subscribe to IGP state data from up to 200 devices.

Warning:
  • While the supported scale is 1000 nodes, Routing Director does not block you from exceeding the LSDB limit of 1000 nodes. We recommend that you stay within the limit to prevent performance degradation.

  • Because LSDB collection consumes significant resources, we recommend that you subscribe to LSDB streaming from devices whose CPUs are lightly loaded. You can subscribe to IGP state data from super core, core, aggregation, or pre-aggregation devices.
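
As a planning aid, the limits in this section can be checked before you subscribe devices. The following is a minimal sketch, assuming you track the three counts yourself. `SubscriptionPlan` and `check_plan` are hypothetical names and are not part of any Routing Director API. Because the node limit is not enforced, the check returns warnings instead of raising errors.

```python
from dataclasses import dataclass

# Limits as stated in this section.
MAX_NODES = 1000             # nodes in a single IGP domain
MAX_LSDB_DEVICES = 2         # devices streaming LSDB data
MAX_IGP_STATE_DEVICES = 200  # devices streaming IGP state data


@dataclass
class SubscriptionPlan:
    domain_nodes: int
    lsdb_devices: int
    igp_state_devices: int


def check_plan(plan):
    """Return a list of warning strings; an empty list means the plan is within limits."""
    warnings = []
    if plan.domain_nodes > MAX_NODES:
        warnings.append(
            f"{plan.domain_nodes} nodes exceeds the supported scale of {MAX_NODES}"
        )
    if plan.lsdb_devices > MAX_LSDB_DEVICES:
        warnings.append(
            f"{plan.lsdb_devices} LSDB sources exceeds the maximum of {MAX_LSDB_DEVICES}"
        )
    if plan.igp_state_devices > MAX_IGP_STATE_DEVICES:
        warnings.append(
            f"{plan.igp_state_devices} IGP state sources exceeds the maximum of "
            f"{MAX_IGP_STATE_DEVICES}"
        )
    return warnings
```

A plan such as `SubscriptionPlan(domain_nodes=800, lsdb_devices=2, igp_state_devices=150)` produces no warnings.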

Anomaly Categories

Routing Director continuously monitors LSDB and IGP state data to detect irregularities in network behavior. Table 1 lists the anomaly types, their severity, description, and corresponding messages generated when such anomalies are observed.

Table 1: Types of anomalies detected and their details
Anomaly details Description
Category: Flaps

Prefix Flaps

Message

Observed Prefix <x.x.x.x> originated by <systemID> from instance- DEFAULT was down at: <date:time UTC>

Severity

Info

Description

This anomaly indicates that a prefix went down or came up in the ISIS domain.

It can occur during regular maintenance activities such as link upgrades or when a node is restarted. If the prefix is external and redistributed from another domain, the flap might occur due to maintenance or configuration updates in that external domain.

Adjacency Flaps

Message

Observed Adjacency Flaps on: <origin hostname> with neighbor <host name> from instance- DEFAULT started advertising again at <date:time UTC>

Severity

Info

Description

This anomaly indicates that an ISIS adjacency went down or came up repeatedly.

It can occur during routine maintenance, link changes, or node upgrades. At times, it happens because ISIS Hello messages expire or BFD sessions go down. In most cases, BFD or Hello message timeouts occur when there is CPU congestion on the local or neighboring node. If the ISIS adjacency runs on top of a third-party network, the flap might be due to congestion in that network. It can also occur when the number of configured adjacencies is too high to be supported with the configured Hello or BFD intervals.

Node Flaps

Message

Observed Node Flap on: <hostname> from instance-DEFAULT at time: <start time:end time> UTC

Severity

Info

Description

This anomaly indicates that all adjacencies on a node were lost. Even if the node is not down, the anomaly appears because the node has lost all connectivity to the network and becomes invisible to other parts of the network. This could happen during scheduled maintenance or node upgrades. In some cases, it can also occur due to software bugs in the routing process.
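
The node flap condition (a node losing all of its adjacencies) can be illustrated by comparing two adjacency snapshots. This is a sketch under stated assumptions, not the detection rule that Routing Director uses. The snapshot format (a dictionary that maps each node name to the set of its neighbors) and the function name are hypothetical.

```python
def nodes_that_lost_all_adjacencies(previous, current):
    """Return the sorted names of nodes that had neighbors before and have none now.

    previous, current: dict mapping node name -> set of neighbor names.
    A node that is absent from `current` is treated as having no adjacencies.
    """
    lost = []
    for node, neighbors in previous.items():
        if neighbors and not current.get(node):
            lost.append(node)
    return sorted(lost)
```

A node that loses only some of its adjacencies is not reported by this check. That case corresponds to adjacency flaps instead.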

Category: Duplicate Detection

Duplicate Hostname

Message

Duplicate entry hostname:<Hostname>, <systemID1> from instance: DEFAULT found at:<start date, time: end date, time>UTC

Severity

Critical

Description

This anomaly indicates that two nodes are configured with the same hostname. We recommend that network administrators assign a unique hostname to each device to avoid confusion in identification and reporting.

Duplicate Router ID

Message

Duplicate entry router id:<ipv4-te-router-id> <systemID> and <systemID>from instance: DEFAULT found at: <date:time>UTC

Severity

Critical

Description

This anomaly indicates that two nodes are configured with the same router ID. We recommend that network administrators configure unique router IDs to maintain consistency and prevent conflicts during route calculations.

Prefix-SID Conflicts

Message

Duplicate entry prefix_sid_state_value:<number> <Prefix 1 and Prefix 2> from instance: DEFAULT found at: <start date start time -> end date end time> UTC

Severity

Critical

Description

This anomaly indicates that two prefixes are configured with the same Segment ID (SID). We recommend that network administrators configure a unique SID for each prefix to prevent routing conflicts and maintain predictable path forwarding.
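
The three duplicate checks above share one pattern: group records by a value that should be unique and report every value that has more than one owner. The following is a minimal sketch of that pattern. The record layout, the field names, and `find_duplicates` are hypothetical and do not reflect the Routing Director data model.

```python
from collections import defaultdict


def find_duplicates(records, value_field, owner_field):
    """Return {value: sorted owners} for every value shared by more than one owner.

    records: list of dicts. Records that lack `value_field` are skipped.
    """
    owners = defaultdict(set)
    for record in records:
        value = record.get(value_field)
        if value is not None:
            owners[value].add(record[owner_field])
    return {value: sorted(ids) for value, ids in owners.items() if len(ids) > 1}


# Hypothetical sample data for illustration.
nodes = [
    {"system_id": "0000.0000.0001", "hostname": "pe1", "router_id": "10.0.0.1"},
    {"system_id": "0000.0000.0002", "hostname": "pe1", "router_id": "10.0.0.2"},
    {"system_id": "0000.0000.0003", "hostname": "pe3", "router_id": "10.0.0.2"},
]
prefixes = [
    {"prefix": "192.0.2.1/32", "sid": 1001},
    {"prefix": "192.0.2.2/32", "sid": 1001},
    {"prefix": "192.0.2.3/32", "sid": 1003},
]

duplicate_hostnames = find_duplicates(nodes, "hostname", "system_id")
duplicate_router_ids = find_duplicates(nodes, "router_id", "system_id")
sid_conflicts = find_duplicates(prefixes, "sid", "prefix")
```

With the sample data, `duplicate_hostnames` contains one entry for `pe1`, and `sid_conflicts` contains one entry for SID 1001.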

Missing Anycast SIDs

Message

From Instance- DEFAULT Missing Anycast SID detected for Prefix <Prefix> from host-name <Hostname> found at: <start date start time -> end date end time> UTC

Severity

Critical

Description

This anomaly indicates that an anycast prefix is configured on two nodes, but only one node has the SID configured. We recommend that network administrators configure the same SID on all nodes where the anycast prefix is defined to ensure consistent routing behavior.
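
The missing anycast SID condition can be expressed as a check over prefix advertisements: a prefix advertised by more than one node, where at least one advertiser has a SID and at least one does not. The sketch below assumes a simple tuple format of (prefix, hostname, SID or `None`). The format and the function name are hypothetical.

```python
from collections import defaultdict


def missing_anycast_sids(advertisements):
    """Return {prefix: sorted hostnames without a SID} for inconsistent anycast prefixes.

    advertisements: iterable of (prefix, hostname, sid) tuples, where sid is None
    when the node advertises the prefix without a SID.
    """
    by_prefix = defaultdict(dict)
    for prefix, hostname, sid in advertisements:
        by_prefix[prefix][hostname] = sid

    findings = {}
    for prefix, sids in by_prefix.items():
        if len(sids) < 2:
            continue  # advertised by a single node, so not anycast
        missing = sorted(host for host, sid in sids.items() if sid is None)
        # Report only when some advertisers have a SID and others do not.
        if missing and len(missing) < len(sids):
            findings[prefix] = missing
    return findings
```

A prefix that no advertiser has configured with a SID is not reported, because it does not match the condition described above.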

Category: Dynamic Thresholding

Sudden Prefix Increase/Decrease

Message

From instance-DEFAULT Sudden Prefixes increase/decrease reported by <Hostname> Min: <number> Max: <number> at: <date:time> UTC , Number of Prefixes decreased to number , Normalized at: <date:time> UTC

Severity

Critical

Description

This anomaly indicates that the number of prefixes advertised by a node has increased or decreased suddenly. This might occur due to ISIS redistribution policies allowing a large number of BGP prefixes into ISIS because of misconfigured route policies.

Sudden Link Increase/Decrease

Message

From instance-DEFAULT Sudden Links increase/decrease reported at: [<date:time> UTC] , Number of Links decreased to <number>, Normalized at:<date:time> UTC

Severity

Critical

Description

This anomaly indicates that the number of links advertised by a node has changed abruptly. This issue may occur when several links or adjacencies go down simultaneously. In some cases, ISIS DDoS protection on neighboring devices may drop packets if too many ISIS packets are generated. Similar instability may also occur if the default DDoS threshold values are too low for the adjacency scale of the device.

Sudden Node Increase/Decrease

Message

From instance-DEFAULT Sudden Nodes increase/decrease reported at:[<date:time> UTC] , Number of Nodes decreased to <number>, Normalized at:<date:time> UTC

Severity

Critical

Description

This anomaly indicates that the number of nodes advertised in the network has increased or decreased suddenly. Currently, this anomaly is available only in API responses and alerts, and not on the IGP heatmap. It may occur when multiple nodes go down simultaneously due to software bugs or instability in the control plane.

SPF Run Increase

Message

Sudden SPFrun increase/decrease reported by <neighbor systemID> Min:<number> Max:<number> at: <date:time UTC>, Number of SPFrun decreased to <number> , Normalized at:<date:time> UTC

Severity

Critical

Description

This anomaly indicates that the number of SPF (Shortest Path First) calculations triggered on a node has increased suddenly. It usually occurs due to increased network activity, such as frequent link or metric changes. It can also result from software bugs that trigger unnecessary SPF runs.

LSP Retransmit Increase

Message

Sudden lspretran increase/decrease reported by Hostname from interface:<interface> name Min:<number> Max:<number> at: <date:time UTC> , Number of lspretran increased to <number> , Normalized at:<date:time> UTC

Severity

Critical

Description

This anomaly indicates that the number of LSP retransmissions triggered on a node has increased suddenly. It occurs when neighboring nodes fail to receive LSPs due to packet drops or when acknowledgments are delayed because of high CPU utilization. This is an important early warning of potential issues in the ISIS protocol that requires prompt administrative attention.

Repeated LSP Regeneration

Message

Sudden lsp refresh increase/decrease reported by <Hostname> Min:<number> Max:<number> at:<date:time UTC>, Number of lsp refresh increased to <number>, Normalized at: <date:time> UTC

Severity

Critical

Description

This anomaly indicates that the number of LSP regenerations triggered on a node has increased suddenly. It may occur due to increased activity on the node, such as bandwidth updates, link flaps, or adjacency flaps. Frequent regeneration can indicate instability or excessive updates in the network.
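
The dynamic thresholding messages report a Min and Max range and the time at which the value normalized. This topic does not describe how Routing Director computes that range, so the following is only one plausible sketch of the general idea: keep a window of recent samples, treat their minimum and maximum as the normal band, and flag a sample that falls outside the band by more than a tolerance. The class name, the window size, and the tolerance are all assumptions.

```python
from collections import deque


class MinMaxBand:
    """Flag samples that fall outside the min/max band of recent normal samples."""

    def __init__(self, window=12, tolerance=0.2):
        self.samples = deque(maxlen=window)  # recent samples considered normal
        self.tolerance = tolerance           # allowed fractional excursion

    def observe(self, value):
        """Return "increase", "decrease", or None for the given sample."""
        verdict = None
        # Judge only after the window is full, so the band is meaningful.
        if len(self.samples) == self.samples.maxlen:
            low, high = min(self.samples), max(self.samples)
            if value > high * (1 + self.tolerance):
                verdict = "increase"
            elif value < low * (1 - self.tolerance):
                verdict = "decrease"
        # Anomalous samples are kept out of the baseline so that a spike
        # does not widen the band.
        if verdict is None:
            self.samples.append(value)
        return verdict


band = MinMaxBand(window=3, tolerance=0.2)
verdicts = [band.observe(v) for v in [100, 102, 98, 101, 300, 40, 99]]
```

In this run, only the samples 300 and 40 are flagged, as an increase and a decrease respectively. The first sample that returns `None` after a flagged one corresponds loosely to the "Normalized at" time in the messages.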