What are Alarms and Events?
An event is an occurrence that happens at a specific point in time. It indicates that something significant happened in the system, but it does not represent an ongoing, persistent condition.
An alarm is an indication that the system is in a state that may require user intervention to resolve.
An alarm is stateful, whereas an event is a one-off occurrence. For example, an alarm is usually associated with two events: an "ADD" when it is created and a "CLEAR" when it goes away.
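The relationship between alarms and their ADD/CLEAR events can be illustrated with a minimal sketch. It replays a list of events in order and keeps only the alarms that were added but never cleared. The dictionary keys used here (`alarm_id`, `type`, `message`) are hypothetical names chosen for illustration, not the actual 128T event schema.

```python
from typing import Dict, Iterable


def active_alarms(events: Iterable[dict]) -> Dict[str, dict]:
    """Replay ADD/CLEAR events in order and return the alarms still raised.

    The keys "alarm_id" and "type" are assumptions made for this sketch.
    """
    active: Dict[str, dict] = {}
    for event in events:
        alarm_id = event["alarm_id"]
        if event["type"] == "ADD":
            # An ADD event marks the moment the alarm is created.
            active[alarm_id] = event
        elif event["type"] == "CLEAR":
            # A CLEAR event marks the moment the alarm goes away.
            active.pop(alarm_id, None)
    return active


# Hypothetical sample data: two alarms raised, one of them cleared.
sample_events = [
    {"alarm_id": "a1", "type": "ADD", "message": "interface down"},
    {"alarm_id": "a2", "type": "ADD", "message": "peer unreachable"},
    {"alarm_id": "a1", "type": "CLEAR", "message": "interface down"},
]

remaining = active_alarms(sample_events)
print(sorted(remaining))
```

Each event is a single record in the history, while the alarm exists only between its ADD and its CLEAR.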
How Do I Find Alarms?
You can find alarms in the PCLI by typing:
In the web client, the bell icon in the left panel carries a badge that shows the total number of alarms. In addition, the header on the Dashboard breaks the total down into three categories: critical, major and minor. The bell and each of the critical, major and minor counts are links to the alarm page, which filters the alarms accordingly.
How Do I Find Events?
You can find events in the PCLI by typing:
Within the web client, the Event History link brings you to a page that displays all events. You can apply filters there to restrict the list to a particular router or event type.
In large deployments, a conductor may hold a large number of events, and it can be difficult to wade through pages of them when you are looking for a particular event within a known window of time. The show events command supports filtering events by time when you specify a starting and ending time. The syntax for time follows the same format as the event timestamps.
Note the trailing Z in the timestamps of the example below. Most systems are configured to use UTC time. If the trailing Z were absent, the time filter would apply to the local time zone.
show events alarm from 2020-03-30T22:00:00Z to 2020-03-30T23:59:59Z router AAPDENCOPOD4
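When the time window is computed by a script, the timestamps can be formatted with the trailing Z from timezone-aware values. The following is a minimal sketch that only builds the command text shown above; the helper names `utc_stamp` and `events_command` are hypothetical, and the sketch does not send anything to the PCLI.

```python
from datetime import datetime, timedelta, timezone


def utc_stamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as UTC with a trailing Z."""
    if moment.tzinfo is None:
        # A naive datetime carries no zone, so refuse it rather than guess.
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def events_command(start: datetime, end: datetime, router: str) -> str:
    """Build the text of a time-filtered show events command (hypothetical helper)."""
    return (
        f"show events alarm from {utc_stamp(start)} "
        f"to {utc_stamp(end)} router {router}"
    )


start = datetime(2020, 3, 30, 22, 0, 0, tzinfo=timezone.utc)
end = datetime(2020, 3, 30, 23, 59, 59, tzinfo=timezone.utc)
print(events_command(start, end, "AAPDENCOPOD4"))

# A local time six hours behind UTC is converted before formatting.
local = datetime(2020, 3, 30, 16, 0, 0, tzinfo=timezone(timedelta(hours=-6)))
print(utc_stamp(local))
```

Converting to UTC before formatting keeps the filter unambiguous regardless of the time zone of the machine that runs the script.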
Contents of Alarms and Events
|dateTime||This is the date and time that the event occurred. The format of the value follows the ISO 8601 standard.|
|node||The system within the 128T which produced the event|
|process||The process within the node which produced the event|
|source||The name of the entity which was the originator of the alarm. When the topic is a network-interface, this would be the name of the network-interface.|
|category||This is the alarm type. Each category has a specific message format:|
• system Related to the system, e.g. CPU, memory, etc.
• process Related to an internal software process
• interface/network-interface Related to an interface on the 128T (up, down, etc.)
• platform Related to low-level events that aren't necessarily derived from the machine; e.g. security keys
• peer Related to connectivity between 128T routers
• platform-state Sourced from the stats infrastructure
• redundancy Related to high availability behavior; e.g. a failover or leadership change
• giid Related to an interface that is part of a redundant pair (giid is an interface's "global ID")
• asset An alarm sourced by an asset (a managed node) that is derived from Automated Provisioner
|severity||Alarms are categorized into one of four severities: critical, major, minor and info. Alarms can be filtered by severity level. The default severity level is info, which shows all alarms.|
• critical The condition affects service
• major Immediate action is required
• minor Minor warning conditions
• info No action is required
|message||Descriptive text regarding the nature of the alarm|
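The fields and severity levels described in the table can be sketched as a small filter. This example assumes that filtering at a given level also shows every more severe level, which is consistent with info showing all alarms but is not stated explicitly above. The records and their values are hypothetical sample data, not real system output.

```python
from typing import List

# Ordered from least to most severe, following the table above.
SEVERITY_ORDER = ["info", "minor", "major", "critical"]


def filter_by_severity(alarms: List[dict], minimum: str = "info") -> List[dict]:
    """Return alarms at or above the given severity (assumed threshold semantics)."""
    threshold = SEVERITY_ORDER.index(minimum)
    return [
        alarm
        for alarm in alarms
        if SEVERITY_ORDER.index(alarm["severity"]) >= threshold
    ]


# Hypothetical alarm records using the field names from the table.
sample_alarms = [
    {"dateTime": "2020-03-30T22:05:00Z", "node": "node1", "process": "proc-a",
     "source": "wan0", "category": "network-interface",
     "severity": "critical", "message": "interface down"},
    {"dateTime": "2020-03-30T22:10:00Z", "node": "node1", "process": "proc-b",
     "source": "cpu", "category": "system",
     "severity": "minor", "message": "high CPU utilization"},
    {"dateTime": "2020-03-30T22:15:00Z", "node": "node2", "process": "proc-a",
     "source": "peer-east", "category": "peer",
     "severity": "major", "message": "peer path down"},
]

for alarm in filter_by_severity(sample_alarms, "major"):
    print(alarm["severity"], alarm["category"], alarm["message"])
```

With the default of info, the filter returns every record.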
When a 128T is put into “Maintenance Mode”, all alarms for that 128T are “shelved”. Shelved alarms continue to be monitored by the system but are not presented in the standard UI. The state of shelved alarms can optionally be viewed by issuing: