T1 Troubleshooting Process
The 128T software is often configured to leverage legacy T1/E1 circuits to connect to provider MPLS networks. This document describes troubleshooting techniques available to 128T administrators to confirm whether a T1 has issues before escalating to the service provider, and to distinguish between a T1 issue and a downstream issue.
Confirm Interface Status
The PCLI command show device-interface shows the status of the T1 interface, including its administrative status and its operational status. For functioning T1 interfaces, the interface state will be up both administratively and operationally.
Here is an example from a healthy system:
Here is an example from a system with T1 problems:
The state of unknown represents a T1 that has never successfully come online since the 128T system started. The initial state of a T1 interface on the 128T platform is unknown until it transitions to up. When a T1 is unknown, it can be treated as out of service.
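The status rules above are simple enough to encode if you collect interface state with a script. The following is a minimal Python sketch, assuming you have already read the administrative and operational states from show device-interface as plain strings; the function name classify_t1 is hypothetical and not part of the 128T software.

```python
def classify_t1(admin_status: str, oper_status: str) -> str:
    """Apply the status rules above to one T1 interface (hypothetical helper).

    admin_status and oper_status are the states as read from
    show device-interface, e.g. "up", "down", or "unknown".
    """
    admin = admin_status.strip().lower()
    oper = oper_status.strip().lower()
    if admin == "up" and oper == "up":
        return "in service"
    if "unknown" in (admin, oper):
        # The T1 has never come online since the 128T software started.
        return "out of service (never came up)"
    return "out of service"
```

For example, classify_t1("up", "unknown") reports the T1 as out of service because it has never come online.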
Confirm Peer Status
Typically, a site will have multiple upstream peers that it connects with over any given transport circuit; for example, a branch location will connect to multiple head ends over each of its T1 circuits, its broadband circuit, its LTE connection, etc. These individual connections are referred to as "peer paths," and are periodically measured using BFD packets. If some but not all of the peer paths on a given circuit are down, or only a subset are flapping (repeatedly transitioning between up and down), the problem is most likely downstream rather than a local T1 issue. As an additional point of investigation, look at other branch locations that peer with the same head ends/peers to see whether their peer paths are also down or flapping.
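The reasoning above (all peer paths affected points to the local circuit; a subset down or flapping points downstream) can be sketched in Python. This assumes you have sampled each peer path's state over time, for example by running show peers repeatedly; the data layout, the function names, and the flap threshold of two state changes are illustrative assumptions, and standby paths are treated as healthy.

```python
from typing import Dict, List

# Assumption: standby paths on backup links are expected, not failures.
HEALTHY_STATES = {"up", "standby"}

def count_transitions(states: List[str]) -> int:
    """Number of state changes in one peer path's samples (oldest first)."""
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)

def classify_circuit(paths: Dict[str, List[str]], flap_threshold: int = 2) -> str:
    """Classify the peer paths of one circuit, e.g. the T1.

    paths maps each peer to a non-empty list of sampled states, for example
    {"headend-1": ["up", "down", "up"]}. flap_threshold is an assumed value.
    """
    affected = {
        peer
        for peer, states in paths.items()
        if states[-1] not in HEALTHY_STATES
        or count_transitions(states) >= flap_threshold
    }
    if not affected:
        return "healthy"
    if affected == set(paths):
        return "all peer paths affected: suspect the local circuit"
    return "some peer paths affected: suspect a downstream issue"
```

A path counts as affected if it is currently down or has flapped, so a path that recovered after repeated transitions is still flagged.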
Here is an example of a system with working peers:
In some deployments there will be "standby" paths to router peers. These may show as standby in the output of show peers. In the sample output above, these are displayed as standby for the backup MPLS link.
Here is an example of a system with issues communicating to peers over its T1 circuit:
Build a Timeline
Based on when your monitoring system detects a failure, or your users report connectivity problems, start building a timeline of events. This is conventional wisdom in network troubleshooting: piece together the information from logs, event history, and alarms, to compile a timeline that can be traced back to the trigger of the problem. This is how we will arrive at the root cause of the issue.
Log into the local router's PCLI to view active alarms and event history.
The PCLI output is often very wide and may cause lines to wrap, making it difficult to read on narrow screens. Ensure your screen is wide enough to prevent line wrap.
- First, check the current system time from within the PCLI:
- Choose a timeframe (e.g., 24 hours) offset from the current time, and use the show events alarm command with the from argument to list all T1 events that have occurred in that timeframe. The timezone of the value passed to from in the example below is assumed to be the same as that of the system.
- Compile each of these events into a timeline. Each event that is displayed will indicate in the Event Type column whether it is a new event (the type is add) or clears a prior event (the type is clear). Furthermore, each event will fall into one of two categories, network-interface or peer, described below.
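A minimal Python sketch of the timeline step follows. It assumes the events have already been transcribed from show events alarm into (timestamp, event type, category, subject) tuples using naive datetimes in the system's local timezone. The tuple layout, the function names, and the timestamp format produced by window_start are assumptions; check which format your PCLI version accepts for from.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

# Hypothetical transcription of one row of show events alarm:
# (timestamp, event type "add"/"clear", category, subject)
Event = Tuple[datetime, str, str, str]
Entry = Tuple[datetime, Optional[datetime], str, str]

def window_start(now: datetime, hours: int = 24) -> str:
    """Start of the look-back window; the output format is an assumption."""
    return (now - timedelta(hours=hours)).strftime("%Y-%m-%dT%H:%M:%S")

def build_timeline(events: List[Event]) -> List[Entry]:
    """Pair each add with the next clear for the same category and subject.

    Entries whose end time is None were still active at collection time.
    """
    open_events: Dict[Tuple[str, str], datetime] = {}
    timeline: List[Entry] = []
    for ts, event_type, category, subject in sorted(events):
        key = (category, subject)
        if event_type == "add":
            open_events[key] = ts
        elif event_type == "clear" and key in open_events:
            timeline.append((open_events.pop(key), ts, category, subject))
    for (category, subject), start in open_events.items():
        timeline.append((start, None, category, subject))
    return sorted(timeline, key=lambda entry: entry[0])
```

Each resulting entry gives the start and end of one condition, which makes it easy to line up interface and peer events against each other.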
Interpreting the Data
- The network-interface category is used when the interface transitions to up or down, either administratively (i.e., disabled or enabled by an administrator) or operationally (the physical circuit).
- The peer category reflects the ability of the 128T software to reach remote devices over that circuit.
When the network-interface is down, either operationally or administratively, all traffic on that interface is impacted. When the network-interface is operationally down, proceed to the Checking the Circuit Status section of this document.
For peers, it is important to identify if the issue affects just one peer or all of the peers leveraging the T1 interface. When the issue is limited to one peer, it is typically a downstream issue that should be escalated to the circuit provider. When the issue affects all peers, it is typically a circuit issue.
The PCLI command show device-interface shows the details of the T1 interface activity, including errors, alarms, and signal strength.
Here is an example of a clean T1 circuit.
Here is an example of a misbehaving T1; note the section on Performance Monitoring Counters at the end of the output:
The 128 Technology solution includes the Sangoma E1/T1 card for interfacing to MPLS networks. For more information on the Framer alarms, the LIU alarms, and the Tx alarms, refer to the Sangoma Reference Guide.
Checking the Circuit Status
From the Linux prompt on the node where the T1 card is installed, use the following command to confirm whether the T1 card is observing circuit flapping. For this we'll use the journalctl command, which shows log messages in the system's journal. We can restrict it to start and end times, and then pipe it to the grep command to filter out everything but the messages related to wanpipe (the Sangoma card).
In this example we can see that the T1 card (wanpipe1) is reporting circuit loss at 18:50:23, restored six seconds later, down again eight seconds after that, and restored again five seconds later. This is a classic example of a "flapping" circuit. This circuit behavior will affect all peers and cause all traffic to be migrated off of the circuit. It should correspond to interface failures in the output of show events alarm.
These should be added to your troubleshooting timeline.
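The journal filtering and flapping analysis above can be sketched in Python. journal_command builds a journalctl invocation using its --since, --until, and --no-pager options, and wanpipe_lines mirrors the grep step. Turning a log line into a down/up transition depends on the exact wanpipe message text, so that step is left to you; the 60-second window and two-outage threshold in is_flapping are assumptions, not 128T or Sangoma values, and all function names are hypothetical.

```python
import subprocess
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

def journal_command(since: str, until: str) -> List[str]:
    """journalctl restricted to a time range, without the pager."""
    return ["journalctl", "--since", since, "--until", until, "--no-pager"]

def wanpipe_lines(lines: Iterable[str]) -> List[str]:
    """Equivalent of piping the journal to grep wanpipe."""
    return [line for line in lines if "wanpipe" in line]

def read_wanpipe_messages(since: str, until: str) -> List[str]:
    """Run journalctl on the node hosting the T1 card; keep wanpipe lines."""
    result = subprocess.run(journal_command(since, until),
                            capture_output=True, text=True, check=True)
    return wanpipe_lines(result.stdout.splitlines())

# (time, "down" or "up"), oldest first; built by hand from the log lines.
Transition = Tuple[datetime, str]

def outages(transitions: List[Transition]) -> List[Tuple[datetime, float]]:
    """(start, seconds down) for each down->up pair; an open outage is skipped."""
    result: List[Tuple[datetime, float]] = []
    down_at: Optional[datetime] = None
    for ts, state in transitions:
        if state == "down":
            down_at = ts
        elif state == "up" and down_at is not None:
            result.append((down_at, (ts - down_at).total_seconds()))
            down_at = None
    return result

def is_flapping(transitions: List[Transition],
                window_s: float = 60.0, min_outages: int = 2) -> bool:
    """True if min_outages outages start within window_s seconds (assumed thresholds)."""
    starts = [start for start, _ in outages(transitions)]
    return any(
        (starts[i + min_outages - 1] - starts[i]).total_seconds() <= window_s
        for i in range(len(starts) - min_outages + 1)
    )
```

With the times from the example above (placed on a hypothetical date), outages yields two outages of 6 and 5 seconds that start 14 seconds apart, so is_flapping returns True.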
Check for T1 physical errors
We'll use the command show device-interface node <name> name <interfaceName> to look at the error counters for the circuit.
Repeat this command several times to see if errors are incrementing, or if these counters are residual from an earlier incident.
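To decide whether counters are incrementing or residual, compare two snapshots taken a few minutes apart. A minimal Python sketch, assuming you have copied the counter names and values from two runs of show device-interface into dictionaries; the function name is hypothetical, and apart from Sync Errors the counter names in the test are illustrative.

```python
from typing import Dict

def incrementing_counters(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Counters that grew between two snapshots, mapped to how much they grew.

    Non-zero counters that did not change are residual from an earlier
    incident and are deliberately left out.
    """
    return {
        name: value - before.get(name, 0)
        for name, value in after.items()
        if value > before.get(name, 0)
    }
```

An empty result after several runs suggests the non-zero counters are left over from an earlier incident.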
Clearing the Counters
The 128T cannot reset Sangoma counters within its administrative interfaces. To clear the counters, you must follow these steps within the Linux host operating system.
The Sync Errors counter cannot be cleared.
- Use the command ip netns to identify the Linux namespace used for the T1 interface. This will be of the format:
In this example, the T1 namespace is
- Clear the performance counters using the command ip netns exec <t1 namespace> wanpipemon -i w1g1 -c fpm.
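If you script these steps, a sketch like the following can pick out the T1 namespace and build the clear command. It assumes the namespace follows the t1-ns-<number> naming used elsewhere in this document and that ip netns prints one namespace per line, optionally followed by an "(id: N)" suffix; the function names are hypothetical.

```python
from typing import List

def t1_namespaces(ip_netns_output: str) -> List[str]:
    """Namespace names from ip netns output that look like T1 namespaces.

    Assumes the t1-ns-<number> naming used elsewhere in this document.
    """
    names = [line.split()[0] for line in ip_netns_output.splitlines() if line.strip()]
    return [name for name in names if name.startswith("t1-ns-")]

def clear_counters_command(namespace: str, interface: str = "w1g1") -> List[str]:
    """argv for: ip netns exec <t1 namespace> wanpipemon -i w1g1 -c fpm"""
    return ["ip", "netns", "exec", namespace, "wanpipemon", "-i", interface, "-c", "fpm"]
```

Clearing requires root privileges; for example, pass clear_counters_command(ns) to subprocess.run(..., check=True) for each ns returned by t1_namespaces. The Sync Errors counter is still not cleared this way.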
Confirm Presence of the Sangoma card
Use the Linux command lspci to confirm that the card appears on the Linux system's PCI bus. Here is the expected result (note that the PCI address 04:04.0 may be different on your hardware platform):
If the card is not present, this is the output:
This is indicative of a failure to the Sangoma card or the host machine.
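The presence check can be automated with a small Python sketch. It assumes the card's lspci description contains the vendor name "Sangoma", which depends on the PCI ID database installed on the host; if the host shows only numeric IDs, match on those instead. The function names are hypothetical.

```python
import subprocess
from typing import List

def sangoma_devices(lspci_output: str) -> List[str]:
    """lspci lines that mention Sangoma (case-insensitive); empty if absent."""
    return [line for line in lspci_output.splitlines() if "sangoma" in line.lower()]

def card_present() -> bool:
    """Run lspci on the node and report whether a Sangoma device is listed."""
    output = subprocess.run(["lspci"], capture_output=True, text=True, check=True).stdout
    return bool(sangoma_devices(output))
```

Returning the matching lines, rather than only a boolean, also shows the PCI address the card was found at.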
Confirm Status of the T1 Card
Use the command wanrouter status to make sure the card is active and connected. The expected output should look something like this:
If the card or drivers are malfunctioning, it will look like this:
Failure in this scenario means that the T1 card is not loading properly. Try to restore connectivity to the card by powering off the entire host, waiting a short time, and powering it back on. If the failure persists, this system should be replaced.
Confirm the System can Query the Device
Use the command wanrouter hwprobe to confirm communication between the host operating system and the Sangoma card. Expected output:
For failure cases, more information may be found in the logfile /var/log/wanrouter and in the system journal. Use the journalctl command and look for messages related to the Sangoma card.
Confirm Layer 3 PPP connection using ICMP
We'll use the old standby ping to make sure we can reach the PPP peer. Use the ping command, specifying the egress interface, and enter the service provider's Provider Edge (PE) IP address:
If the ping request fails and all indications from the 128T and Sangoma diagnostics indicate an otherwise healthy system, escalate to the service provider for investigation on the PE equipment.
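A Python sketch of the ping check follows. ping_command builds an iputils ping that uses -c for the probe count and -I to select the egress interface; the interface name and PE address in the test are placeholders, and the function names are hypothetical. ping_result reads the standard "N packets transmitted, M received" summary line. Depending on your deployment, the command may need to run inside the T1 namespace, as with the wanpipemon commands in this document.

```python
import re
from typing import List, Optional, Tuple

# Matches the iputils summary, e.g. "5 packets transmitted, 5 received, ..."
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) received")

def ping_command(pe_ip: str, interface: str, count: int = 5) -> List[str]:
    """argv for a ping to the PE address that egresses the given interface."""
    return ["ping", "-c", str(count), "-I", interface, pe_ip]

def ping_result(output: str) -> Optional[Tuple[int, int]]:
    """(transmitted, received) from the ping summary line, or None."""
    match = SUMMARY.search(output)
    return (int(match.group(1)), int(match.group(2))) if match else None

def pe_reachable(output: str) -> bool:
    """True if at least one echo reply came back from the PE."""
    result = ping_result(output)
    return result is not None and result[1] > 0
```

Any reply counts as reachable here; partial loss is still worth recording for the service provider.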
The Sangoma T1 Card with the latest firmware will automatically loop back on detection of a loop signal from the service provider. However, there may be cases where you will need to manually loop up the card. This is done through the Linux shell using
When the T1 is placed in loopback, the interface is no longer usable for remote administration, conductor connectivity, traffic forwarding, etc. This should only be done during scheduled maintenance or in critical situations. Because loopback can interfere with remote administration, it is essential that another path to the node be available, so that the command to disable T1 loopback (thus restoring normal service) can be entered locally on the node.
The output of show device-interface will indicate a status of Admin: Up and Oper: Down while the loopback is engaged. You will need to use the Linux shell to query the card for statistics while it is unavailable to the 128T software.
Performance counters can be retrieved using the Linux command wanpipemon -i w1g1 -c Ta.
All of the commands in this section must be run from within the T1's namespace in Linux. You can either precede each command with ip netns exec t1-ns-<number>, or use the command ip netns exec t1-ns-<number> bash to start a shell within that namespace. If using the latter technique, all commands will be run from within the namespace; use the exit command to leave that shell and return to the default namespace.
For the examples in this section, we will not use individual ip netns exec commands for each sample output. These were run from within a bash shell in the namespace.
To enable a manual loopback:
To confirm the loopback status:
To disable line loopback mode:
To confirm loopback is disabled:
Standard T1 Configuration File
Below is the standard configuration file included with the 128T software. This configuration is generated by the 128T software and should not be modified; it is provided here for reference only.
The file is located at
Escalating to the Service Provider
If the troubleshooting exercise leads you to believe the issue is with the service provider's circuit, we recommend collecting the following information prior to escalating.
- Confirm the circuit is plugged in and the node has power
- Status of active T1
  - Admin: up/down
  - Operational: up/down/unknown
  - T1 State: up/down
  - T1 Flags: RUNNING versus Missing
  - T1 Card Present in System: Yes/No
- Alarm Analysis: (Provide Command Output)
  - T1 interface flapping: Yes/No
  - T1 peers flapping (all): Yes/No
  - BGP flapping: Yes/No (Correlated to T1 Events?)
- Interface Error Analysis: (Provide Command Output)
  - KNI errors accumulating: Yes/No
  - Layer 3 Errors accumulating: Yes/No
  - T1 Performance Errors Accumulating: Yes/No
  - T1 RX signal in spec: Yes/No
- Can ping PPP peer (Provider Edge) IP: Yes/No
- The timeline of events observed by the 128T and uncovered during your analysis
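If you collect this information regularly, a structured record helps keep escalations consistent. The EscalationReport dataclass below is a hypothetical sketch; its field names are illustrative and do not correspond to any 128T or provider format.

```python
from dataclasses import dataclass, field, fields
from typing import List

@dataclass
class EscalationReport:
    """Hypothetical container for the escalation checklist above."""
    admin_status: str
    oper_status: str
    t1_state: str
    t1_flags: str
    card_present: bool
    interface_flapping: bool
    peers_flapping: bool
    bgp_flapping: bool
    kni_errors_accumulating: bool
    layer3_errors_accumulating: bool
    t1_performance_errors_accumulating: bool
    rx_signal_in_spec: bool
    pe_ping_ok: bool
    timeline: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Plain-text report: one "name: value" line per item, Yes/No for booleans."""
        lines = []
        for f in fields(self):
            if f.name == "timeline":
                continue
            value = getattr(self, f.name)
            if isinstance(value, bool):
                value = "Yes" if value else "No"
            lines.append(f"{f.name}: {value}")
        lines.append("timeline:")
        lines.extend(f"  {entry}" for entry in self.timeline)
        return "\n".join(lines)
```

Attach the command outputs requested in the checklist alongside the rendered report.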