Troubleshooting
Logging
For troubleshooting it is generally helpful to check the following logs:
/var/log/apache/netrounds_access.log
: All HTTP requests made to the Control Center web interface.
/var/log/apache/netrounds_error.log
: Errors reported by Apache for HTTP requests towards the Control Center web GUI. Console output from the Control Center back-end is also in this file; by default, all logging is done by the console.
/var/log/syslog
: General system log.
/var/log/mail.log
: Email server log.
dmesg
: Kernel log.
It may also help to turn on additional logging in
/etc/netrounds/netrounds.conf
by changing the line
LOGGING['handlers']['console']['level'] = 'ERROR'
to
LOGGING['handlers']['console']['level'] = 'DEBUG'
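The edit above can also be scripted. The following is a minimal sketch, not an official tool: the function name set_console_level is hypothetical, and it assumes the level line appears in the file exactly in the form shown above. It keeps a backup copy with the suffix .bak.

```shell
#!/bin/sh
# set_console_level FILE LEVEL  (hypothetical helper, not part of the product)
# Rewrites the console handler level line in FILE and keeps FILE.bak as a backup.
set_console_level() {
    conf_file=$1
    level=$2
    sed -i.bak -E \
        "s/^(LOGGING\['handlers'\]\['console'\]\['level'\] = )'[A-Z]+'/\1'${level}'/" \
        "$conf_file"
}

# Example (run as root):
#   set_console_level /etc/netrounds/netrounds.conf DEBUG
# Switch back to ERROR the same way once troubleshooting is done.
```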
Another way to show logging for the Paragon Active Assurance callexecuter is to use the
journalctl
utility:
journalctl -u netrounds-callexecuter
Read more about journalctl here: www.digitalocean.com/community/tutorials/how-to-use-journalctl-to-view-and-manipulate-systemd-logs.
A more detailed list of logs is found in the Operations Guide, chapter Monitoring the System.
Root Shell Option in Test Agent Local Console
You can perform additional troubleshooting under the guidance of Paragon Active Assurance by logging into the root shell of a Test Agent. This is done from the Test Agent local admin console.
How to access the local console is explained in the in-app help under "Test Agents" > "Configuring Test Agents from the local console".
- Navigate to Utilities and select Root shell.
- Log in with password onlyfordebugaccess.
The root shell prompt now appears.
The root password can be changed within the root shell using the passwd command. Alternatively, this can be done using the Change root password option in the local console.
Any changes made to the Test Agent in the root shell may cause the Test Agent to malfunction and/or void the warranty. When in doubt, always consult Juniper staff before proceeding.
The subsections that follow deal with specific problems.
Problem: Applying a license fails
Suggested actions:
-
First try restarting the licensing service:
sudo systemctl restart netrounds-license-daemon
-
If that does not help, you can wipe all licenses and reapply them as described in the Operations Guide, chapter Deleting Licenses from the System.
Problem: Services cannot be accessed
Suggested actions:
-
Check service status with the following commands:
sudo systemctl status "netrounds-*" openvpn@netrounds
sudo systemctl status apache2 openvpn@netrounds
-
Check that the host name resolves to the correct IP, for instance using dig or ping.
-
Check listening ports with
netstat -lan
-
Check network traffic with
tcpdump
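The listening-port check can be narrowed down to a single port. The sketch below filters netstat output that is piped into it. The function name is_port_listening is hypothetical, and port 443 in the usage line is only an assumed example; substitute the port of the service you are investigating.

```shell
#!/bin/sh
# is_port_listening PORT  (hypothetical helper)
# Reads "netstat -lan" output on stdin and succeeds if some TCP socket
# is in state LISTEN on the given local port.
is_port_listening() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $NF == "LISTEN" {
            n = split($4, parts, ":")      # local address is field 4; port is the last ":" part
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}

# Example:
#   netstat -lan | is_port_listening 443 && echo "port 443 is listening"
```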
Problem: No data collected
- Possible cause: NTP time sync problem. This will introduce a time offset in the server-generated result views, so that it may take a while until any results appear in these views.
- Possible cause: No free disk space.
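Both causes can be checked from the command line. The sketch below lists file systems at or above a usage threshold, based on the output of df -P. The function name full_filesystems and the 90 percent threshold are illustrative assumptions, not product defaults. For the time sync, ntpq -np shows the state of the configured NTP peers.

```shell
#!/bin/sh
# full_filesystems THRESHOLD  (hypothetical helper)
# Reads "df -P" output on stdin and prints "<mount point> <usage>" for every
# file system whose usage is at or above THRESHOLD percent.
full_filesystems() {
    awk -v limit="$1" '
        NR > 1 {                       # skip the header line
            use = $5
            sub(/%/, "", use)          # "95%" -> "95"
            if (use + 0 >= limit + 0) print $6 " " $5
        }
    '
}

# Example:
#   df -P | full_filesystems 90
#   ntpq -np
```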
Problem: Data collection is slow, or data loss occurs
- Possible cause: Tests and monitors are queuing up so that data collection is delayed.
Run the command
ncc status
and check the value of scheduled_call_latency, which indicates the length of the queue. See the Operations Guide, chapter Tuning the System, section "Control Center" for further information and a suggested remedy.
Problem: ncc migrate command fails
- Possible cause: Zookeeper has gone down.
If the ncc migrate command fails with an error message saying "Unable to create Kafka topics", this could be due to Control Center being unable to start the Zookeeper server process. Kafka relies on Zookeeper and requires it to be running in order to operate correctly.
There is a known issue where certain Zookeeper files are corrupted, which causes a Java EOF exception. To check the logs for this, run the following command
sudo tail -n 100 /var/log/syslog
and look for ERROR entries related to Zookeeper:
2021-02-20 18:55:13,302 - ERROR [main:ZooKeeperServerMain@64] - Unexpected exception, exiting abnormally
java.io.EOFException
To fix this, delete all files of size zero that are located in the
/var/lib/zookeeper/version-2
folder. First identify these files:
/var/lib/zookeeper/version-2# du -sh *
0K      log.1
8.0K    log.1dd
8.0K    log.210
...
Then delete the corrupted file or files:
sudo rm /var/lib/zookeeper/version-2/log.1
Finally, restart Zookeeper:
sudo systemctl restart zookeeper
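As an alternative to reading the du output by eye, the zero-size files can be listed with find. This is a minimal sketch; the function name list_empty_files is hypothetical. It only lists the files, so you can review them before deleting anything.

```shell
#!/bin/sh
# list_empty_files DIR  (hypothetical helper)
# Prints every regular file of size zero located directly in DIR.
list_empty_files() {
    find "$1" -maxdepth 1 -type f -size 0
}

# Example:
#   sudo sh -c 'find /var/lib/zookeeper/version-2 -maxdepth 1 -type f -size 0'
# or, with the function loaded in a root shell:
#   list_empty_files /var/lib/zookeeper/version-2
```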
Problem: Services fail
- Possible cause: Kafka has gone down. This in turn may be due to Zookeeper issues, or to the system having run out of memory.
If this happens, services relying on Kafka will also fail and keep trying to restart until Kafka comes back online.
For example, if the netrounds-ta3-compat service fails, Test Agent Applications will be unable to register to Control Center.
You can check if Kafka is running with the following command:
sudo systemctl list-units --type=service | grep kafka
If Kafka has gone offline, then checking logs with
journalctl -u netrounds-test-agent-gateway.service
will return entries like the following:
Oct 17 11:17:05 example.com test-agent-gateway-service[23009]: {"level":"info","service":"test-agent-gateway-service","host":"example.com","src":"core","time":"2020-10-17T11:17:05.429473575Z","caller":"/app/cmd/server/
Oct 17 11:17:06 example.com systemd[1]: netrounds-test-agent-gateway.service: Main process exited, code=exited, status=1/FAILURE
Oct 17 11:17:06 example.com systemd[1]: netrounds-test-agent-gateway.service: Unit entered failed state.
Oct 17 11:17:06 example.com systemd[1]: netrounds-test-agent-gateway.service: Failed with result 'exit-code'.
Oct 17 11:17:07 example.com systemd[1]: netrounds-test-agent-gateway.service: Service hold-off time over, scheduling restart.
Oct 17 11:17:07 example.com systemd[1]: Stopped Netrounds TA3 Connection Service.
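To see whether a service is caught in such a restart loop, it can help to count how often its main process has exited with a failure. The sketch below does this for journal text piped into it. The function name count_unit_failures is hypothetical, and the matched string is taken from the sample entries above, so it may differ on other systemd versions.

```shell
#!/bin/sh
# count_unit_failures  (hypothetical helper)
# Reads journal output on stdin and prints the number of entries that
# report a failed main process.
count_unit_failures() {
    grep -cF 'Main process exited, code=exited, status=1/FAILURE' || true
}

# Example:
#   journalctl -u netrounds-test-agent-gateway.service --no-pager | count_unit_failures
# Kafka itself can be checked with:
#   systemctl is-active kafka.service
```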
Problem: Kafka error "Too many open files"
- Detailed description: Kafka is repeatedly restarting and crashing after some seconds have passed. In the journal log, Kafka displays a stack trace with the error:
Too many open files
- Cause: The default setting governing how many file descriptors Kafka is allowed to use is too low. The allowed number will be sufficient in the beginning but may become too small after some time.
To resolve:
-
Log in to Control Center and run
sudo systemctl edit kafka.service
-
An editor will open. Insert the following lines:
[Service]
LimitNOFILE=65536
Save the file and exit.
Kafka should now successfully start and work correctly.
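If an interactive editor is inconvenient, the same override can be written as a systemd drop-in file. The following is a sketch under the assumption that the drop-in lives in /etc/systemd/system/kafka.service.d/override.conf, which is where systemctl edit normally stores it. The function name write_nofile_override is hypothetical.

```shell
#!/bin/sh
# write_nofile_override DROPIN_DIR LIMIT  (hypothetical helper)
# Creates DROPIN_DIR if needed and writes an override.conf that sets
# the file descriptor limit for the unit.
write_nofile_override() {
    dropin_dir=$1
    limit=$2
    mkdir -p "$dropin_dir"
    printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dropin_dir/override.conf"
}

# Example (run as root):
#   write_nofile_override /etc/systemd/system/kafka.service.d 65536
#   systemctl daemon-reload
#   systemctl restart kafka.service
# Verify the limit that systemd applies to the unit:
#   systemctl show kafka.service -p LimitNOFILE
```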
Problem: A Test Agent has successfully registered but does not come online (status icon remains red)
- Possible cause: The network does not allow the encrypted VPN connection to be established. Please check the configuration and logs of any firewall between the Test Agent and Control Center.
Detailed description:
The registration is done over HTTPS, while the connection setup after registration is done using OpenVPN on the same port. This can sometimes cause the network to allow the registration, but not the connection attempt.
- Possible cause: The clocks on the Test Agent and in Control Center are not in sync. Please check that both parties have NTP correctly configured, that their clocks are in sync with the NTP server, and that the NTP server clock in turn is also in sync.
Detailed description:
This description and the remedy that follows are applicable only when Test Agents are not coming online during a fresh installation of Control Center, and the time sync has been confirmed to be incorrect.
If the Test Agent clock is behind, then after the Test Agent registers, the TLS certificate signed by Control Center for the Test Agent will be invalid from the Test Agent's point of view until the Test Agent's clock has reached the time of signing according to the Control Center clock. Until that point, the Test Agent will not accept the certificate and will remain offline.
The TLS certificates are generated at the time of installing Control Center. If the clocks were not correct at this point, the certificates must be regenerated. Note that this also requires re-registration of all Test Agents. Follow these steps:
-
Make sure that the Control Center clock is in sync:
ntpq -np
-
Remove the old certificates:
rm /var/lib/netrounds/openvpn/*
-
Generate new certificates:
dpkg-reconfigure paa-test-agent-login
-
Register each Test Agent again under a different name.
-
Restart Control Center services:
sudo ncc services restart
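Before regenerating certificates, the clock offset can be read from the ntpq output instead of being judged by eye. The sketch below prints the offset of the peer that ntpd has currently selected (the line marked with an asterisk). The function name ntp_selected_offset_ms is hypothetical. No output means that no peer is selected, which in itself indicates that the clock is not synchronized.

```shell
#!/bin/sh
# ntp_selected_offset_ms  (hypothetical helper)
# Reads "ntpq -np" output on stdin and prints the offset, in milliseconds,
# of the currently selected peer (tally code "*"). Offset is column 9.
ntp_selected_offset_ms() {
    awk '/^\*/ { print $9 }'
}

# Example:
#   ntpq -np | ntp_selected_offset_ms
```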
Problem: Disk space is running out due to incremental backups of TimescaleDB data
Detailed description:
Disk space is running out due to files being continuously created under
/var/lib/netrounds/rrd/timescaledb/pgbackrest/repo/archive/paa-metrics/
This is due to TimescaleDB incremental backups being turned on by default.
To resolve:
-
In the file
/var/lib/netrounds/rrd/timescaledb/data/postgresql.conf
set
archive_mode = off
to turn off these backups.
-
Restart the netrounds-timescaledb service:
sudo ncc services restart netrounds-timescaledb
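The configuration change can be scripted as follows. This is a minimal sketch that assumes postgresql.conf already contains an active (uncommented) archive_mode line; if it does not, the file is left unchanged and the setting must be added by hand. The function name disable_archive_mode is hypothetical. A backup copy with the suffix .bak is kept.

```shell
#!/bin/sh
# disable_archive_mode FILE  (hypothetical helper)
# Rewrites any active archive_mode line in FILE to "archive_mode = off"
# and keeps FILE.bak as a backup.
disable_archive_mode() {
    sed -i.bak -E 's/^[[:space:]]*archive_mode[[:space:]]*=.*/archive_mode = off/' "$1"
}

# Example (run as root), followed by the service restart:
#   disable_archive_mode /var/lib/netrounds/rrd/timescaledb/data/postgresql.conf
#   ncc services restart netrounds-timescaledb
```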
Problem: Error when accessing REST API URL and TypeError in apache2 service logs
If you are unable to access the REST API URL:
Export the apache2 service log to a file named apache2.log:
sudo journalctl -b -u apache2 > apache2.log
Check whether the apache2.log file contains the following traceback:
Traceback (most recent call last):
  File "/usr/lib/python3.10/dist-packages/restol/netrounds_restol.wsgi", line 9, in <module>
    application = create_app(load_config())
  File "/usr/lib/python3.10/dist-packages/restol/restol_app.py", line 158, in create_app
    Limiter(
TypeError: Limiter.__init__() got multiple values for argument 'key_func'
- Do one of the following:
If the traceback is not found in the apache2.log file, file a ticket at support.juniper.net/support/requesting-support and attach the apache2.log file.
If the traceback is found in the apache2.log file:
Open /usr/lib/python3.10/dist-packages/restol/restol_app.py.
Replace the following:
Limiter(
    app,
    default_limits=config['rate_limit_default'],
    headers_enabled=True,
    key_func=get_remote_address,
)
with
Limiter(
    get_remote_address,
    app=app,
    default_limits=config['rate_limit_default'],
    headers_enabled=True
)
Save /usr/lib/python3.10/dist-packages/restol/restol_app.py
Download the python3-limits_2.8.0-1_all.deb package from the Ubuntu package download selection page.
Install the downloaded file using the sudo apt-get install /path/to/python3-limits_2.8.0-1_all.deb command.
Note: Replace /path/to with the directory to which the python3-limits deb package was copied.
Restart Control Center services by running the sudo ncc services restart command.
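The check for the traceback in the exported log can be done with grep rather than by reading the file. This is a sketch only; the function name has_limiter_typeerror is hypothetical, and the search string is the final line of the traceback shown above.

```shell
#!/bin/sh
# has_limiter_typeerror FILE  (hypothetical helper)
# Succeeds if FILE contains the Limiter TypeError from the traceback.
has_limiter_typeerror() {
    grep -qF "TypeError: Limiter.__init__() got multiple values for argument 'key_func'" "$1"
}

# Example:
#   sudo journalctl -b -u apache2 > apache2.log
#   if has_limiter_typeerror apache2.log; then
#       echo "Traceback found: apply the restol_app.py change"
#   else
#       echo "Traceback not found: file a support ticket and attach apache2.log"
#   fi
```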
Problem: Errors related to BF-CBC cipher in the openvpn log
This issue occurs when older Test Agents are unable to connect to Control Center release 4.2. In this case, you need to add support for the BF-CBC cipher in the OpenVPN configuration.
To add the support for BF-CBC cipher:
Access /etc/openvpn/netrounds.conf.
Add the BF-CBC cipher to /etc/openvpn/netrounds.conf:
data-ciphers AES-256-GCM:AES-128-GCM:BF-CBC
data-ciphers-fallback BF-CBC
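The two options can be applied without creating duplicate directives by replacing an existing line when one is present and appending otherwise. The sketch below does this; the function name set_openvpn_option is hypothetical. Restarting the openvpn@netrounds unit afterwards is an assumption, as this section does not state how the change is activated.

```shell
#!/bin/sh
# set_openvpn_option FILE KEY VALUE  (hypothetical helper)
# Replaces an existing "KEY ..." line in FILE, or appends "KEY VALUE"
# if the option is not present yet. Keeps FILE.bak when replacing.
set_openvpn_option() {
    conf_file=$1
    key=$2
    value=$3
    if grep -qE "^${key}[[:space:]]" "$conf_file"; then
        sed -i.bak -E "s/^${key}[[:space:]].*/${key} ${value}/" "$conf_file"
    else
        printf '%s %s\n' "$key" "$value" >> "$conf_file"
    fi
}

# Example (run as root):
#   set_openvpn_option /etc/openvpn/netrounds.conf data-ciphers 'AES-256-GCM:AES-128-GCM:BF-CBC'
#   set_openvpn_option /etc/openvpn/netrounds.conf data-ciphers-fallback BF-CBC
#   systemctl restart openvpn@netrounds   # assumption: restart to activate the change
```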
Contacting Juniper Technical Support
To contact Juniper technical support, file a ticket at support.juniper.net/support/requesting-support. In your ticket, please upload the file generated from running the command
ncc generate-troubleshooting-report
The file will be located in your user's home directory (/home/<username>).