Resolve Merge Conflicts of the Configuration File
Resolve Common Backup and Restore Issues
View Installation Log Files
View Log Files in Grafana
Troubleshooting Using the kubectl Interface
Troubleshoot Using the paragon CLI Utility
Troubleshoot Ceph and Rook
Troubleshoot Air-Gap Installation Failure
Recover from a RabbitMQ Cluster Failure
Disable udevd Daemon During OSD Creation
Wrapper Scripts for Common Utility Commands
Back Up the Control Host
User Service Accounts for Debugging

Troubleshoot Paragon Automation Installation

SUMMARY Read the following topics to learn how to troubleshoot typical problems that you might encounter during and after installation.

Resolve Merge Conflicts of the Configuration File

The init script creates the template configuration files. If you update an existing installation using the same config-dir directory that was used for the installation, the template files that the init script creates are merged with the existing configuration files. Sometimes, this merging action creates a merge conflict that you must resolve. The script prompts you about how to resolve the conflict. When prompted, select one of the following options:

C—You can retain the existing configuration file and discard the new template file. This is the default option.
n—You can discard the existing configuration file and reinitialize the template file.
m—You can merge the files manually. Conflicting sections are marked with lines starting with <<<<<<<<, ||||||||, ========, and >>>>>>>>. You must edit the file and remove the merge markers before you proceed with the update.
d—You can view the differences between the files before you decide how to resolve the conflict.

Resolve Common Backup and Restore Issues

Suppose you destroy an existing cluster and redeploy a software image on the same cluster nodes. In such a scenario, if you try to restore a configuration from a previously backed-up configuration folder, the restore operation might fail. The restore operation fails because the mount path for the backed-up configuration is now changed. When you destroy an existing cluster, the persistent volume is deleted. When you redeploy the new image, the persistent volume gets re-created in one of the cluster nodes wherever space is available, but not necessarily in the same node as it was present in previously. As a result, the restore operation fails.

To work around these backup and restore issues:

Determine the mount path of the new persistent volume.
Copy the contents of the previous persistent volume's mount path to the new path.
Retry the restore operation.

View Installation Log Files

If the deploy script fails, you must check the installation log files in the config-dir directory. By default, the config-dir directory stores six zipped log files. The current log file is saved as log, and the previous log files are saved as log.1 through log.5 files. Every time you run the deploy script, the current log is saved, and the oldest one is discarded.

You typically find error messages at the end of a log file. View the error message, and fix the configuration.

View Log Files in Grafana

Grafana is an open-source data visualization tool. You use the Grafana UI to create and to view charts, graphs, and other visuals to help organize and understand data. You can create dashboards to monitor the status of devices, and you can also query data and view the results from the UI. Grafana UI renders data from Paragon Automation time series database (TSDB). For more information, see Grafana Documentation.

To view logs in the Grafana application:

Use one of the following methods to access Grafana:
- Use the virtual IP (VIP) address of the ingress controller: Open a browser and enter https://vip-of-ingress-controller-or-hostname-of-main-web-application/cluster-logs in the URL field.
- Use the Logs page: In the Paragon Automation UI, click Monitoring > Logs in the left-nav bar.
Enter the grafana_admin_user username and the grafana_admin_password password that you configured in the config.yml file during installation. The default username is admin.

If you do not configure the grafana_admin_password password, the installer generates a random password. You can retrieve the password using the following command:

# kubectl get secret -n kube-system grafana -o jsonpath={..grafana-password} | base64 -d
Click Home at the top left corner of the page.
Click Paragon Logs to view the logs. If it's not already visible, search for and click Paragon Logs.
(Optional) For instructions on how to create queries, see Query and Transform Data.

Troubleshooting Using the kubectl Interface

kubectl (Kube Control) is a command-line utility that interacts with the Kubernetes API, and the most common command line took to control Kubernetes clusters.

You can issue kubectl commands on the primary node right after installation. To issue kubectl commands on the worker nodes, you need to copy the admin.conf file and set the kubeconfig environment variable or use the export KUBECONFIG=config-dir /admin.conf command. The admin.conf file is copied to the config-dir directory on the control host as part of the installation process.

You use the kubectl command-line tool to communicate with the Kubernetes API and obtain information about API resources such as nodes, pods, and services, show log files, as well as create, delete, or modify those resources.

The syntax of kubectl commands is as follows:

kubectl [command] [TYPE] [NAME] [flags]

[command] is simply the action that you want to execute.

You can use the following command to view a list of kubectl commands:

root@primary-node:/# kubectl [enter]

You can ask for help, to get details and list all the flags and options associated with a particular command. For example:

root@primary-node:/# kubectl get -h

To verify and troubleshoot the operations in Paragon Automation, you'll use the following commands:

[command]	Description
get	Display one or many resources. The output shows a table of the most important information about the specified resources.
describe	Show details of a specific resource or a group of resources.
explain	Documentation of resources.
logs	Display the logs for a container in a pod.
rollout restart	Manage the rollout of a resource.
edit	Edit a resource.

[TYPE] represents the type of resource that you want to view. Resource types are case-insensitive, and you can use singular, plural, or abbreviated forms.

For example, pod, node, service, or deployment. For a complete list of resources, and allowed abbreviations (example, pod = po), issue this command:

kubectl api-resources

To learn more about a resource, issue this command:

kubectl explain [TYPE]

For example:

[NAME] is the name of a specific resource—for example, the name of a service or pod. Names are case-sensitive.

root@primary-node:/# kubectl get pod pod_name

[flags] provide additional options for a command. For example, -o lists more attributes for a resource. Use help (-h) to get information about the available flags.

Note that most Kubernetes resources (such as pods and services) are in some namespaces, while others are not (such as nodes).

Namespaces provide a mechanism for isolating groups of resources within a single cluster. Names of resources need to be unique within a namespace, but not across namespaces.

When you use a command on a resource that is in a namespace, you must include the namespace as part of the command. Namespaces are case-sensitive. Without the proper namespace, the specific resource you are interested in might not be displayed.

You can get a list of all namespaces by issuing the kubectl get namespace command.

If you want to display resources for all namespaces, or you are not sure what namespaces the specific resource you are interested in belongs to, you can enter --all-namespaces or - A.

For more information about Kubernetes, see:

Use the following topics to troubleshoot and view installation details using the kubectl interface.

View Node Status
View Pod Status
View Detailed Information About a Pod
View the Logs for a Container in a Pod
Run a Command on a Container in a Pod
View Services
Frequently Used kubectl Commands

View Node Status

Use the kubectl get nodes command, abbreviated as the kubectl get no command, to view the status of the cluster nodes. The status of the nodes must be Ready, and the roles must be either control-plane or none. For example:

If a node is not Ready, verify whether the kubelet process is running. You can also use the system log of the node to investigate the issue.

To verify kubelet: root@primary-node:/# kubelet

View Pod Status

Use the kubectl get po –n namespace or kubectl get po -A command to view the status of a pod. You can specify an individual namespace (such as healthbot, northstar, and common) or you can use the -A parameter to view the status of all namespaces. For example:

The status of healthy pods must be Running or Completed, and the number of ready containers should match the total. If the status of a pod is not Running or if the number of containers does not match, use the kubectl describe po or kubectl log (POD | TYPE/NAME) [-c CONTAINER] command to troubleshoot the issue further.

View Detailed Information About a Pod

Use the kubectl describe po -n namespace pod-name command to view detailed information about a specific pod. For example:

View the Logs for a Container in a Pod

Use the kubectl logs -n namespace pod-name [-c container-name] command to view the logs for a particular pod. If a pod has multiple containers, you must specify the container for which you want to view the logs. For example:

Run a Command on a Container in a Pod

Use the kubectl exec –ti –n namespacepod-name [-c container-name] -- command-line command to run commands on a container inside a pod. For example:

After you run exec the command, you get a bash shell into the Postgres database server. You can access the bash shell inside the container, and run commands to connect to the database. Not all containers provide a bash shell. Some containers provide only SSH, and some containers do not have any shells.

View Services

Use the kubectl get svc -n namespace or kubectl get svc -A command to view the cluster services. You can specify an individual namespace (such as healthbot, northstar, and common), or you can use the -A parameter to view the services for all namespaces. For example:

In this example, the services are sorted by type, and only services of type LoadBalancer are displayed. You can view the services that are provided by the cluster and the external IP addresses that are selected by the load balancer to access those services.

You can access these services from outside the cluster. The external IP address is exposed and accessible from devices outside the cluster.

Frequently Used kubectl Commands

List the replication controllers:

Restart a component:

Edit a Kubernetes resource: You can edit a deployment or any Kubernetes API object, and these changes are saved to the cluster. However, if you reinstall the cluster, these changes are not preserved.

Troubleshoot Using the paragon CLI Utility

We've introduced the paragon command CLI utility to run commands on pods running in the system. The paragon commands are a set of intuitive commands to enable you to analyze, query, and troubleshoot your cluster. To execute the commands, log in to any of the primary nodes. The output of some of the commands is color-coded because, for some commands, the paragon command utility executes the kubecolor commands instead of kubectl, kubecolor color codes your kubectl command output. See Figure 1 for an example output.

To view the entire set of commands help options available, use one of the following commands:

You can view help options at any command level (not only at top level). For example:

You can use the tab option to view possible auto-completion options for the commands. To see top-level command auto-completion, type paragon and press tab. For example:

To view the underlying command that a paragon command runs, use the echo or -e option. For example:

To execute a paragon command as well as view the underlying command that it runs, use the debug or -d option. For example:

To view the entire list of paragon commands and the corresponding underlying commands that they run, use:

Figure 1: Example paragon command output Example paragon command output

Follow the instructions with regards to specific usage criteria such as arguments or prerequisites, if any, in the help section of each command. Some commands need mandatory arguments. For instance, the paragon insights logs devicegroup analytical command needs the argument --dg devicegroup-name-with subgroup. For example:

paragon insights logs devicegroup analytical --dg controller-0

Some commands have prerequisites. For instance, prior to using the paragon insights get playbooks command, you must set the username and password by using the paragon set username --cred username and paragon set password --cred password commands.

The complete set of commands along with their usage criteria is listed in Table 1.

Table 1: paragon CLI Utility
Command	Description
`paragon ambassador get emissary`	Shows Paragon ambassador emissary pods.
`paragon ambassador get pods`	Shows all Paragon ambassador pods.
`paragon ambassador get services`	Shows all Paragon ambassador services.
`paragon common postgres roles`	Helps to find the Postgres roles.
`paragon describe node`	Shows the description of a particular node in the cluster. Use the `--node node-ip` argument. Example: `paragon describe node --node 172.16.x.221` You can use the `paragon get nodes all` command to get the node IP address.
`paragon ems get devicemanager`	Shows the device manager Paragon ems pods.
`paragon ems get jobmanager`	Shows the job manager Paragon EMS pods.
`paragon ems get pods`	Shows all Paragon EMS pods.
`paragon ems get services`	Shows all Paragon EMS services.
`paragon ems logs devicemanager`	Shows the logs of Paragon EMS device manager pods. Use the `--type follow` argument to get live streaming logs.
`paragon ems logs jobmanager`	Shows the logs of paragon ems job manager pod. Use the `--type follow` argument to get live streaming logs.
`paragon get namespaces`	Shows all namespaces available in Paragon.
`paragon get nodes all`	Shows a list of all nodes in the cluster.
`paragon get nodes diskpressure`	Validates if kubelet has any disk pressure. Use the `--node node_ip/node_name` argument. Example: `paragon get nodes diskpressure --node 172.16.x.221`
`paragon get nodes memorypressure`	Validates if kubelet has sufficient memory. Use the `--node node_ip/node_name` argument. Example: `paragon get nodes memorypressure --node 172.16.x.221`
`paragon get nodes networkunavailable`	Checks for issues with calico and the network. Use the `--node node_ip/node_name` argument. Example: `paragon get nodes networkunavailable --node davinci-primary`
`paragon get nodes notready`	Shows list of all nodes that is not ready in the cluster.
`paragon get nodes pidpressure`	Validates if kubelet has sufficient PID available. Use the `--node node_ip/node_name` argument. Example: `paragon get nodes pidpressure --node davinci-worker1`
`paragon get nodes ready`	Shows list of all nodes that is ready in the cluster.
`paragon get nodes taint`	Shows list of all taint on the nodes.
`paragon get pods healthy`	Shows all the healthy Paragon pods.
`paragon get pods unhealthy`	Shows all the unhealthy Paragon pods.
`paragon get services exposed`	Shows all the Paragon services that are exposed.
`paragon insights cli alerta`	Logs in to the CLI of the Paragon Insights alerta pod.
`paragon insights cli byoi`	Logs in to the CLI of the BYOI plug-in. Use the `--byoi BYOI plugin name` argument.
`paragon insights cli configserver`	Logs in to the CLI of Paragon Insights config-server pod.
`paragon insights cli grafana`	Logs in to the CLI of Paragon Insights grafana pod.
`paragon insights cli influxdb`	Logs in to the CLI of Paragon Insights influxdb pod. Use the `--influx influxdb-nodeip` argument to specify the node IP If not, the command will use the first influxdb node as the default node. Example: `paragon insights cli influxdb --influx influxdb-172.16.x.21`
`paragon insights cli mgd`	Logs in to the CLI of Paragon Insights mgd pod.
`paragon insights describe alerta`	Describes the Paragon Insights alerta pod.
`paragon insights describe api`	Describes the Paragon Insights REST API pod.
`paragon insights describe configserver`	Describes the Paragon Insights config-server pod.
`paragon insights describe grafana`	Describes the Paragon Insights grafana pod.
`paragon insights describe influxdb`	Describes the Paragon Insights influxdb pod. Use the `--influx influxdb-nodeip` argument to specify the node IP. If not, the command will use the first influxdb node as the default node. Example: `paragon insights describe influxdb --influx influxdb-172.16.x.21`
`paragon insights describe mgd`	Describes the Paragon Insights mgd pod.
`paragon insights get alerta`	Shows the Paragon Insights alerta pod.
`paragon insights get api`	Shows the Paragon Insights REST API pod.
`paragon insights get configserver`	Shows the Paragon Insights config-server pod.
`paragon insights get devicegroups`	Shows all the Paragon Insights device groups. The default username is `admin`. To modify the username, run the `paragon set user --cred username`> command. As a prerequisite, run the `paragon set password --cred password` command to set the Paragon (UI host) password.
`paragon insights get devices`	Shows all Paragon Insights devices. The default username is `admin`. To modify the username, run the `paragon set user --cred username` command. As a prerequisite, run the `paragon set password --cred password` command to set the Paragon (UI host) password.
`paragon insights get grafana`	Shows the Paragon Insights grafana pod.
`paragon insights get influxdb`	Shows the Paragon Insights influxdb pod.
`paragon insights get ingest`	Shows the Paragon Insights network telemetry ingestion pods.
`paragon insights get mgd`	Shows the Paragon Insights mgd pod.
`paragon insights get playbooks`	Shows all Paragon Insights playbooks. The default username is `admin`. To modify the username, run the `paragon set user --cred username` command. As a prerequisite, run the `paragon set password --cred password` command to set the Paragon (UI host) password.
`paragon insights get pods`	Shows all the Paragon Insights pods.
`paragon insights get services`	Shows all the Paragon Insights services.
`paragon insights logs alerta`	Shows the logs of the Paragon Insights alerta pod.
`paragon insights logs api`	Shows the logs of the Paragon Insights rest api pod.
`paragon insights logs byoi`	Shows the logs of the Paragon Insights BYOI plug-in. Use the `--byoi BYOI plugin name` argument.
`paragon insights logs configserver`	Shows the logs of the Paragon Insights config-server pod.
`paragon insights logs devicegroup analytical`	Shows the logs of the Paragon Insights device group for service analytical engine. Use the `--dg device Group name with subgroup` argument. Example: `paragon insights logs devicegroup analytical --dg controller-0` In the example, `controller` is the devicegroup name and `0` is the subgroup.
`paragon insights logs devicegroup itsdb`	Shows the logs of the Paragon Insights device group for service itsdb. Use the`--dg device Group name with subgroup` argument. Example: `paragon insights logs devicegroup itsdb --dg controller-0` In the example, `controller` is the devicegroup name and `0` is the subgroup.
`paragon insights logs devicegroup jtimon`	Shows the logs of the Paragon Insights device group for service jtimon. Use the`--dg device Group name with subgroup` argument. Example: `paragon insights logs devicegroup jtimon --dg controller-0` In the example, `controller` is the devicegroup name and `0` is the subgroup.
`paragon insights logs devicegroup native`	Shows the logs of the Paragon Insights device group for service jti native. Use the`--dg device Group name with subgroup` argument. Example: `paragon insights logs devicegroup native --dg controller-0` In the example, `controller` is the devicegroup name and `0` is the subgroup.
`paragon insights logs devicegroup syslog`	Shows the logs of the Paragon Insights device group for service syslog. Use the`--dg device Group name with subgroup` argument. Example: `paragon insights logs devicegroup syslog --dg controller-0` In the example, `controller` is the devicegroup name and `0` is the subgroup.
`paragon insights logs grafana`	Shows the logs of the Paragon Insights Grafana pod.
`paragon insights logs influxdb`	Shows the logs of the Paragon Insights influxdb pod. Use the `--influx influxdb-nodeip` argument to specify the node IP. If not, the command will use the first influxdb node as the default node. Example: `paragon insights logs influxdb --influx influxdb-172.16.x.21`
`paragon insights logs mgd`	Shows the logs of the Paragon Insights mgd pod.
`paragon pathfinder cli bmp`	Logs in to the CLI of the Paragon Pathfinder BMP container.
`paragon pathfinder cli configserver`	Logs in to the CLI of the Paragon Pathfinder ns-configserver container.
`paragon pathfinder cli crpd`	Logs in to the CLI of the Paragon Pathfinder cRPD container.
`paragon pathfinder cli debugutils`	Logs in to the CLI of the Paragon Pathfinder debugutils container.
`paragon pathfinder cli netconf`	Logs in to the CLI of the Paragon Pathfinder netconf container.
`paragon pathfinder cli pceserver`	Logs in to the CLI of the Paragon Pathfinder ns-pceserver container (PCEP) services.
`paragon pathfinder cli pcserver`	Logs in to the CLI of the Paragon Pathfinder ns-pcserver (PCS) container.
`paragon pathfinder cli pcviewer`	Logs in to the CLI of the Paragon Pathfinder ns-pcsviewer (Paragon Planner Desktop Application) container.
`paragon pathfinder cli scheduler`	Gets into the CLI of paragon pathfinder scheduler container.
`paragon pathfinder cli toposerver`	Logs into the CLI of the Paragon Pathfinder ns-toposerver (Topology service) container.
`paragon pathfinder cli web`	Logs into the CLI of the Paragon Pathfinder ns-web container.
`paragon pathfinder debug bgpls config`	Debugs the Paragon Pathfinder cRPD routing-options configuration related to BGP-LS.
`paragon pathfinder debug bgpls routes`	Debugs the Paragon Pathfinder cRPD routes related to BGP-LS.
`paragon pathfinder debug genjvisiondata help`	Shows Paragon Pathfinder debugutils genjvisiondata help.
`paragon pathfinder debug genjvisiondata params`	Shows Paragon Pathfinder debugutils genjvisiondata params.
`paragon pathfinder debug lsp`	Logs in to the Paragon Pathfinder PCEP CLI for debugging.
`paragon pathfinder debug postgres status`	Shows the Kubernetes cluster Postgres status.
`paragon pathfinder debug rabbitmq status`	Shows the rabbitmqctl cluster status.
`paragon pathfinder debug snoop amqp`	Runs Paragon Pathfinder debugutils pod to snoop and decode data exchanged between AMQP.
`paragon pathfinder debug snoop help`	Shows Paragon Pathfinder debugutils snoop help.
`paragon pathfinder debug snoop postgres`	Runs Paragon Pathfinder debugutils pod to snoop and decode data exchanged between Postgres.
`paragon pathfinder debug snoop redis link`	Runs Paragon Pathfinder debugutils pod to snoop and decode data exchanged between Redis link.
`paragon pathfinder debug snoop redis lsp`	Runs Paragon Pathfinder debugutils pod to snoop and decode data exchanged between Redis lsp.
`paragon pathfinder debug snoop redis node`	Runs Paragon Pathfinder debugutils pod to snoop and decode data exchanged between redis nodes.
`paragon pathfinder debug topoutil help`	Shows Paragon Pathfinder debugutils topo_util help.
`paragon pathfinder debug topoutil safemode deactivate`	Shows Paragon Pathfinder debugutils topo_util tool to deactivate safe mode.
`paragon pathfinder debug topoutil topo refresh`	Runs Paragon Pathfinder debugutils topo_util tool to refresh the current topology.
`paragon pathfinder debug topoutil topo save`	Runs Paragon Pathfinder debugutils topo_util tool to save the current topology snapshot.
`paragon pathfinder describe bmp`	Describes Paragon Pathfinder pod including cRPD and BMP containers.
`paragon pathfinder describe configserver`	Describes Paragon Pathfinder pod including config-server container.
`paragon pathfinder describe debugutils`	Describes Paragon Pathfinder pod including debugutils container.
`paragon pathfinder describe netconf`	Describes Paragon Pathfinder pod including ns-netconfd container.
`paragon pathfinder describe pceserver`	Describes Paragon Pathfinder pod including ns-pceserver container (PCEP services).
`paragon pathfinder describe pcserver`	Describes Paragon Pathfinder pod including ns-pcserver container (PCS).
`paragon pathfinder describe pcviewer`	Describes paragon pathfinder pod including ns-pcsviewer container (Paragon Planner Desktop Application).
`paragon pathfinder describe scheduler`	Describes Paragon Pathfinder pod including scheduler container.
`paragon pathfinder describe toposerver`	Describes Paragon Pathfinder pod including ns-toposerver (Topology service) container.
`paragon pathfinder describe web`	Describes Paragon Pathfinder pod including web container.
`paragon pathfinder get bmp`	Shows Paragon Pathfinder pod including cRPD and BMP containers.
`paragon pathfinder get configserver`	Shows Paragon Pathfinder pod including ns-configserver and syslog containers.
`paragon pathfinder get debugutils`	Shows Paragon Pathfinder pod including debugutils container.
`paragon pathfinder get netconf`	Shows Paragon Pathfinder pod associated with the netconf process.
`paragon pathfinder get pceserver`	Shows Paragon Pathfinder pod including ns-pceserver container (PCEP services).
`paragon pathfinder get pcserver`	Shows Paragon Pathfinder pod including ns-pcserver container (PCS).
`paragon pathfinder get pcviewer`	Shows Paragon Pathfinder pod including ns-pcsviewer container (Paragon Planner Desktop Application).
`paragon pathfinder get pods`	Shows all Paragon Pathfinder pods.
`paragon pathfinder get scheduler`	Shows Paragon Pathfinder pod associated with the scheduler process.
`paragon pathfinder get services`	Shows all Paragon Pathfinder services.
`paragon pathfinder get toposerver`	Shows Paragon Pathfinder pod including ns-toposerver container (Topology service).
`paragon pathfinder get web`	Shows Paragon Pathfinder pod associated with the ns-web process.
`paragon pathfinder logs bmp container bmp`	Shows the logs of Paragon Pathfinder bmp pods bmp container. Use the `--type follow` argument to get live streaming logs.
`paragon pathfinder logs bmp container crpd`	Shows the logs of Paragon Pathfinder bmp pods cRPD container. Use the `--type follow` argument to get live streaming logs.
`paragon pathfinder logs bmp container syslog`	Shows the logs of Paragon Pathfinder bmp pods syslog container. Use the `--type follow` argument to get live streaming logs.
`paragon pathfinder logs configserver container nsconfigserver`	Shows the logs of Paragon Pathfinder configserver pods ns-configserver container. Use the `--type follow` argument to get live streaming logs.
`paragon pathfinder logs configserver container syslog`	Shows the logs of Paragon Pathfinder configserver pods syslog container. Use the `--type follow` argument to get live streaming logs.
`paragon pathfinder logs netconf container nsnetconfd`	Shows the logs of Paragon Pathfinder netconf pods ns-netconfd container. Use the `--type follow` argument to get live streaming logs.
`paragon pathfinder logs netconf container syslog`	Shows the logs of Paragon Pathfinder netconf pods syslog container. Use the `--type follow` argument to get live streaming logs.
`paragon pathfinder logs pceserver container nspceserver`	Shows the logs of Paragon Pathfinder pceserver pods ns-pceserver container. Use the `--type follow` argument to get live streaming logs.
`paragon pathfinder logs pceserver container syslog`	Shows the logs of Paragon Pathfinder pceserver pods syslog container. Use the `--type follow` argument to get live streaming logs.
`paragon pathfinder logs pceserver syslog filtered`	Shows processed logs of Paragon Pathfinder pceserver pods syslog container fetching only timestamp, level, and message. Use the `--type follow` argument to get live streaming logs.
`paragon pathfinder logs pcserver container nspcserver`	Shows the logs of Paragon Pathfinder pcserver pods ns-pcserver container. Use the `--type follow` argument to get live streaming logs.
`paragon pathfinder logs pcserver container syslog`	Shows the logs of Paragon Pathfinder pcserver pods syslog container. Use the `--type follow` argument to get live streaming logs.
`paragon pathfinder logs pcserver syslog filtered`	Shows processed logs of Paragon Pathfinder pceserver pods syslog container fetching only with timestamp, level, and message. Use the `--type follow` argument to get live streaming logs.
`paragon pathfinder logs pcviewer container nspcviewer`	Shows the logs of Paragon Pathfinder pcviewer pods ns-pcviewer container. Use the `--type follow` argument to get live streaming logs.
`paragon pathfinder logs pcviewer container syslog`	Shows the logs of Paragon Pathfinder pcviewer pods syslog container. Use the `--type follow` argument to get live streaming logs.
`paragon pathfinder logs toposerver container nstopodbinit`	Shows the logs of Paragon Pathfinder toposerver pods ns-topo-dbinit container. Use the `--type follow` argument to get live streaming logs.
`paragon pathfinder logs toposerver container nstopodbinitcache`	Shows the logs of Paragon Pathfinder toposerver pods ns-topo-dbinit-cache container. Use the `--type follow` argument to get live streaming logs.
`paragon pathfinder logs toposerver container nstoposerver`	Shows the logs of Paragon Pathfinder toposerver pods ns-toposerver container. Use the `--type follow` argument to get live streaming logs.
`paragon pathfinder logs toposerver container syslog`	Shows the logs of Paragon Pathfinder toposerver pods syslog container. Use the `--type follow` argument to get live streaming logs.
`paragon pathfinder logs toposerver syslog filtered`	Shows processed logs of Paragon Pathfinder toposerver pods syslog container fetching only with timestamp, level, and message. Use the `--type follow` argument to get live streaming logs.
`paragon pathfinder logs web container nsweb`	Shows the logs of Paragon Pathfinder web pods ns-web container. Use the `--type follow` argument to get live streaming logs.
`paragon pathfinder logs web container nswebdbinit`	Shows the logs of Paragon Pathfinder web pods ns-web-dbinit container. Use the `--type follow` argument to get live streaming logs.
`paragon pathfinder logs web container syslog`	Shows the logs of Paragon Pathfinder web pods syslog container. Use the `--type follow` argument to get live streaming logs.
`paragon pathfinder rabbitmq geoha status`	Shows the federation status (from rabbitmq-0 instance). GeoHa status is only available for a dual cluster setup.
`paragon rookceph ceph osddf`	Reports Rook and Ceph OSD file system disk space usage.
`paragon rookceph ceph osdpoolstats`	Shows Rook and Ceph OSD pool statistics.
`paragon rookceph ceph osdstatus`	Shows Rook and Ceph OSD status.
`paragon rookceph ceph osdtree`	Shows Rook and Ceph OSD tree.
`paragon rookceph ceph osdutilization`	Shows Rook and Ceph OSD utilization.
`paragon rookceph ceph pgstat`	Shows Rook and Ceph pg status.
`paragon rookceph ceph status`	Shows Rook and Ceph status.
`paragon rookceph cli toolbox`	Logs in to the CLI of Rook and Ceph toolbox pod.
`paragon rookceph get pods`	Shows Rook and Ceph pods.
`paragon rookceph get services`	Shows Rook and Ceph services.
`paragon rookceph radosgw get period`	This is RADOS gateway user administration utility which gets the period info.
`paragon rookceph radosgw synch status`	This is RADOS gateway user administration utility which gets the metadata sync status.
`paragon set password`	Sets the Paragon (UI host) password for REST calls authentication. Use this mandatory one-time set password command to set the password using the `--cred password` argument. Example: `paragon set password --cred AdminXYX!`
`paragon set username`	Sets the Paragon (UI host) username for Rest calls authentication. The default username is `admin`. Use the `--cred username` argument to set a different username. Example: `paragon set username --cred newadmin`

Troubleshoot Ceph and Rook

Ceph requires relatively newer Kernel versions. If your Linux kernel is very old, consider upgrading or reinstalling a new one.

Use this section to troubleshoot issues with Ceph and Rook.

Insufficient Disk Space

A common reason for installation failure is that the object storage daemons (OSDs) are not created. An OSD configures the storage on a cluster node. OSDs might not be created because of non-availability of disk resources, in the form of either insufficient resources or incorrectly partitioned disk space. Ensure that the nodes have sufficient unpartitioned disk space available.

Reformat a Disk

Examine the logs of the "rook-ceph-osd-prepare-hostname-*" jobs. The logs are descriptive. If you need to reformat the disk or partition, and restart Rook, perform the following steps:

Use one of the following methods to reformat an existing disk or partition.
- If you have a block storage device that should have been used for Ceph, but wasn't used because it was in an unusable state, you can reformat the disk completely.
- If you have a disk partition that should have been used for Ceph, you can clear the data on the partition completely.
Note:
These commands completely reformat the disk or partitions that you are using and you will lose all data on them.
Restart Rook to save the changes and reattempt the OSD creation process.

View Pod Status

To check the status of Rook and Ceph pods installed in the rook-ceph namespace, use the # kubectl get po -n rook-ceph command. The following pods must be in the running state.

rook-ceph-mon-*—Typically, three monitor pods are created.
rook-ceph-mgr-*—One manager pod
rook-ceph-osd-*—Three or more OSD pods
rook-ceph-mds-cephfs-*—Metadata servers
rook-ceph-rgw-object-store-*—ObjectStore gateway
rook-ceph-tools*—For additional debugging options.
To connect to the toolbox, use the command:

$ kubectl exec -ti -n rook-ceph $(kubectl get po -n rook-ceph -l app=rook-ceph-tools \ -o jsonpath={..metadata.name}) -- bash

Some of the common commands you can use in the toolbox are:
# ceph status # ceph osd status, # ceph osd df, # ceph osd utilization, # ceph osd pool stats, # ceph osd tree, and # ceph pg stat

Troubleshoot Ceph OSD failure

Check the status of pods installed in the rook-ceph namespace.

# kubectl get po -n rook-ceph

If a rook-ceph-osd-* pod is in the Error or CrashLoopBackoff state, then you must repair the disk.

Stop the rook-ceph-operator.

# kubectl scale deploy -n rook-ceph rook-ceph-operator --replicas=0
Remove the failing OSD processes.

# kubectl delete deploy -n rook-ceph rook-ceph-osd-number
Connect to the toolbox.

$ kubectl exec -ti -n rook-ceph $(kubectl get po -n rook-ceph -l app=rook-ceph-tools -o jsonpath={..metadata.name}) -- bash
Identify the failing OSD.

# ceph osd status

Mark out the failed OSD.

Remove the failed OSD.

# ceph osd purge number --yes-i-really-mean-it
Connect to the node that hosted the failed OSD and do one of the following:
- Replace the hard disk in case of a hardware failure.
- Reformat the disk completely.
- Reformat the partition completely.
Restart rook-ceph-operator.

# kubectl scale deploy -n rook-ceph rook-ceph-operator --replicas=1
Monitor the OSD pods.

# kubectl get po -n rook-ceph

If the OSD does not recover, use the same procedure to remove the OSD, and then remove the disk or delete the partition before restarting rook-ceph-operator.

Troubleshoot Air-Gap Installation Failure

The air-gap installation as well as the kube-apiserver fails with the following error because you do not have an existing /etc/resolv.conf file.

To create a new file, you must run the #touch /etc/resolv.conf command as the root user, and then redeploy the Paragon Automation cluster.

Recover from a RabbitMQ Cluster Failure

If your Paragon Automation cluster fails (for example, from a power outage), the RabbitMQ message bus may not restart properly.

To check for this condition, run the kubectl get po -n northstar -l app=rabbitmq command. This command should show three pods with their status as Running. For example:

However, if the status of one or more pods is Error, use the following recovery procedure:

Delete RabbitMQ.

kubectl delete po -n northstar -l app=rabbitmq
Check the status of the pods.

kubectl get po -n northstar -l app=rabbitmq.

Repeat kubectl delete po -n northstar -l app=rabbitmq until the status of all pods is Running.
Restart the Paragon Pathfinder applications.

Disable udevd Daemon During OSD Creation

You use the udevd daemon for managing new hardware such as disks, network cards, and CDs. During the creation of OSDs, the udevd daemon detects the OSDs and can lock them before they are fully initialized. The Paragon Automation installer disables systemd-udevd during installation and enables it after Rook has initialized the OSDs.

When adding or replacing nodes and repairing failed nodes, you must manually disable the udevd daemon so that OSD creation does not fail. You can reenable the daemon after the OSDs are created.

Use these commands to manually disable and enable udevd.

Log in to the node that you want to add or repair.
Disable the udevd daemon.
1. Check whether udevd is running.
  # systemctl is-active systemd-udevd
2. If udevd is active, disable it. # systemctl mask system-udevd --now
When you repair or replace a node, the Ceph distributed filesystems are not automatically updated. If the data disks are destroyed as part of the repair process, then you must recover the object storage daemons (OSDs) hosted on those data disks.
1. Connect to the Ceph toolbox and view the status of OSDs. The ceph-tools script is installed on a primary node. You can log in to the primary node and use the kubectl interface to access ceph-tools. To use a node other than the primary node, you must copy the admin.conf file (in the config-dir directory on the control host) and set the kubeconfig environment variable or use the export KUBECONFIG=config-dir/admin.conf command.
  
  $ ceph-tools# ceph osd status
2. Verify that all OSDs are listed as exists,up. If OSDs are damaged, follow the troubleshooting instructions explained in Troubleshoot Ceph and Rook.
Log in to node that you added or repaired after verifying that all OSDs are created.
Reenable udevd on the node.

systemctl unmask system-udevd

Alternatively, you can set

disable_udevd:
          true

in the config.yml and run the

./run -c
            config-dir deploy

command. We do not recommend that you redeploy the cluster only to disable the udevd daemon.

Wrapper Scripts for Common Utility Commands

You can use the following wrapper scripts installed in /usr/local/bin to connect to and run commands on pods running in the system.

Command	Description
`paragon-db [arguments]`	Connect to the database server and start the Postgres SQL shell using the superuser account. Optional arguments are passed to the Postgres SQL command.
`pf-cmgd [arguments]`	Start the CLI in the Paragon Pathfinder CMGD pod. Optional arguments are executed by the CLI.
`pf-crpd [arguments]`	Start the CLI in the Paragon Pathfinder cRPD pod. Optional arguments are executed by the CLI.
`pf-redis [arguments]`	Start the (authenticated) redis-cli in the Paragon Pathfinder Redis pod. Optional arguments are executed by the Redis pod.
`pf-debugutils [arguments]`	Start the shell in the Paragon Pathfinder debugutils pod. Optional arguments are executed by the shell. Pathfinder debugutils utilities are installed if `install_northstar_debugutils: true` is configured in the config.yml file.
`ceph-tools [arguments]`	Start the shell to the Ceph toolbox. Optional arguments are executed by the shell.

Back Up the Control Host

If your control host fails, you must back up the config-dir directory to a remote location to be able to rebuild your cluster . The config-dir contains the inventory, config.yml, and id_rsa files.

Alternatively, you can also rebuild the inventory and config.yml files by downloading information from the cluster using the following commands:

# kubectl get cm -n common metadata -o jsonpath={..inventory} > inventory

# kubectl get cm -n common metadata -o jsonpath={..config_yml} > config.yml

You cannot recover SSH keys; you must replace failed keys with new keys.

User Service Accounts for Debugging

Paragon Pathfinder, telemetry manager, and base platform applications internally use Paragon Insights for telemetry collection. To debug configuration issues associated with these applications, three user service accounts are created, by default, during Paragon Automation installation. The scope of these service accounts is limited to debugging the corresponding application only. The service accounts details are listed in the following table.

Table 2: Service Account Details
Application Name and Scope	Account Username	Account Default Password
Paragon Pathfinder (northstar)	hb-northstar-admin	Admin123!
Telemetry manager (tm)	hb-tm-admin
Base platform (ems-dmon)	hb-ems-dmon

You must use these accounts solely for debugging purposes. Do not use these accounts for day-to-day operations or for modifying any configuration. We recommend that you change the login credentials for security reasons.

ON THIS PAGE

Troubleshoot Paragon Automation Installation

Resolve Merge Conflicts of the Configuration File

Resolve Common Backup and Restore Issues

View Installation Log Files

View Log Files in Grafana

Troubleshooting Using the kubectl Interface

View Node Status

View Pod Status

View Detailed Information About a Pod

View the Logs for a Container in a Pod

Run a Command on a Container in a Pod

View Services

Frequently Used kubectl Commands

Troubleshoot Using the paragon CLI Utility

Troubleshoot Ceph and Rook

Insufficient Disk Space

Reformat a Disk

View Pod Status

Troubleshoot Ceph OSD failure

Troubleshoot Air-Gap Installation Failure

Recover from a RabbitMQ Cluster Failure

Disable udevd Daemon During OSD Creation

Wrapper Scripts for Common Utility Commands

Back Up the Control Host

User Service Accounts for Debugging

Related Documentation