Troubleshoot Paragon Automation Installation

SUMMARY Read the following topics to learn how to troubleshoot typical problems that you might encounter during and after installation.

Resolve Merge Conflicts of the Configuration File

The init script creates the template configuration files. If you update an existing installation using the same config-dir directory that was used for the installation, the template files that the init script creates are merged with the existing configuration files. Sometimes, this merging action creates a merge conflict that you must resolve. The script prompts you about how to resolve the conflict. When prompted, select one of the following options:

  • C—You can retain the existing configuration file and discard the new template file. This is the default option.

  • n—You can discard the existing configuration file and reinitialize the template file.

  • m—You can merge the files manually. Conflicting sections are marked with lines starting with <<<<<<<<, ||||||||, ========, and >>>>>>>>. You must edit the file and remove the merge markers before you proceed with the update.

  • d—You can view the differences between the files before you decide how to resolve the conflict.
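If you choose the manual merge, it can help to confirm that no merge markers remain before you continue with the update. The following is a minimal sketch, not part of the installer: has_merge_markers is a hypothetical helper name, and the pattern assumes that markers are runs of seven or more of the characters listed above at the start of a line.

```shell
# Hypothetical helper: succeeds (exit 0) if FILE still contains merge markers.
has_merge_markers() {
  # Match lines that start with a run of seven or more <, |, =, or > characters.
  grep -qE '^(<{7,}|\|{7,}|={7,}|>{7,})' -- "$1"
}
```

You might call it as has_merge_markers path-to-file && echo "unresolved conflicts remain", where path-to-file is the configuration file you edited.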

Resolve Common Backup and Restore Issues

Suppose you destroy an existing cluster and redeploy a software image on the same cluster nodes. If you then try to restore a configuration from a previously backed-up configuration folder, the restore operation might fail because the mount path for the backed-up configuration has changed. When you destroy a cluster, the persistent volume is deleted. When you redeploy the image, the persistent volume is re-created on whichever cluster node has space available, which is not necessarily the node that hosted it before.

To work around these backup and restore issues:

  1. Determine the mount path of the new persistent volume.

  2. Copy the contents of the previous persistent volume's mount path to the new path.

  3. Retry the restore operation.
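Step 2 above can be sketched as a small shell function. This is an illustration under assumptions, not a documented procedure: copy_volume_contents is a hypothetical name, and both directories are assumed to be reachable from the node where you run it (for example, after you transfer the old contents to the node that hosts the new volume). For step 1, kubectl get pv and kubectl describe pv are the usual places to look for the path of a persistent volume.

```shell
# Hypothetical helper: copy everything under OLD_PATH into NEW_PATH,
# preserving ownership, permissions, and timestamps.
copy_volume_contents() {
  old_path=$1
  new_path=$2
  [ -d "$old_path" ] || { echo "missing source: $old_path" >&2; return 1; }
  [ -d "$new_path" ] || { echo "missing target: $new_path" >&2; return 1; }
  # The trailing "/." copies the directory contents, including hidden files.
  cp -a "$old_path"/. "$new_path"/
}
```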

View Installation Log Files

If the deploy script fails, you must check the installation log files in the config-dir directory. By default, the config-dir directory stores six zipped log files. The current log file is saved as log, and the previous log files are saved as log.1 through log.5 files. Every time you run the deploy script, the current log is saved, and the oldest one is discarded.

You typically find error messages at the end of a log file. View the error message, and fix the configuration.
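Because errors tend to appear at the end of a log, a quick way to inspect one is to print only its last lines. The sketch below assumes that "zipped" means gzip compression; last_log_lines is a hypothetical helper name. The gzip -cdf invocation decompresses gzip data and passes plain text through unchanged, so the same function works for either kind of file.

```shell
# Hypothetical helper: print the last N lines (default 20) of a log file.
# "gzip -cdf" decompresses gzip data and copies plain text through as-is.
last_log_lines() {
  gzip -cdf < "$1" | tail -n "${2:-20}"
}
```

For example, last_log_lines config-dir/log 40 would print the last 40 lines of the current log, where config-dir stands for your own configuration directory.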

Troubleshooting Using the kubectl Interface

kubectl (Kube Control) is a command-line utility that interacts with the Kubernetes API and is the most common command-line tool for controlling Kubernetes clusters.

You can issue kubectl commands on the primary node right after installation. To issue kubectl commands on the worker nodes, you need to copy the admin.conf file and set the KUBECONFIG environment variable, for example with the export KUBECONFIG=config-dir/admin.conf command. The admin.conf file is copied to the config-dir directory on the control host as part of the installation process.
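As a minimal sketch of the KUBECONFIG step, the following function checks that admin.conf is readable before exporting the variable. The name use_admin_conf is hypothetical, and the argument is whatever directory you copied admin.conf into.

```shell
# Hypothetical helper: point kubectl at the admin.conf file in CONFIG_DIR.
use_admin_conf() {
  conf="$1/admin.conf"
  if [ ! -r "$conf" ]; then
    echo "cannot read $conf" >&2
    return 1
  fi
  export KUBECONFIG="$conf"
}
```

Checking the file first avoids a confusing connection error from kubectl later if the copy step was skipped.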

You use the kubectl command-line tool to communicate with the Kubernetes API and obtain information about API resources such as nodes, pods, and services. You can also use it to show log files and to create, delete, or modify those resources.

The syntax of kubectl commands is as follows:

kubectl [command] [TYPE] [NAME] [flags]

[command] is simply the action that you want to execute.

You can use the following command to view a list of kubectl commands:

root@primary-node:/# kubectl [enter]

To get details about a particular command, including all of its flags and options, ask for help. For example:

root@primary-node:/# kubectl get -h

To verify and troubleshoot the operations in Paragon Automation, you'll use the following commands:

[command]         Description
get               Display one or many resources. The output shows a table of the most important information about the specified resources.
describe          Show details of a specific resource or a group of resources.
explain           Show documentation for a resource type.
logs              Display the logs for a container in a pod.
rollout restart   Manage the rollout of a resource.
edit              Edit a resource.

[TYPE] represents the type of resource that you want to view. Resource types are case-insensitive, and you can use singular, plural, or abbreviated forms.

For example, pod, node, service, or deployment. For a complete list of resources and their allowed abbreviations (for example, po for pod), issue this command:

kubectl api-resources

To learn more about a resource, issue this command:

kubectl explain [TYPE]

[NAME] is the name of a specific resource, such as the name of a service or pod. Names are case-sensitive. For example:

root@primary-node:/# kubectl get pod pod_name

[flags] provide additional options for a command. For example, -o sets the output format, and -o wide lists more attributes for a resource. Use help (-h) to get information about the available flags.

Note that most Kubernetes resources (such as pods and services) belong to a namespace, while others (such as nodes) do not.

Namespaces provide a mechanism for isolating groups of resources within a single cluster. Names of resources need to be unique within a namespace, but not across namespaces.

When you use a command on a resource that is in a namespace, you must include the namespace as part of the command. Namespaces are case-sensitive. Without the proper namespace, the specific resource you are interested in might not be displayed.

You can get a list of all namespaces by issuing the kubectl get namespace command.

If you want to display resources for all namespaces, or you are not sure which namespace the resource you are interested in belongs to, you can enter --all-namespaces or -A.

Use the following topics to troubleshoot and view installation details using the kubectl interface.

View Node Status

Use the kubectl get nodes command, abbreviated as the kubectl get no command, to view the status of the cluster nodes. The status of the nodes must be Ready, and the roles must be either control-plane or none. For example:

If a node is not Ready, verify whether the kubelet process is running. You can also use the system log of the node to investigate the issue.

To verify kubelet: root@primary-node:/# kubelet
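To pick out nodes that need attention from a long node list, you can filter the command output. The sketch below is an illustration only: not_ready_nodes is a hypothetical helper, and it assumes the default column layout of kubectl get nodes, where the second column is STATUS.

```shell
# Hypothetical helper: read "kubectl get nodes --no-headers" output on stdin
# and print the name of every node whose STATUS does not start with "Ready".
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}
```

A typical invocation would be kubectl get nodes --no-headers | not_ready_nodes. An empty result means every node reports Ready.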

View Pod Status

Use the kubectl get po -n namespace or kubectl get po -A command to view the status of pods. You can specify an individual namespace (such as healthbot, northstar, or common), or you can use the -A parameter to view the status of pods in all namespaces. For example:

The status of healthy pods must be Running or Completed, and the number of ready containers should match the total. If the status of a pod is not Running or Completed, or if the number of ready containers does not match the total, use the kubectl describe po or kubectl logs (POD | TYPE/NAME) [-c CONTAINER] command to troubleshoot the issue further.
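The health rule described above can be applied mechanically to the pod list. The following sketch assumes the default column layout of kubectl get po -A (NAMESPACE, NAME, READY, STATUS); unhealthy_pods is a hypothetical helper name.

```shell
# Hypothetical helper: read "kubectl get po -A --no-headers" output on stdin
# and print namespace/name for each pod that looks unhealthy.
unhealthy_pods() {
  awk '
    $4 == "Completed" { next }          # finished pods count as healthy
    {
      split($3, ready, "/")             # READY column looks like "1/2"
      if ($4 != "Running" || ready[1] != ready[2])
        print $1 "/" $2
    }
  '
}
```

You could run it as kubectl get po -A --no-headers | unhealthy_pods and then pass each reported pod to kubectl describe po or kubectl logs.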

View Detailed Information About a Pod

Use the kubectl describe po -n namespace pod-name command to view detailed information about a specific pod. For example:

View the Logs for a Container in a Pod

Use the kubectl logs -n namespace pod-name [-c container-name] command to view the logs for a particular pod. If a pod has multiple containers, you must specify the container for which you want to view the logs. For example:
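The command form above can be wrapped so that the container argument is passed only when you supply one. This is a sketch; pod_logs is a hypothetical name, and the namespace, pod, and container values are whatever exists in your cluster.

```shell
# Hypothetical wrapper: pod_logs NAMESPACE POD [CONTAINER]
pod_logs() {
  [ "$#" -ge 2 ] || { echo "usage: pod_logs NAMESPACE POD [CONTAINER]" >&2; return 2; }
  if [ -n "${3:-}" ]; then
    kubectl logs -n "$1" "$2" -c "$3"
  else
    kubectl logs -n "$1" "$2"
  fi
}
```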

Run a Command on a Container in a Pod

Use the kubectl exec -ti -n namespace pod-name [-c container-name] -- command-line command to run commands on a container inside a pod. For example:

After you run the exec command, you get a bash shell into the Postgres database server. You can access the bash shell inside the container and run commands to connect to the database. Not all containers provide a bash shell. Some containers provide only SSH, and some containers do not have any shells.
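Because not every container ships bash, a small wrapper that lets you choose the shell can be convenient. The sketch below is illustrative; pod_shell is a hypothetical name, and whether a given shell exists depends on the container image.

```shell
# Hypothetical wrapper: pod_shell NAMESPACE POD [SHELL]
# SHELL defaults to bash; pass "sh" for images that do not ship bash.
pod_shell() {
  kubectl exec -ti -n "$1" "$2" -- "${3:-bash}"
}
```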

View Services

Use the kubectl get svc -n namespace or kubectl get svc -A command to view the cluster services. You can specify an individual namespace (such as healthbot, northstar, and common), or you can use the -A parameter to view the services for all namespaces. For example:

In this example, the services are sorted by type, and only services of type LoadBalancer are displayed. You can view the services that are provided by the cluster and the external IP addresses that are selected by the load balancer to access those services.

You can access these services from outside the cluster. The external IP address is exposed and accessible from devices outside the cluster.
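One way to list only the externally reachable services is to filter the service list by type. This sketch assumes the default column layout of kubectl get svc -A (NAMESPACE, NAME, TYPE, CLUSTER-IP, EXTERNAL-IP); loadbalancer_services is a hypothetical helper name.

```shell
# Hypothetical helper: read "kubectl get svc -A --no-headers" output on stdin
# and print namespace, name, and external IP for LoadBalancer services only.
loadbalancer_services() {
  awk '$3 == "LoadBalancer" { print $1, $2, $5 }'
}
```

A typical invocation would be kubectl get svc -A --no-headers | loadbalancer_services.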

Frequently Used kubectl Commands

  • List the replication controllers:

  • Restart a component:

  • Edit a Kubernetes resource: You can edit a deployment or any Kubernetes API object, and these changes are saved to the cluster. However, if you reinstall the cluster, these changes are not preserved.
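The three operations in the list above might look like the following wrappers. These are a sketch under assumptions: the function names are hypothetical, and the component being restarted or edited is assumed to be a Deployment. If the components in your cluster are managed by Deployments rather than ReplicationControllers, kubectl get deploy is the analogous listing.

```shell
# Hypothetical wrappers; NAMESPACE and NAME are placeholders you supply.
list_replication_controllers() { kubectl get rc -n "$1"; }
restart_deployment() { kubectl rollout restart -n "$1" deploy "$2"; }
edit_deployment() { kubectl edit -n "$1" deploy "$2"; }
```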

Troubleshoot Using the paragon CLI Utility

We've introduced the paragon CLI utility to run commands on pods running in the system. The paragon commands are a set of intuitive commands that help you analyze, query, and troubleshoot your cluster. To run the commands, log in to any of the primary nodes. The output of some commands is color-coded because, for those commands, the paragon utility runs kubecolor instead of kubectl; kubecolor color-codes the kubectl command output. See Figure 1 for an example output. You must be either the root user or a non-root user with superuser (sudo) privileges to run the paragon CLI utility commands.

To view the complete set of commands and help options, use one of the following commands:

You can view help options at any command level (not only at top level). For example:

You can use the tab option to view possible auto-completion options for the commands. To see top-level command auto-completion, type paragon and press tab. For example:

To view the underlying command that a paragon command runs, use the echo or -e option. For example:

To execute a paragon command as well as view the underlying command that it runs, use the debug or -d option. For example:

To view the entire list of paragon commands and the corresponding underlying commands that they run, use:

Figure 1: Example paragon command output

Follow the instructions in the help section of each command regarding specific usage criteria, such as arguments or prerequisites. Some commands need mandatory arguments. For instance, the paragon insights logs devicegroup analytical command needs the argument --dg devicegroup-name-with-subgroup. For example:

paragon insights logs devicegroup analytical --dg controller-0

Some commands have prerequisites. For instance, prior to using the paragon insights get playbooks command, you must set the username and password by using the paragon set username --cred username and paragon set password --cred password commands.
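The prerequisite order described above can be sketched as one function that sets the credentials and then runs the query. The paragon commands are the ones named in the text; list_playbooks is a hypothetical wrapper name, and the username and password are placeholders supplied by the caller.

```shell
# Sketch: set credentials first, then list playbooks.
# Stops at the first command that fails.
list_playbooks() {
  paragon set username --cred "$1" &&
    paragon set password --cred "$2" &&
    paragon insights get playbooks
}
```

Note that arguments passed this way can be visible to other users in the process list, so you may prefer to run the set commands interactively.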

The complete set of commands along with their usage criteria is listed in Table 1.

Table 1: paragon CLI Utility

Command

Description

paragon ambassador get emissary

Shows Paragon ambassador emissary pods.

paragon ambassador get pods

Shows all Paragon ambassador pods.

paragon ambassador get services

Shows all Paragon ambassador services.

paragon common postgres roles

Helps to find the Postgres roles.

paragon describe node

Shows the description of a particular node in the cluster.

Use the --node node-ip argument.

Example: paragon describe node --node 172.16.x.221

You can use the paragon get nodes all command to get the node IP address.

paragon ems get devicemanager

Shows the Paragon EMS device manager pods.

paragon ems get jobmanager

Shows the job manager Paragon EMS pods.

paragon ems get pods

Shows all Paragon EMS pods.

paragon ems get services

Shows all Paragon EMS services.

paragon ems logs devicemanager

Shows the logs of Paragon EMS device manager pods.

Use the --type follow argument to get live streaming logs.

paragon ems logs jobmanager

Shows the logs of the Paragon EMS job manager pod. Use the --type follow argument to get live streaming logs.

paragon get namespaces

Shows all namespaces available in Paragon.

paragon get nodes all

Shows a list of all nodes in the cluster.

paragon get nodes diskpressure

Validates if kubelet has any disk pressure.

Use the --node node_ip/node_name argument.

Example: paragon get nodes diskpressure --node 172.16.x.221

paragon get nodes memorypressure

Validates if kubelet has sufficient memory.

Use the --node node_ip/node_name argument.

Example: paragon get nodes memorypressure --node 172.16.x.221

paragon get nodes networkunavailable

Checks for issues with calico and the network.

Use the --node node_ip/node_name argument.

Example: paragon get nodes networkunavailable --node davinci-primary

paragon get nodes notready

Shows a list of all nodes in the cluster that are not ready.

paragon get nodes pidpressure

Validates if kubelet has sufficient PID available.

Use the --node node_ip/node_name argument.

Example: paragon get nodes pidpressure --node davinci-worker1

paragon get nodes ready

Shows a list of all nodes in the cluster that are ready.

paragon get nodes taint

Shows a list of all taints on the nodes.

paragon get pods healthy

Shows all the healthy Paragon pods.

paragon get pods unhealthy

Shows all the unhealthy Paragon pods.

paragon get services exposed

Shows all the Paragon services that are exposed.

paragon insights cli alerta

Logs in to the CLI of the Paragon Insights alerta pod.

paragon insights cli byoi

Logs in to the CLI of the BYOI plug-in.

Use the --byoi byoi-plugin-name argument.

paragon insights cli configserver

Logs in to the CLI of Paragon Insights config-server pod.

paragon insights cli grafana

Logs in to the CLI of Paragon Insights grafana pod.

paragon insights cli influxdb

Logs in to the CLI of Paragon Insights influxdb pod.

Use the --influx influxdb-nodeip argument to specify the node IP. If not, the command will use the first influxdb node as the default node.

Example: paragon insights cli influxdb --influx influxdb-172.16.x.21

paragon insights cli mgd

Logs in to the CLI of Paragon Insights mgd pod.

paragon insights describe alerta

Describes the Paragon Insights alerta pod.

paragon insights describe api

Describes the Paragon Insights REST API pod.

paragon insights describe configserver

Describes the Paragon Insights config-server pod.

paragon insights describe grafana

Describes the Paragon Insights grafana pod.

paragon insights describe influxdb

Describes the Paragon Insights influxdb pod.

Use the --influx influxdb-nodeip argument to specify the node IP. If not, the command will use the first influxdb node as the default node.

Example: paragon insights describe influxdb --influx influxdb-172.16.x.21

paragon insights describe mgd

Describes the Paragon Insights mgd pod.

paragon insights get alerta

Shows the Paragon Insights alerta pod.

paragon insights get api

Shows the Paragon Insights REST API pod.

paragon insights get configserver

Shows the Paragon Insights config-server pod.

paragon insights get devicegroups

Shows all the Paragon Insights device groups.

The default username is admin. To modify the username, run the paragon set username --cred username command.

As a prerequisite, run the paragon set password --cred password command to set the Paragon (UI host) password.

paragon insights get devices

Shows all Paragon Insights devices.

The default username is admin. To modify the username, run the paragon set username --cred username command.

As a prerequisite, run the paragon set password --cred password command to set the Paragon (UI host) password.

paragon insights get grafana

Shows the Paragon Insights grafana pod.

paragon insights get influxdb

Shows the Paragon Insights influxdb pod.

paragon insights get ingest

Shows the Paragon Insights network telemetry ingestion pods.

paragon insights get mgd

Shows the Paragon Insights mgd pod.

paragon insights get playbooks

Shows all Paragon Insights playbooks.

The default username is admin. To modify the username, run the paragon set username --cred username command.

As a prerequisite, run the paragon set password --cred password command to set the Paragon (UI host) password.

paragon insights get pods

Shows all the Paragon Insights pods.

paragon insights get services

Shows all the Paragon Insights services.

paragon insights logs alerta

Shows the logs of the Paragon Insights alerta pod.

paragon insights logs api

Shows the logs of the Paragon Insights REST API pod.

paragon insights logs byoi

Shows the logs of the Paragon Insights BYOI plug-in.

Use the --byoi byoi-plugin-name argument.

paragon insights logs configserver

Shows the logs of the Paragon Insights config-server pod.

paragon insights logs devicegroup analytical

Shows the logs of the Paragon Insights device group for service analytical engine.

Use the --dg devicegroup-name-with-subgroup argument.

Example: paragon insights logs devicegroup analytical --dg controller-0

In the example, controller is the devicegroup name and 0 is the subgroup.

paragon insights logs devicegroup itsdb

Shows the logs of the Paragon Insights device group for service itsdb.

Use the --dg devicegroup-name-with-subgroup argument.

Example: paragon insights logs devicegroup itsdb --dg controller-0

In the example, controller is the devicegroup name and 0 is the subgroup.

paragon insights logs devicegroup jtimon

Shows the logs of the Paragon Insights device group for service jtimon.

Use the --dg devicegroup-name-with-subgroup argument.

Example: paragon insights logs devicegroup jtimon --dg controller-0

In the example, controller is the devicegroup name and 0 is the subgroup.

paragon insights logs devicegroup native

Shows the logs of the Paragon Insights device group for service jti native.

Use the --dg devicegroup-name-with-subgroup argument.

Example: paragon insights logs devicegroup native --dg controller-0

In the example, controller is the devicegroup name and 0 is the subgroup.

paragon insights logs devicegroup syslog

Shows the logs of the Paragon Insights device group for service syslog.

Use the --dg devicegroup-name-with-subgroup argument.

Example: paragon insights logs devicegroup syslog --dg controller-0

In the example, controller is the devicegroup name and 0 is the subgroup.

paragon insights logs grafana

Shows the logs of the Paragon Insights Grafana pod.

paragon insights logs influxdb

Shows the logs of the Paragon Insights influxdb pod.

Use the --influx influxdb-nodeip argument to specify the node IP. If not, the command will use the first influxdb node as the default node.

Example: paragon insights logs influxdb --influx influxdb-172.16.x.21

paragon insights logs mgd

Shows the logs of the Paragon Insights mgd pod.

paragon pathfinder cli bmp

Logs in to the CLI of the Paragon Pathfinder BMP container.

paragon pathfinder cli configserver

Logs in to the CLI of the Paragon Pathfinder ns-configserver container.

paragon pathfinder cli crpd

Logs in to the CLI of the Paragon Pathfinder cRPD container.

paragon pathfinder cli debugutils

Logs in to the CLI of the Paragon Pathfinder debugutils container.

paragon pathfinder cli netconf

Logs in to the CLI of the Paragon Pathfinder netconf container.

paragon pathfinder cli pceserver

Logs in to the CLI of the Paragon Pathfinder ns-pceserver container (PCEP) services.

paragon pathfinder cli pcserver

Logs in to the CLI of the Paragon Pathfinder ns-pcserver (PCS) container.

paragon pathfinder cli pcviewer

Logs in to the CLI of the Paragon Pathfinder ns-pcsviewer (Paragon Planner Desktop Application) container.

paragon pathfinder cli scheduler

Logs in to the CLI of the Paragon Pathfinder scheduler container.

paragon pathfinder cli toposerver

Logs in to the CLI of the Paragon Pathfinder ns-toposerver (Topology service) container.

paragon pathfinder cli web

Logs in to the CLI of the Paragon Pathfinder ns-web container.

paragon pathfinder debug bgpls config

Debugs the Paragon Pathfinder cRPD routing-options configuration related to BGP-LS.

paragon pathfinder debug bgpls routes

Debugs the Paragon Pathfinder cRPD routes related to BGP-LS.

paragon pathfinder debug genjvisiondata help

Shows Paragon Pathfinder debugutils genjvisiondata help.

paragon pathfinder debug genjvisiondata params

Shows Paragon Pathfinder debugutils genjvisiondata params.

paragon pathfinder debug lsp

Logs in to the Paragon Pathfinder PCEP CLI for debugging.

paragon pathfinder debug postgres status

Shows the Kubernetes cluster Postgres status.

paragon pathfinder debug rabbitmq status

Shows the rabbitmqctl cluster status.

paragon pathfinder debug snoop amqp

Runs Paragon Pathfinder debugutils pod to snoop and decode data exchanged between AMQP.

paragon pathfinder debug snoop help

Shows Paragon Pathfinder debugutils snoop help.

paragon pathfinder debug snoop postgres

Runs Paragon Pathfinder debugutils pod to snoop and decode data exchanged between Postgres.

paragon pathfinder debug snoop redis link

Runs Paragon Pathfinder debugutils pod to snoop and decode data exchanged between Redis link.

paragon pathfinder debug snoop redis lsp

Runs Paragon Pathfinder debugutils pod to snoop and decode data exchanged between Redis lsp.

paragon pathfinder debug snoop redis node

Runs Paragon Pathfinder debugutils pod to snoop and decode data exchanged between redis nodes.

paragon pathfinder debug topoutil help

Shows Paragon Pathfinder debugutils topo_util help.

paragon pathfinder debug topoutil safemode deactivate

Shows Paragon Pathfinder debugutils topo_util tool to deactivate safe mode.

paragon pathfinder debug topoutil topo refresh

Runs Paragon Pathfinder debugutils topo_util tool to refresh the current topology.

paragon pathfinder debug topoutil topo save

Runs Paragon Pathfinder debugutils topo_util tool to save the current topology snapshot.

paragon pathfinder describe bmp

Describes Paragon Pathfinder pod including cRPD and BMP containers.

paragon pathfinder describe configserver

Describes Paragon Pathfinder pod including config-server container.

paragon pathfinder describe debugutils

Describes Paragon Pathfinder pod including debugutils container.

paragon pathfinder describe netconf

Describes Paragon Pathfinder pod including ns-netconfd container.

paragon pathfinder describe pceserver

Describes Paragon Pathfinder pod including ns-pceserver container (PCEP services).

paragon pathfinder describe pcserver

Describes Paragon Pathfinder pod including ns-pcserver container (PCS).

paragon pathfinder describe pcviewer

Describes Paragon Pathfinder pod including ns-pcsviewer container (Paragon Planner Desktop Application).

paragon pathfinder describe scheduler

Describes Paragon Pathfinder pod including scheduler container.

paragon pathfinder describe toposerver

Describes Paragon Pathfinder pod including ns-toposerver (Topology service) container.

paragon pathfinder describe web

Describes Paragon Pathfinder pod including web container.

paragon pathfinder get bmp

Shows Paragon Pathfinder pod including cRPD and BMP containers.

paragon pathfinder get configserver

Shows Paragon Pathfinder pod including ns-configserver and syslog containers.

paragon pathfinder get debugutils

Shows Paragon Pathfinder pod including debugutils container.

paragon pathfinder get netconf

Shows Paragon Pathfinder pod associated with the netconf process.

paragon pathfinder get pceserver

Shows Paragon Pathfinder pod including ns-pceserver container (PCEP services).

paragon pathfinder get pcserver

Shows Paragon Pathfinder pod including ns-pcserver container (PCS).

paragon pathfinder get pcviewer

Shows Paragon Pathfinder pod including ns-pcsviewer container (Paragon Planner Desktop Application).

paragon pathfinder get pods

Shows all Paragon Pathfinder pods.

paragon pathfinder get scheduler

Shows Paragon Pathfinder pod associated with the scheduler process.

paragon pathfinder get services

Shows all Paragon Pathfinder services.

paragon pathfinder get toposerver

Shows Paragon Pathfinder pod including ns-toposerver container (Topology service).

paragon pathfinder get web

Shows Paragon Pathfinder pod associated with the ns-web process.

paragon pathfinder logs bmp container bmp

Shows the logs of Paragon Pathfinder bmp pods bmp container. Use the --type follow argument to get live streaming logs.

paragon pathfinder logs bmp container crpd

Shows the logs of Paragon Pathfinder bmp pods cRPD container. Use the --type follow argument to get live streaming logs.

paragon pathfinder logs bmp container syslog

Shows the logs of Paragon Pathfinder bmp pods syslog container. Use the --type follow argument to get live streaming logs.

paragon pathfinder logs configserver container nsconfigserver

Shows the logs of Paragon Pathfinder configserver pods ns-configserver container. Use the --type follow argument to get live streaming logs.

paragon pathfinder logs configserver container syslog

Shows the logs of Paragon Pathfinder configserver pods syslog container. Use the --type follow argument to get live streaming logs.

paragon pathfinder logs netconf container nsnetconfd

Shows the logs of Paragon Pathfinder netconf pods ns-netconfd container. Use the --type follow argument to get live streaming logs.

paragon pathfinder logs netconf container syslog

Shows the logs of Paragon Pathfinder netconf pods syslog container. Use the --type follow argument to get live streaming logs.

paragon pathfinder logs pceserver container nspceserver

Shows the logs of Paragon Pathfinder pceserver pods ns-pceserver container. Use the --type follow argument to get live streaming logs.

paragon pathfinder logs pceserver container syslog

Shows the logs of Paragon Pathfinder pceserver pods syslog container. Use the --type follow argument to get live streaming logs.

paragon pathfinder logs pceserver syslog filtered

Shows processed logs of Paragon Pathfinder pceserver pods syslog container fetching only timestamp, level, and message. Use the --type follow argument to get live streaming logs.

paragon pathfinder logs pcserver container nspcserver

Shows the logs of Paragon Pathfinder pcserver pods ns-pcserver container. Use the --type follow argument to get live streaming logs.

paragon pathfinder logs pcserver container syslog

Shows the logs of Paragon Pathfinder pcserver pods syslog container. Use the --type follow argument to get live streaming logs.

paragon pathfinder logs pcserver syslog filtered

Shows processed logs of Paragon Pathfinder pcserver pods syslog container fetching only timestamp, level, and message. Use the --type follow argument to get live streaming logs.

paragon pathfinder logs pcviewer container nspcviewer

Shows the logs of Paragon Pathfinder pcviewer pods ns-pcviewer container. Use the --type follow argument to get live streaming logs.

paragon pathfinder logs pcviewer container syslog

Shows the logs of Paragon Pathfinder pcviewer pods syslog container. Use the --type follow argument to get live streaming logs.

paragon pathfinder logs toposerver container nstopodbinit

Shows the logs of Paragon Pathfinder toposerver pods ns-topo-dbinit container. Use the --type follow argument to get live streaming logs.

paragon pathfinder logs toposerver container nstopodbinitcache

Shows the logs of Paragon Pathfinder toposerver pods ns-topo-dbinit-cache container. Use the --type follow argument to get live streaming logs.

paragon pathfinder logs toposerver container nstoposerver

Shows the logs of Paragon Pathfinder toposerver pods ns-toposerver container. Use the --type follow argument to get live streaming logs.

paragon pathfinder logs toposerver container syslog

Shows the logs of Paragon Pathfinder toposerver pods syslog container. Use the --type follow argument to get live streaming logs.

paragon pathfinder logs toposerver syslog filtered

Shows processed logs of Paragon Pathfinder toposerver pods syslog container fetching only timestamp, level, and message. Use the --type follow argument to get live streaming logs.

paragon pathfinder logs web container nsweb

Shows the logs of Paragon Pathfinder web pods ns-web container. Use the --type follow argument to get live streaming logs.

paragon pathfinder logs web container nswebdbinit

Shows the logs of Paragon Pathfinder web pods ns-web-dbinit container. Use the --type follow argument to get live streaming logs.

paragon pathfinder logs web container syslog

Shows the logs of Paragon Pathfinder web pods syslog container. Use the --type follow argument to get live streaming logs.

paragon pathfinder rabbitmq geoha status

Shows the federation status (from rabbitmq-0 instance). GeoHa status is only available for a dual cluster setup.

paragon rookceph ceph osddf

Reports Rook and Ceph OSD file system disk space usage.

paragon rookceph ceph osdpoolstats

Shows Rook and Ceph OSD pool statistics.

paragon rookceph ceph osdstatus

Shows Rook and Ceph OSD status.

paragon rookceph ceph osdtree

Shows Rook and Ceph OSD tree.

paragon rookceph ceph osdutilization

Shows Rook and Ceph OSD utilization.

paragon rookceph ceph pgstat

Shows Rook and Ceph pg status.

paragon rookceph ceph status

Shows Rook and Ceph status.

paragon rookceph cli toolbox

Logs in to the CLI of Rook and Ceph toolbox pod.

paragon rookceph get pods

Shows Rook and Ceph pods.

paragon rookceph get services

Shows Rook and Ceph services.

paragon rookceph radosgw get period

Uses the RADOS gateway user administration utility to get the period information.

paragon rookceph radosgw synch status

Uses the RADOS gateway user administration utility to get the metadata sync status.

paragon set password

Sets the Paragon (UI host) password for REST calls authentication.

Use this mandatory one-time set password command to set the password using the --cred password argument.

Example: paragon set password --cred AdminXYX!

paragon set username

Sets the Paragon (UI host) username for REST calls authentication. The default username is admin.

Use the --cred username argument to set a different username.

Example: paragon set username --cred newadmin

Troubleshoot Ceph and Rook

Ceph requires a relatively recent Linux kernel. If your kernel is very old, consider upgrading it or installing a newer one.

Use this section to troubleshoot issues with Ceph and Rook.

Insufficient Disk Space

A common reason for installation failure is that the object storage daemons (OSDs) are not created. An OSD configures the storage on a cluster node. OSDs might not be created because disk resources are unavailable, either because the nodes have insufficient disk space or because the disk space is partitioned incorrectly. Ensure that the nodes have sufficient unpartitioned disk space available.

Reformat a Disk

Examine the logs of the "rook-ceph-osd-prepare-hostname-*" jobs. The logs are descriptive. If you need to reformat the disk or partition, and restart Rook, perform the following steps:

  1. Use one of the following methods to reformat an existing disk or partition.
    • If you have a block storage device that should have been used for Ceph, but wasn't used because it was in an unusable state, you can reformat the disk completely.
    • If you have a disk partition that should have been used for Ceph, you can clear the data on the partition completely.
    Note: These commands completely reformat the disk or partitions that you are using, and you will lose all data on them.

  2. Restart Rook to save the changes and reattempt the OSD creation process.
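The log inspection and restart steps in this procedure might be sketched as follows. Both functions are hypothetical helpers built on assumptions: the prepare pods are assumed to carry the label app=rook-ceph-osd-prepare (the label that upstream Rook applies; verify it with kubectl get po -n rook-ceph --show-labels), and "restart Rook" is assumed to mean restarting the rook-ceph-operator deployment.

```shell
# Hypothetical helper: show the last N log lines (default 50) from the
# OSD prepare pods, selected by label.
osd_prepare_logs() {
  kubectl logs -n rook-ceph -l app=rook-ceph-osd-prepare --tail="${1:-50}"
}

# Hypothetical helper: restart the Rook operator so that it retries
# OSD creation.
restart_rook_operator() {
  kubectl rollout restart deploy -n rook-ceph rook-ceph-operator
}
```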

View Pod Status

To check the status of the Rook and Ceph pods installed in the rook-ceph namespace, use the # kubectl get po -n rook-ceph command. The following pods must be in the Running state.

  • rook-ceph-mon-*—Typically, three monitor pods are created.
  • rook-ceph-mgr-*—One manager pod
  • rook-ceph-osd-*—Three or more OSD pods
  • rook-ceph-mds-cephfs-*—Metadata servers
  • rook-ceph-rgw-object-store-*—ObjectStore gateway
  • rook-ceph-tools*—For additional debugging options.

    To connect to the toolbox, use the command:

    $ kubectl exec -ti -n rook-ceph $(kubectl get po -n rook-ceph -l app=rook-ceph-tools -o jsonpath={..metadata.name}) -- bash

    Some of the common commands you can use in the toolbox are:

    # ceph status, # ceph osd status, # ceph osd df, # ceph osd utilization, # ceph osd pool stats, # ceph osd tree, and # ceph pg stat
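The pod check above can be scripted. The following is a minimal sketch and is not part of the Paragon Automation tooling: the function name not_running is hypothetical, and the sketch assumes the default kubectl get po column layout (NAME, READY, STATUS, and so on). It skips pods in the Completed state, on the assumption that finished job pods such as the rook-ceph-osd-prepare jobs are expected to end that way.

```shell
# Hypothetical helper: print the name and status of every pod that is
# neither Running nor Completed. Reads `kubectl get po` output on stdin.
not_running() {
  # NR > 1 skips the header row; column 3 is STATUS in the default layout.
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage (requires cluster access):
#   kubectl get po -n rook-ceph | not_running
```

If the function prints nothing, every listed pod is either Running or Completed.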

Troubleshoot Ceph OSD Failure

Check the status of pods installed in the rook-ceph namespace.

# kubectl get po -n rook-ceph

If a rook-ceph-osd-* pod is in the Error or CrashLoopBackOff state, you must repair the disk.

  1. Stop the rook-ceph-operator.

    # kubectl scale deploy -n rook-ceph rook-ceph-operator --replicas=0

  2. Remove the failing OSD processes.

    # kubectl delete deploy -n rook-ceph rook-ceph-osd-number

  3. Connect to the toolbox.

    $ kubectl exec -ti -n rook-ceph $(kubectl get po -n rook-ceph -l app=rook-ceph-tools -o jsonpath={..metadata.name}) -- bash

  4. Identify the failing OSD.

    # ceph osd status

  5. Mark out the failed OSD.

  6. Remove the failed OSD.

    # ceph osd purge number --yes-i-really-mean-it

  7. Connect to the node that hosted the failed OSD and do one of the following:
    • Replace the hard disk in case of a hardware failure.
    • Reformat the disk completely.
    • Reformat the partition completely.
  8. Restart rook-ceph-operator.

    # kubectl scale deploy -n rook-ceph rook-ceph-operator --replicas=1

  9. Monitor the OSD pods.

    # kubectl get po -n rook-ceph

    If the OSD does not recover, use the same procedure to remove the OSD, and then remove the disk or delete the partition before restarting rook-ceph-operator.
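To find the number to use in rook-ceph-osd-number and in the ceph osd purge command, you can extract it from the names of the failing pods. This is a minimal sketch under stated assumptions: the function name failed_osd_ids is hypothetical, and it assumes OSD pod names of the form rook-ceph-osd-<id>-<suffix> and the default kubectl get po column layout.

```shell
# Hypothetical helper: print the numeric ID of each OSD whose pod is in the
# Error or CrashLoopBackOff state. Reads `kubectl get po` output on stdin.
# The pattern requires digits right after "rook-ceph-osd-", so the
# rook-ceph-osd-prepare-* job pods do not match.
failed_osd_ids() {
  awk '$1 ~ /^rook-ceph-osd-[0-9]+-/ && ($3 == "Error" || $3 == "CrashLoopBackOff") {
         split($1, parts, "-")   # parts: rook, ceph, osd, id, suffix...
         print parts[4]
       }' | sort -un
}

# Usage (requires cluster access):
#   kubectl get po -n rook-ceph | failed_osd_ids
```

Confirm each ID against the output of ceph osd status in the toolbox before you remove anything.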

Troubleshoot Air-Gap Installation Failure

The air-gap installation, as well as the kube-apiserver, fails if the /etc/resolv.conf file does not exist.

To create the file, run the touch /etc/resolv.conf command as the root user, and then redeploy the Paragon Automation cluster.
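If you script this check, you can create the file only when it is missing. The following is a minimal sketch; the function name ensure_file is hypothetical and is not part of the installer.

```shell
# Hypothetical helper: create the file only if it does not exist, so an
# existing resolver configuration is left untouched.
ensure_file() {
  [ -e "$1" ] || touch "$1"
}

# Usage (as the root user, before you redeploy the cluster):
#   ensure_file /etc/resolv.conf
```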

Recover from a RabbitMQ Cluster Failure

If your Paragon Automation cluster fails (for example, from a power outage), the RabbitMQ message bus may not restart properly.

To check for this condition, run the kubectl get po -n northstar -l app=rabbitmq command. The output should show three pods, each with the status Running.

However, if the status of one or more pods is Error, use the following recovery procedure:

  1. Delete the RabbitMQ pods.

    kubectl delete po -n northstar -l app=rabbitmq

  2. Check the status of the pods.

    kubectl get po -n northstar -l app=rabbitmq

    Repeat kubectl delete po -n northstar -l app=rabbitmq until the status of all pods is Running.

  3. Restart the Paragon Pathfinder applications.
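The status check used in this procedure can be reduced to a pass or fail result. This is a minimal sketch, not a supported tool: the function name all_running is hypothetical, and it assumes the default kubectl get po column layout with STATUS in the third column.

```shell
# Hypothetical helper: succeed only if the `kubectl get po` output on stdin
# lists at least $1 pods (default 3) and every listed pod is Running.
all_running() {
  awk -v want="${1:-3}" '
    BEGIN  { total = 0; ok = 0 }
    NR > 1 { total++; if ($3 == "Running") ok++ }   # skip the header row
    END    { exit !(total >= want && ok == total) }
  '
}

# Usage (requires cluster access):
#   kubectl get po -n northstar -l app=rabbitmq | all_running 3 \
#     || echo "RabbitMQ is not healthy; follow the recovery procedure"
```

The sketch only reports the state. It does not delete pods, because pods that are still starting also fail the check, and deleting them again would interrupt the restart.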

Disable udevd Daemon During OSD Creation

The udevd daemon manages new hardware such as disks, network cards, and CDs. During the creation of OSDs, the udevd daemon detects the OSDs and can lock them before they are fully initialized. The Paragon Automation installer disables systemd-udevd during installation and enables it after Rook has initialized the OSDs.

When adding or replacing nodes and repairing failed nodes, you must manually disable the udevd daemon so that OSD creation does not fail. You can reenable the daemon after the OSDs are created.

Use these commands to manually disable and enable udevd.

  1. Log in to the node that you want to add or repair.
  2. Disable the udevd daemon.
    1. Check whether udevd is running.

      # systemctl is-active systemd-udevd

    2. If udevd is active, disable it.

      # systemctl mask systemd-udevd --now
  3. When you repair or replace a node, the Ceph distributed filesystems are not automatically updated. If the data disks are destroyed as part of the repair process, then you must recover the object storage daemons (OSDs) hosted on those data disks.

    1. Connect to the Ceph toolbox and view the status of OSDs. The ceph-tools script is installed on a primary node. You can log in to the primary node and use the kubectl interface to access ceph-tools. To use a node other than the primary node, you must copy the admin.conf file (in the config-dir directory on the control host) to that node and set the KUBECONFIG environment variable, for example with the export KUBECONFIG=config-dir/admin.conf command.

      $ ceph-tools
      # ceph osd status

    2. Verify that all OSDs are listed as exists,up. If OSDs are damaged, follow the troubleshooting instructions explained in Troubleshoot Ceph and Rook.

  4. After verifying that all OSDs are created, log in to the node that you added or repaired.
  5. Reenable udevd on the node.

    # systemctl unmask systemd-udevd

Alternatively, you can set disable_udevd: true in the config.yml file and run the ./run -c config-dir deploy command. We do not recommend that you redeploy the cluster only to disable the udevd daemon.
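The manual disable and reenable steps can be wrapped in two small functions. This is a minimal sketch under stated assumptions: the function names and the SYSTEMCTL override variable are hypothetical, and the unit name is assumed to be systemd-udevd, as in the is-active check.

```shell
# Hypothetical wrapper around the manual steps. Set SYSTEMCTL=echo to
# preview the commands without changing the system.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"
UDEVD_UNIT="systemd-udevd"

udevd_disable() {
  # Mask and stop the daemon only if it is currently active.
  if $SYSTEMCTL is-active "$UDEVD_UNIT" >/dev/null 2>&1; then
    $SYSTEMCTL mask "$UDEVD_UNIT" --now
  fi
}

udevd_enable() {
  # Unmask the daemon once all OSDs are listed as exists,up.
  $SYSTEMCTL unmask "$UDEVD_UNIT"
}

# Usage (as the root user on the node that you add or repair):
#   udevd_disable
#   ... add or repair the node and verify the OSDs ...
#   udevd_enable
```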

Wrapper Scripts for Common Utility Commands

You can use the following wrapper scripts installed in /usr/local/bin to connect to and run commands on pods running in the system.
Command | Description
paragon-db [arguments] | Connect to the database server and start the Postgres SQL shell using the superuser account. Optional arguments are passed to the Postgres SQL command.
pf-cmgd [arguments] | Start the CLI in the Paragon Pathfinder CMGD pod. Optional arguments are executed by the CLI.
pf-crpd [arguments] | Start the CLI in the Paragon Pathfinder cRPD pod. Optional arguments are executed by the CLI.
pf-redis [arguments] | Start the (authenticated) redis-cli in the Paragon Pathfinder Redis pod. Optional arguments are executed by the Redis pod.
pf-debugutils [arguments] | Start the shell in the Paragon Pathfinder debugutils pod. Optional arguments are executed by the shell. Pathfinder debugutils utilities are installed if install_northstar_debugutils: true is configured in the config.yml file.
ceph-tools [arguments] | Start the shell to the Ceph toolbox. Optional arguments are executed by the shell.

Back Up the Control Host

To be able to rebuild your cluster if your control host fails, you must back up the config-dir directory to a remote location. The config-dir directory contains the inventory, config.yml, and id_rsa files.
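One way to take this backup is to archive the directory and copy the archive off the control host. The following is a minimal sketch, not a documented procedure: the function name backup_config_dir, the archive naming scheme, and the remote host and path in the usage comment are all hypothetical.

```shell
# Hypothetical helper: archive a config-dir directory into a timestamped
# tarball in the given output directory and print the archive path.
backup_config_dir() {
  config_dir="$1"
  out_dir="$2"
  archive="$out_dir/config-dir-$(date +%Y%m%d-%H%M%S).tar.gz"
  # -C keeps the paths in the archive relative to the parent of config-dir.
  tar -czf "$archive" -C "$(dirname "$config_dir")" "$(basename "$config_dir")" &&
    printf '%s\n' "$archive"
}

# Usage (the remote host and path are placeholders):
#   archive=$(backup_config_dir ./config-dir /tmp)
#   scp "$archive" user@backup-host:/backups/
```

The archive includes the id_rsa private key, so store it only in a location with restricted access.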

Alternatively, you can also rebuild the inventory and config.yml files by downloading information from the cluster using the following commands:

# kubectl get cm -n common metadata -o jsonpath={..inventory} > inventory

# kubectl get cm -n common metadata -o jsonpath={..config_yml} > config.yml

You cannot recover SSH keys; you must replace failed keys with new keys.

User Service Accounts for Debugging

Paragon Pathfinder, telemetry manager, and base platform applications internally use Paragon Insights for telemetry collection. To debug configuration issues associated with these applications, three user service accounts are created, by default, during Paragon Automation installation. The scope of these service accounts is limited to debugging the corresponding application only. The service account details are listed in the following table.

Table 2: Service Account Details
Application Name and Scope | Account Username | Account Default Password
Paragon Pathfinder (northstar) | hb-northstar-admin | Admin123!
Telemetry manager (tm) | hb-tm-admin |
Base platform (ems-dmon) | hb-ems-dmon |

You must use these accounts solely for debugging purposes. Do not use these accounts for day-to-day operations or for modifying any configuration. We recommend that you change the login credentials for security reasons.