Running a Command – Your First Playbook

This chapter begins to explore Ansible, playbooks, and related files. In it, you write a short playbook that executes a command on Junos devices and displays the command’s results. You learn about the structure and contents of a playbook, how to prompt for input, how to display output, one way to send a command to a Junos device, and a little about debugging playbooks.

The author is using a Virtual SRX named aragorn and an EX-2200-C named bilbo for most examples in this book. You can use any Junos devices you have available, preferably lab or test devices. The book uses only two Ansible-managed devices in order to keep the example output short, but if you have more devices you can run the playbook against all of them if you wish. The author routinely runs his production playbooks against hundreds of devices.

The (manual) Command

Assume you need to find out the date that each network device was booted or last configured, so you can confirm that devices have not been configured or rebooted since the last scheduled maintenance window.

The Junos CLI command for this is “show system uptime” and, when run manually, it will look something like this (the exact output differs based on device hardware, configuration, and uptime):

The remainder of this chapter shows you how to create an Ansible playbook to run this command across several devices and report the results. We’re going to discuss a lot of Ansible fundamentals along the way, so it will be a few pages before we actually start gathering uptime information. Don’t worry, we’ll get there.

Playbook Directory and Files

You should create a subdirectory to hold your Ansible playbooks and related files, and you should change to that directory before running a playbook contained there. This book assumes you are using subdirectory aja2 in your home directory:

In order to run a basic Ansible playbook you need three files:

  • The Ansible configuration file, ansible.cfg.

  • The inventory file, which we name inventory, that contains the list of devices that Ansible might access or manage.

  • The playbook file containing the Ansible playbook in YAML format. For this example we name the playbook uptime.yaml.

In future chapters, we will build on this set of files, but these three will suffice for the example in this chapter.

File: ansible.cfg

Let’s start with the Ansible configuration file. There are many configuration settings that can be placed in this file, but two settings will suffice for now. Create file ansible.cfg in your ~/aja2 directory and enter the following lines in the file (the two settings described below, placed under a [defaults] section header):

The line inventory = inventory tells Ansible to look in the file inventory (in the current directory) for the list of devices that Ansible will manage.

The line host_key_checking = False tells Ansible that it should not use SSH host key checking. (When connecting manually with SSH, the OpenSSH client confirms that the server’s ID matches the ID cached in the user’s ~/.ssh/known_hosts file. If there is no entry in known_hosts then SSH will ask the user to confirm that the server’s ID is valid and that the connection should proceed. If the cached ID is different from the ID provided by the server, the client displays an error and aborts the connection.) Host key checking is desirable from a security perspective but can be a problem with automated connections. Disabling Ansible’s host key checking allows Ansible to connect even if the server’s ID is not in the known_hosts file (for example, if you have not previously manually connected to that device and cached its ID) or does not match the cached value in known_hosts (as can happen, for example, after a routing engine failover). Ansible 2.4 enables host key checking by default, but it was disabled by default in earlier versions; if you are using an earlier version of Ansible, you may be able to omit this setting.

File: inventory

Ansible needs to have an inventory, a list of devices it should work with. There are a few ways of arranging an inventory, but the easiest is to create a single text file.

Inventory data must include a name for each managed device, which will be available to the playbook in a variable called inventory_hostname. (The author likes to use the device’s hostname for the inventory name, but that is not a requirement.) Inventory can also define groups of devices, a topic we will explore in the Data Files and Inventory Groups chapter.

Ansible’s default is to use the file /etc/ansible/hosts for inventory data. The author prefers to have the inventory in the directory with the playbook(s) that use it. This keeps all related files together, makes it easier to have different inventories for different playbooks (discussed in the Data Files and Inventory Groups chapter), and makes it easier to keep the inventory in source control with the playbooks (see the Appendix).

Create a file called inventory in your ~/aja2 directory and add a single line for each test device (your names may be different from what is shown here, and you may use fully qualified names if needed, such as bilbo.mycompany.com):

Inventory can also include variables, which define additional data about the device. For example, if your playbook needs to know the role of a device in the network (is an EX or QFX acting only as a Layer 2 switch, or does it have Layer 3 interfaces and routing features enabled?) you can define a variable to hold that information, such as device_role=router. Though defining variables in the inventory file is supported, and we will do so for some of our early playbooks, it is not recommended because it can be difficult to manage as the number of devices and variables increases. We will explore a more scalable approach in the Data Files and Inventory Groups chapter.

Two variables that are often useful, and which have special meaning to Ansible, are ansible_host and inventory_hostname.

The inventory_hostname variable contains the name of the host as specified in the inventory file. This is often useful within playbooks; for example, if a playbook saves a file related to a host, you may use inventory_hostname in the filename so it is clear to which host the file relates. The inventory_hostname variable is also often used to specify the device to be managed by the playbook, but this only works correctly if name resolution works on that name. In other words, Ansible needs to be able to resolve the author’s device names bilbo and aragorn (from the inventory above) into their respective IP addresses in order to establish the SSH sessions to those devices. If you cannot rely on name resolution, as might be the case with new devices not yet added to DNS, or when setting up a new office that does not yet have connectivity back to corporate, you need an alternative. That alternative is the ansible_host variable.

If we do not provide a value for ansible_host, Ansible automatically populates ansible_host with the inventory name, but instead of relying on that fact we populate ansible_host with the IP address of the target host. As we create our playbook, we use the ansible_host variable to specify the managed device (we will see this shortly).

Note:

Ansible versions prior to 2.0 used the name ansible_ssh_host for the same variable. If you are using an older Ansible version, you should use the longer name.

An inventory file with variables looks something like this:

Please update your inventory file to include an ansible_host variable and appropriate IP address for at least one of your test devices.

File: uptime.yaml

Our first playbook is called uptime.yaml and will, when completed, gather and display the device uptime from our network devices. We will build the playbook in several steps, explaining as we go.

Playbooks are Ansible’s “scripts,” describing a series of tasks that will be performed on or by various hosts or devices. Playbooks contain plays; plays contain tasks; tasks call Ansible modules to carry out operations.

Play: Playbooks consist of one or more plays. Each play defines a set of hosts or devices on which the play will run, and one or more tasks to be performed on each of those hosts. Plays may also declare variables or include other features needed for the tasks in the play. If a playbook contains multiple plays then the tasks within the different plays probably have different requirements, such as a different set of hosts or devices.

Tasks: A task is a specific command to be executed. Tasks specify the Ansible module (the command) to execute. Tasks usually include arguments that provide additional details about how the module should run, such as the network device to control, or the username and password for connecting to the device.

Before we create and run the playbook, we need to discuss one other topic.

Path to the Python Interpreter

In the Installing Ansible chapter, the author suggested that macOS users install Homebrew and install Python and Ansible with the Homebrew environment. There is a downside to this approach: it changes the path to the Python interpreter and any user-installed Python libraries, including Ansible and PyEZ.

Check to see where the active Python interpreter is located. From your system shell, enter the command which python:

On most UNIX-type systems, the default Python interpreter is /usr/bin/python. Ansible assumes this will be the case and relies on that interpreter being present. If the active Python interpreter is different, Ansible may be unable to find user-installed Python libraries.

The author is using Homebrew, and you can see above that his Python interpreter is /usr/local/bin/python, not /usr/bin/python. The playbook in the next section will fail on the author’s system unless Ansible is told where to find the active Python interpreter.

If your Ansible environment contains a variable called ansible_python_interpreter, Ansible will read from that variable the path to the Python interpreter instead of using the default. There are a number of places where this variable could be set; one option is to put the variable setting in the inventory file.

If your which python command returned a path other than /usr/bin/python, append the following boldfaced lines to your inventory file (use the correct path for your system, as it may be different from the author’s):

The [all:vars] line introduces a new section in the inventory file containing variables that apply to all hosts. The next line sets the ansible_python_interpreter variable to the correct path for your system (copy whatever which python returned).

Tip:

On most UNIX-type systems, including macOS, you can use a trick to avoid worrying about a system-specific path to the Python interpreter. Instead of setting the ansible_python_interpreter variable to the actual path to the interpreter, set it to /usr/bin/env python (note the space before “python”). This essentially tells the operating system to use the env command to find the python interpreter based on the system’s path.

Uptime Version 1.0

Create file uptime.yaml in your ~/aja2 directory and enter the following:
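A minimal sketch of version 1.0 of the playbook, pieced together from the line-by-line description in this section. The keys and values come from that description; the exact spacing is an assumption, using two spaces per indentation level:

```yaml
---
- name: Get device uptime
  hosts:
    - all
  connection: local
  gather_facts: no

  tasks:
    - debug: var=inventory_hostname

    - debug:
        var: ansible_host
```

The two debug tasks are intentionally written in different styles: the first passes its argument on the same line as the module name, the second as an indented key under the module.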

Remember this is a YAML file and thus indentation is important. The following screen capture of the author’s text editor shows the same playbook with a dot (.) representing each space, so you can easily see the amount of indentation for each line. The screen capture also shows line numbers to make it a bit easier to discuss the playbook’s contents (do not enter the line numbers in your file), and “¬” for line endings (new-line characters).

Let’s talk about this playbook and what each line does. Reference the line numbers shown in the screen capture.

Line 1: YAML documents start with ---.

Line 2: The name: line identifies the first play in the playbook. The name is not normally significant to Ansible, but it helps document what is happening both to the engineer editing the playbook itself, and during playbook execution (we will see the text “Get device uptime” in the output when we run the playbook).

The leading hyphen (“-”) means this line (and all subsequent lines until another leading hyphen with the same indentation) is an element in a list, in this case the list of plays within the playbook. This simple playbook has only one play; there is no subsequent line with a leading hyphen with the same indentation, which in this case is no indentation (the hyphen is on the left margin).

Lines 3-4: Declare the hosts or devices against which the playbook will run. The keyword all here is a default Ansible group that automatically includes all devices in inventory. This is an array, so you can specify multiple devices from inventory; for example, your playbook could say:
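As an illustration, a hosts: list naming individual devices might look like the following sketch. It uses the author’s two inventory names; substitute names from your own inventory:

```yaml
- name: Get device uptime
  hosts:
    - aragorn    # each list element is an inventory name (or a group name)
    - bilbo
```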

Because hosts: is indented at the same level as name: on the previous line, it is part of the same dictionary, which is defining the first play.

Line 5: Ansible was originally built to work with servers and assumes that each managed server can execute Python scripts; the host running Ansible (the control machine) would convert a play into a Python script, upload the script to the managed server, tell the server to run the script, and accept the results from the server. This approach will not work with network devices.

To manage network devices, we need Ansible to run everything locally (on the control machine). The line connection: local tells Ansible that it cannot upload a Python script to the managed device; instead, it needs to run the playbook locally on the Ansible control machine (even though modules called by the playbook may connect to a network device in order to control it in some fashion).

Line 6: When managing servers, Ansible normally gathers facts—such as operating system, version, IP addresses, and more—from each server. This does not work the same way with network devices because the tasks are running locally (per line 5), which means any facts gathered would be for the host running Ansible, not for the network device. The line gather_facts: no overrides the default behavior; it tells Ansible to not spend time gathering facts we do not need. (Later in the book we provide examples where fact gathering is useful.)

Lines 7, 10: Blank lines are ignored by Ansible, but help humans see “sections” within the playbook.

Line 8: The tasks: line introduces a list of one or more tasks to be executed. Despite the blank line above, this is part of the same play (the same dictionary) as lines 2, 3, 5, and 6, because the indentation is the same and there has been no (non-blank) line between with less indentation.

Line 9: The first task (note the leading hyphen – indicating this is a list element). This task calls Ansible module debug, which prints information to screen during playbook execution. The argument var=inventory_hostname tells debug it should print the contents of the variable (var) called inventory_hostname.

Lines 11-12: Another task calling the debug module to print a variable’s contents, but showing another way to provide arguments to a module. This task asks debug to print the contents of variable ansible_host. Note that the argument var: ansible_host is indented.

Let’s run the playbook and see what happens. The author’s inventory file contains the following lines. Your hostnames and IP addresses may be different, and remember that the ansible_python_interpreter variable is needed only if your system’s Python interpreter is in a non-standard location:

The command ansible-playbook tells Ansible to execute the playbook whose name is provided on the command line. Be sure you are in your ~/aja2 directory (where your playbook and inventory files are located), then run the playbook with the command ansible-playbook uptime.yaml:

Let’s discuss the output from the playbook:

PLAY [Get device uptime] indicates the playbook is starting the play called “Get device uptime.” Note that “Get device uptime” is the value of the name: entry in the play in the playbook; names are usually optional to Ansible but are helpful to humans!

TASK [debug] indicates the playbook is starting a task. Our playbook did not provide names for the tasks, so Ansible displays the module name debug instead.

ok: [aragorn] and ok: [bilbo] indicate that the task completed successfully for each device. Because the task is debug, which prints information to screen, the output also includes JSON-formatted data showing the value of the requested variable.

The PLAY RECAP section shows a summary of the playbook: for each device, how many tasks completed “ok” (successfully without changing anything), completed “changed” (changed something), or did not complete at all because the target was unreachable or there was another failure.

If your terminal shows color, some of the output should have been in green, similar to the following screenshot. Green is good. Tasks that return an ok status will display in green, and in the Play Recap section, devices for which all tasks returned ok will be in green:

As you look at the output, you can see that Ansible runs each task on each device specified in the play. Typically, one task must finish for all devices before Ansible will start the next task, and one play must finish before Ansible will start the next play.

The first TASK [debug] displayed the contents of the inventory_hostname variable for each device, which is simply the name for the device given in the inventory file. Each device has a separate set of variables, and different devices will have variables of the same name but containing different data.

The second TASK [debug] displayed the ansible_host variable for each device. This output is interesting because aragorn has an IP address, while bilbo has a hostname. This difference is because of the author’s inventory file, which contains:

The inventory line for device aragorn assigns a value, an IP address, to the ansible_host variable for the device, and we get that IP address in the playbook’s output. Device bilbo does not provide a value for ansible_host, so Ansible sets it to the same value as inventory_hostname automatically.

If your debug output includes “VARIABLE IS NOT DEFINED!” instead of a value, check the spelling of the appropriate variable name. The following output illustrates the result of misspelling the ansible_host variable in the playbook (second task):

Uptime Version 1.1

Our uptime.yaml playbook runs and displays output, but does not yet communicate with our device to gather any data. Let’s fix that!

We use the Galaxy module juniper_junos_command to communicate with our devices and execute the “show system uptime” command. This module needs several arguments: the command to execute, the device to communicate with, and credentials for authenticating with the device.

Because we need to authenticate with the devices, our playbook must have a username and password. It is poor practice to code those into the playbook; instead, our playbook will prompt for input (ask the user to provide that data).

Modify uptime.yaml so it looks like the following:
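A sketch of version 1.1, reconstructed from the line-by-line discussion in this section so that the line numbers cited there line up. The password prompt text and the position of the blank lines are assumptions; everything else follows the description:

```yaml
---
- name: Get device uptime
  hosts:
    - all
  roles:
    - Juniper.junos
  connection: local
  gather_facts: no

  vars_prompt:
    - name: username
      prompt: Junos Username
      private: no

    - name: password
      prompt: Junos Password    # prompt text is an assumption
      private: yes

  tasks:
    - name: get uptime using galaxy module
      juniper_junos_command:
        commands:
          - show system uptime
        provider:
          host: "{{ ansible_host }}"
          port: 22
          user: "{{ username }}"
          passwd: "{{ password }}"
```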

A screen capture showing the same playbook with line numbers, etc:

Again, let’s discuss the playbook’s contents by line number, focusing on the new or changed lines.

Lines 5-6: Include the Juniper.junos Galaxy modules, which enable Ansible to communicate with Junos devices. Before we can use Galaxy modules, we need to import their parent role into the play. A roles: list tells Ansible to include the functions in the specified roles. We discuss roles in detail in the Roles chapter.

Line 10: The vars_prompt: line introduces a list, each element of which is a dictionary, that tells Ansible to prompt the user for input and assign that input to specific variables. These variables are associated with the play, not a device, and are available to all devices in the play.

Variable names should start with a letter and can contain letters, numerals, and the underscore (“_”) character. Valid variable names include my_data and Results1; invalid variable names include 2day (starts with a numeral) and task-results (contains a hyphen). Variable names are case sensitive: test1 and Test1 are different variables.

Lines 11-13: The first dictionary in the vars_prompt list. Line 11 tells Ansible to put the user’s input in a variable named username. Line 12 tells Ansible to display “Junos Username” as the prompt for input. Line 13 says the input is not private (the user will be able to see what they type).

Lines 15-17: The second dictionary in the vars_prompt list. This time the input is stored in a variable called password and is private, meaning Ansible will not display what the user is typing.

Lines 20-28: Define a task named “get uptime using galaxy module” that calls the Ansible module juniper_junos_command. This task passes two arguments to juniper_junos_command. The first argument is commands, which is a list of Junos commands to execute (our playbook has only one element in the list); the second argument is provider, which is a dictionary that describes how to access the target device.

The provider dictionary (lines 24-28) has four entries (key:value pairs):

host (line 25) specifies the device on which Ansible should execute the commands; this is assigned the value of the device’s ansible_host variable.

port (line 26) specifies the TCP port that Ansible should use for the connection; we specify the standard SSH port 22. We discuss the connection further in the Junos, RPC, NETCONF, and XML chapter.

user and passwd (lines 27 and 28) are the credentials for accessing the device; these are assigned the values provided by the user in the vars_prompt portion of the playbook via the username and password variables.

As you can see in lines 25, 27, and 28, Ansible uses {{ variable_name }} to say “put the value of variable variable_name here.” However, YAML considers { } to be a dictionary, which would result in an error because {{ variable_name }} is not a valid dictionary. To make YAML happy, we need to include the variable reference in quotes – "{{ }}" – so YAML sees it as a string, leaving interpretation of the variable to Ansible.
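A brief sketch of the quoting rule, reusing the host entry from the provider dictionary. The unquoted form is shown only as a comment because it would not load:

```yaml
provider:
  # Quoted: YAML reads a plain string and leaves the {{ }} expansion to Ansible.
  host: "{{ ansible_host }}"
  # Unquoted, YAML would try to read the braces as a dictionary and report an error:
  # host: {{ ansible_host }}
```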

Let’s run the playbook!

Ansible says it worked...but where are the uptimes?

Note:

Although the juniper_junos_command module accepts a Junos CLI command, it does not connect to a Junos device’s CLI and run the command the way you would. It actually calls the Junos API (application programming interface), specifically an RPC (remote procedure call) called <command> that “converts” the CLI command to the correct RPC API call. The Junos, RPC, NETCONF, and XML chapter introduces RPCs in more detail.

Uptime Version 1.2

Previously we saw that we can use the debug module to display the contents of a variable, but what variable contains the devices’ uptimes?

As the playbook is currently constructed, the uptime values are lost. We need to assign the results of the juniper_junos_command module to a variable, which we can do by adding register: uptime to that task, where uptime is the name of the variable in which the task’s output will be stored:
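A sketch of the task with the register line added, assuming the task is otherwise unchanged from version 1.1:

```yaml
    - name: get uptime using galaxy module
      juniper_junos_command:
        commands:
          - show system uptime
        provider:
          host: "{{ ansible_host }}"
          port: 22
          user: "{{ username }}"
          passwd: "{{ password }}"
      register: uptime    # task-level keyword, aligned with name: and the module name
```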

Note that the register argument is indented to the same level as the task’s name and module (juniper_junos_command). The register keyword is an argument for the task itself and needs to be at the same indentation level as other task entries. By contrast, commands and provider are arguments to the module juniper_junos_command, which is why they are indented further than the module’s name.

We also need to add a task that calls the debug module to display the contents of the new uptime variable. This time let’s give the debug task a name:
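A sketch of such a task, using the task name that appears in the playbook output discussed in this section:

```yaml
    - name: display uptimes
      debug:
        var: uptime    # the variable registered by the previous task
```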

The complete modified playbook (lines 29–33 were added):

Let’s run the playbook:

Notice that the debug task now has a name: TASK [display uptimes].

Notice that the output of the display uptimes task is in JSON format, and that each device has its own uptime variable, registered (created) by the “get uptime using galaxy module” task. The uptime variables each contain a dictionary with a number of values, including the following:

stdout – the complete Junos command output as a single string.

stdout_lines – a list (array) where each element is one line of Junos command output.

changed – Boolean true if the module changed something (for example, the device’s configuration), false otherwise.

failed – Boolean true if the module reported a failure, false otherwise.

format – the type of output, where “text” indicates human-readable text as seen at the Junos CLI. May also be “xml” or “json” – we discuss those options in the Junos, RPC, NETCONF, and XML chapter.

Note:

Notice that the commands argument on line 22 seems to introduce a list of commands (note the leading hyphen on the command “- show system uptime”), though our example has only a single command in the list. The juniper_junos_command module supports executing multiple commands in one task (one call to the module). After you understand the use of this module with a single command, you may wish to read the online documentation to understand how the module handles a list of commands and their results.

Uptime Version 1.3

We do not need the Junos command’s output twice. Can we have debug display just the stdout_lines part of the uptime dictionary? Yes, we can, by referencing just that element of the uptime dictionary.

The standard approach to reference a specific dictionary entry, much like what a Python programmer would do, is to put the key for the desired dictionary entry in square brackets after the variable name. Because the key is a string it needs to be quoted:
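A sketch of the debug task using the bracket form, assuming the task is otherwise unchanged:

```yaml
    - name: display uptimes
      debug:
        var: uptime['stdout_lines']    # quoted key inside square brackets
```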

Note:

You can use single-quotes (' ') or double-quotes (" ") around the key string.

Ansible supports a shortcut, however: use a period to join the variable name and the key for the desired dictionary entry:
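The same task written with the dot shortcut might look like this sketch:

```yaml
    - name: display uptimes
      debug:
        var: uptime.stdout_lines    # variable name and key joined by a period
```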

The modified playbook is shown below. Both approaches discussed above are shown (see lines 33 and 37) but the first approach is commented out: any line whose first non-space character is a hash or pound symbol – ‘#’ – is a comment and is ignored by Ansible. Comments are normally used to include notes about the playbook within the playbook itself, as documentation for anyone editing the playbook, but can also be used as shown here to disable (usually temporarily) specific lines of the playbook:
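A sketch of just the final two tasks, which would correspond to lines 31–37 of the playbook (the earlier lines are assumed to be unchanged). The placement of each ‘#’ at the left margin followed by a space is an assumption about how the lines were commented out:

```yaml
#     - name: display uptimes
#       debug:
#         var: uptime['stdout_lines']

    - name: display uptimes
      debug:
        var: uptime.stdout_lines
```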

Run the playbook again. This time the output for the “display uptimes” task should look something like the following; notice how much shorter this is, while still providing the information we wanted:

Uncomment lines 31–33 (delete the leading ‘#’ and the space after it) and comment out lines 35–37 (add a leading ‘#’). Run the playbook again. The output should be essentially the same, demonstrating that the two approaches for referencing entries in a dictionary are equivalent.

You can take this a step further, displaying a specific element from the uptime.stdout_lines list, by appending the index of the element in square brackets. The first list element has an index of 0, the next an index of 1, etc. So, for example, uptime.stdout_lines[2] would reference the element at index 2 (the third element) of the list. With some commands, which have very consistent output across different device types, this may give us exactly what we want.

Unfortunately, the output of this command on different devices puts similar information in different indexes of the list. For example, if we specify we want only element 5 by modifying the playbook as follows:
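A sketch of that modification, assuming only the var: line of the debug task changes:

```yaml
    - name: display uptimes
      debug:
        var: uptime.stdout_lines[5]    # index 5 is the sixth line of output
```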

We get output similar to the following:

Observe that we get the “Last configured” information for aragorn, but the “Protocols started” information for bilbo. In the Junos, RPC, NETCONF, and XML chapter we discuss two different approaches that will let us get the data we want despite output differences.

Errors During Playbook Execution

What happens when problems occur during playbook execution? For purposes of this section we are focusing on problems external to the playbook, such as unreachable devices or authentication errors, not syntax or other errors within the playbook.

Ansible tracks errors separately for each device. When an error related to a particular device occurs, Ansible stops processing that device; subsequent tasks will not execute for it. However, if other devices have not had errors, tasks for those devices may be executed.

Tip:

It is possible, and occasionally useful, to have Ansible ignore errors in a particular task and continue processing a device despite errors, by adding the argument ignore_errors: yes to the task where errors are expected.
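As a sketch, here is how ignore_errors might be added to the command task from this chapter’s playbook. Applying it to this particular task is an illustration, not a recommendation from the text:

```yaml
    - name: get uptime using galaxy module
      juniper_junos_command:
        commands:
          - show system uptime
        provider:
          host: "{{ ansible_host }}"
          port: 22
          user: "{{ username }}"
          passwd: "{{ password }}"
      register: uptime
      ignore_errors: yes    # task-level keyword; later tasks still run for this device
```

Like register, ignore_errors belongs to the task rather than the module, so it sits at the same indentation as name:.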

Unreachable Device

Unplug the network cable from one of your test devices – for this example, the switch bilbo was disconnected – then run the playbook again. The output should look something like the following image:

Unreachable Device

The results for TASK [get uptime using galaxy module] show that aragorn succeeded – ok: [aragorn] – but bilbo failed – fatal: [bilbo] followed by an error message. In addition, color terminals display fatal task results in red, and also show red for that device in the PLAY RECAP section of output.

The juniper_junos_command module returned the error message “Unable to make a PyEZ connection: ConnectTimeoutError(bilbo)” seen above for bilbo. The “ConnectTimeoutError” part makes sense – because bilbo was unreachable (disconnected) any attempt to connect to bilbo would have timed out. The reference to “a PyEZ connection” illustrates that Juniper’s Galaxy modules rely on Juniper’s PyEZ connection framework.

The results for TASK [display uptimes] contain results for only aragorn. Because bilbo had an error in the previous task, Ansible stopped processing that device and thus had no results for bilbo for subsequent tasks. You can see this in the PLAY RECAP section – bilbo has only one (failed) task, while aragorn has two (ok) tasks.

Authentication Error

Re-connect your network device and give it a moment to restore communication; then run the playbook again. This time, enter invalid credentials at the username and password prompts:

Authentication Error

Note that the results for TASK [get uptime using galaxy module] show both devices failed: fatal: [aragorn] and fatal: [bilbo], each followed by the error message “Unable to make a PyEZ connection: ConnectAuthError.” The “ConnectAuthError” part of the message shows we had an authentication failure.

Notice that TASK [display uptimes] never executed (it does not appear in the output). Because there were no devices without errors after the first task, there were no devices against which to execute the second task.

Limiting Devices

It is often desirable to run a playbook against a subset of the devices in inventory. For example, your inventory for your production network may contain hundreds of devices across dozens of physical locations, but you want to run the playbook against only the Boston devices.

One approach to doing this is to edit the hosts: list in the playbook itself, replacing the default group all with one or more devices:
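A sketch of such an edit, limiting the play to one of the author’s test devices (the choice of device is illustrative):

```yaml
- name: Get device uptime
  hosts:
    - aragorn    # replaces the default group "all"
```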

The problem with this approach is that it requires updating the playbook to change the devices being managed, and doing so in a way that may not be obvious to someone else who uses the playbook and expects it to work on all, or a different subset of, your devices. Also, if a playbook contains multiple plays affecting the devices, you would need to make a similar update in each play.

A better approach is to leave the playbook alone, with hosts: set to - all, and use the --limit command-line argument to tell Ansible to run against a limited set of devices, for example ansible-playbook uptime.yaml --limit=aragorn:

Notice how Ansible ran against only aragorn, not against bilbo, even though both devices are in inventory.

The --limit argument can accept multiple inventory names separated with commas, and can accept the wildcards ‘*’ (match zero or more characters) and ‘?’ (match any single character). In the Data Files and Inventory Groups chapter, we discuss inventory groups; --limit can also accept group names. Two command-line examples:

Note that when using a wildcard, you should enclose the limit value in quotes to prevent the shell from attempting to interpret the wildcard.

When using --limit, it is sometimes helpful to verify which devices Ansible will manage before running the playbook (and possibly missing devices or managing some you did not expect). You can do this by adding the --list-hosts argument, which causes Ansible to display the devices it would manage without actually running the playbook. For example:
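To make the wildcard rules concrete, the following Python sketch approximates how a --limit value selects names from inventory. It is an illustration only, not Ansible's own code: the function name select_hosts is hypothetical, and the sketch handles only comma-separated names with the ‘*’ and ‘?’ wildcards described above (no group names).

```python
from fnmatch import fnmatchcase


def select_hosts(inventory, limit):
    """Return the inventory names matched by a comma-separated limit string.

    Hypothetical helper for illustration; Ansible's real pattern handling
    supports more than this sketch does.
    """
    patterns = [p.strip() for p in limit.split(",") if p.strip()]
    return [
        host
        for host in inventory
        if any(fnmatchcase(host, pattern) for pattern in patterns)
    ]


inventory = ["aragorn", "bilbo"]  # the two devices used in this book

print(select_hosts(inventory, "aragorn"))        # a single name
print(select_hosts(inventory, "b*"))             # '*' matches zero or more characters
print(select_hosts(inventory, "arag?rn,bilbo"))  # '?' matches one character; comma adds names
```

The three calls select aragorn only, bilbo only, and both devices, respectively. A pattern that matches nothing simply returns an empty list, which is the situation --list-hosts helps you catch before a real run.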

Repeating a Playbook for Devices with Errors

When a playbook encounters an error for a device during a task, it records that device in a “retry” file, a file whose name matches the playbook but with the extension .retry instead of .yaml. By default, “retry” files are stored in the playbook directory.

Tip:

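You can change the directory where Ansible saves “retry” files by adding the option retry_files_save_path to the [defaults] section of the ansible.cfg file. Note that Ansible 2.8 and later do not create “retry” files unless you also set retry_files_enabled = True in the same section.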

Earlier in this chapter we forced some errors using the uptime.yaml playbook, so you should have an uptime.retry file:

Disconnect one or more of your devices – the author disconnected bilbo – and rerun the uptime.yaml playbook. Display the contents of uptime.retry:

Observe that the uptime.retry file lists the inventory_hostname of each device that recorded a fatal result for any task; in this case, bilbo.

Re-connect your test device(s).

How can we use the “retry” file? We can re-run the playbook for only the failed devices. To do this we use the --limit option and reference the “retry” file, prefixing the filename with an “at sign” (“@”), like this:

Observe that Ansible processes only the device(s) listed in the “retry” file.

Ansible provides a reminder about this capability in the playbook output. Look back at the output with the failure on bilbo and note the line, just before the Play Recap section, that says “to retry, use: --limit @/Users/sean/aja2/uptime.retry”.

With only one failed device out of only two test devices, it would be easy to just specify --limit=bilbo for the repeat run. Consider, however, what it would be like when running a playbook against 100 devices, a dozen of which fail and need to be retried. Referencing a single .retry file is much faster and less error-prone than manually finding and “--limiting” the failed devices in a long list of results.
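The relationship between failed devices, the “retry” file, and the repeat run can be sketched in Python. This is a rough model under stated assumptions, not what Ansible does internally: the function names are hypothetical, the results dictionary is made-up sample data, and the file layout (one inventory name per line) is assumed.

```python
def failed_hosts(results):
    """Return the inventory names whose run ended in failure."""
    return [host for host, succeeded in results.items() if not succeeded]


def retry_file_text(hosts):
    """Render host names in the assumed retry-file layout: one name per line."""
    return "".join(host + "\n" for host in hosts)


def retry_command(playbook, retry_path):
    """Build the repeat-run command; '@' makes --limit read names from a file."""
    return ["ansible-playbook", playbook, "--limit", "@" + retry_path]


# Hypothetical outcome of one run: aragorn succeeded, bilbo failed.
results = {"aragorn": True, "bilbo": False}

print(retry_file_text(failed_hosts(results)), end="")
print(" ".join(retry_command("uptime.yaml", "uptime.retry")))
```

The command stays the same whether the file lists one device or a dozen, which is why the “retry” file scales better than typing names after --limit by hand.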

Debugging Playbooks

Debugging a playbook is part skill and part art. This section provides a few tips that can help, but practice and experience are the best teachers. Searching Google or Bing is often a big help, too.

Syntax and Semantic Errors

A syntax error is when the “grammar” of the playbook is incorrect; for example, a colon (:) is missing or the indentation of a line is incorrect. A semantic error is when the syntax is valid, but something still does not make sense; for example, the playbook tries to read the value of a variable that has not yet been assigned a value.

Usually, syntax errors will be detected and reported, and the playbook will abruptly terminate. Semantic errors may or may not be detected and reported; sometimes the playbook will complete but the results will not be what you expected.
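The same distinction exists in most languages. As an analogy only (this is Python, not a playbook, and the classify function is a hypothetical helper), the sketch below shows a syntax error being caught before anything runs, while a semantic error, reading a variable that was never assigned, surfaces only at run time.

```python
def classify(source):
    """Label a snippet of Python as 'syntax error', 'semantic error', or 'ok'."""
    try:
        # Parsing happens first; bad "grammar" stops everything right here.
        code = compile(source, "<example>", "exec")
    except SyntaxError:
        return "syntax error"
    try:
        # The grammar was fine, so the problem (if any) appears during execution.
        exec(code, {})
    except NameError:
        return "semantic error"
    return "ok"


print(classify("if True\n    pass"))     # missing colon
print(classify("x = 1\n  y = 2"))        # incorrect indentation
print(classify("total = count + 1"))     # 'count' was never assigned
print(classify("count = 1\ntotal = count + 1"))
```

The first two snippets are rejected as syntax errors, the third as a semantic error, and the last one runs cleanly.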

Let’s introduce a few errors into the playbook. Introduce each of the following errors one at a time, and reverse each before proceeding to the next one. The line numbers refer to the screen capture of uptime.yaml 1.3 from earlier in this chapter.

Missing Hosts

Delete the hosts: dictionary, lines 3 and 4, from the playbook. When you run the playbook, you should get something like this:

The error message “the field 'hosts' is required but was not set” exactly describes the problem we introduced.

Incorrect Indentation

Remove two spaces from the beginning of lines 15-17, the password prompt. The vars_prompt section of the playbook should look like this:

When you run the playbook, you should get something like this:

Recall that playbooks are YAML files, and YAML is sensitive to correct indentation. Because the password prompt lines were not correctly indented, Ansible detected a problem trying to load the playbook.

Unmatched (missing) Quotation Mark

Delete the quotation mark from the end of line 25; the revised line should be:

When you run the playbook, you should get something like this:

This error message is almost right: the problem is missing quotes, but the error message identifies line 27 as the likely problem, not line 25. Ansible cannot always identify the exact location of a syntax error, even if it correctly identifies the nature of the error.

In the author’s experience, Ansible’s error messages are usually pretty good. Please keep in mind that the exact wording of error messages may change in different versions of Ansible, so what you see when you perform these examples might be a bit different from what is shown above.

Verbose Mode

Ansible offers a “verbose mode” when running a playbook that provides more information about what is happening. To enable verbose mode, add the command-line argument -v to the ansible-playbook command, like this:

Observe the additional details provided by verbose mode (-v), including the name of the config file and the data returned by the juniper_junos_command module.

You can get still more detail by using -vv or -vvv; each “v” adds a little more “verbosity” to the playbook output.

Verbosity Argument to Debug Module

Recall that our original version of the uptime.yaml playbook used the debug module to display the values of the ansible_host and inventory_hostname variables. We removed those calls to debug because we really did not need them, but it might be nice to have the value of ansible_host displayed during any future troubleshooting because we pass that value to the juniper_junos_command module. However, if we add the debug calls back in the way we had them originally, the playbook would always display the variable’s contents, even when we were not troubleshooting. Most of the time we would be getting information we do not need.

We can ask debug to display the variable’s data only when we have enabled verbose mode as described above. Add lines 20-24 shown in the screen capture below into your uptime.yaml playbook:


The number given with the verbosity argument specifies the minimum number of “v” characters that must be supplied when enabling verbose mode before the debug module will print the variable’s value: verbosity: 1 displays the value with -v, -vv, or -vvv, while verbosity: 3 displays the value only with -vvv or higher.
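The threshold rule can be expressed as a simple comparison. The Python sketch below models it; the function names are hypothetical and the sketch only mirrors the rule described above, it is not taken from Ansible.

```python
import re


def verbosity_level(argv):
    """Count the 'v' characters across flags such as -v, -vv, or -vvv."""
    return sum(len(arg) - 1 for arg in argv if re.fullmatch(r"-v+", arg))


def debug_task_runs(task_verbosity, argv):
    """True when the command line supplies at least the required number of v's."""
    return verbosity_level(argv) >= task_verbosity


for flags in ([], ["-v"], ["-vv"], ["-vvv"]):
    print(flags, debug_task_runs(1, flags), debug_task_runs(3, flags))
```

With verbosity: 1 the task runs for every command line except the one with no -v at all; with verbosity: 3 it runs only for the last one.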

When you run the playbook with verbose mode enabled (-v), Ansible runs the task and displays the ansible_host variable for each host:

However, when you run the playbook without enabling verbose mode (no -v argument), Ansible will skip that task for each host:

Logging

Ansible can log the results of playbooks to a log file, including some information that is not displayed on screen. You can enable this feature by adding the log_path parameter to the [defaults] section of the ansible.cfg file, like this (adjust the path and filename as needed):

Now when you run the playbook (not shown), Ansible will log the playbook’s output and some additional details:

2018-02-27 12:19:47,760 p=54882 u=sean | PLAY RECAP *********************************************************************

2018-02-27 12:19:47,761 p=54882 u=sean | aragorn : ok=2 changed=0 unreachable=0 failed=0

2018-02-27 12:19:47,761 p=54882 u=sean | bilbo : ok=2 changed=0 unreachable=0 failed=0
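Because each log line follows a regular layout (timestamp, process ID, user, then the message), the log is easy to process with a script. The Python sketch below pulls the per-device counters out of the recap lines shown above. It is a minimal example that assumes every line has exactly this layout; the parse_recap name is hypothetical, and the row of asterisks in the sample is shortened.

```python
import re

# Layout assumed from the excerpt: "<date> <time> p=<pid> u=<user> | <message>"
LOG_LINE = re.compile(r"^(\S+ \S+) p=(\d+) u=(\S+) \| (.*)$")


def parse_recap(lines):
    """Return {device: {counter: value}} for recap lines found in the log."""
    recap = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue
        host, separator, counters = match.group(4).partition(" : ")
        if not separator:
            continue  # not a "device : counters" message, e.g. the PLAY RECAP banner
        pairs = (item.split("=", 1) for item in counters.split() if "=" in item)
        values = {key: int(value) for key, value in pairs if value.isdigit()}
        if "ok" in values:
            recap[host.strip()] = values
    return recap


sample = [
    "2018-02-27 12:19:47,760 p=54882 u=sean | PLAY RECAP *********",
    "2018-02-27 12:19:47,761 p=54882 u=sean | aragorn : ok=2 changed=0 unreachable=0 failed=0",
    "2018-02-27 12:19:47,761 p=54882 u=sean | bilbo : ok=2 changed=0 unreachable=0 failed=0",
]

for device, counters in parse_recap(sample).items():
    print(device, counters)
```

A script like this could, for instance, flag any device whose failed or unreachable counter is not zero.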

CAUTION:

Ansible does not automatically clear or delete the log file, which means that over time it can grow quite large. Consider enabling logging only when needed to troubleshoot a problem, or remember to delete the file occasionally.