Ansible Error Handling: In this lesson, you will learn different ways to handle failures in Ansible tasks by using the ignore_errors, force_handlers, failed_when, changed_when, block, rescue, and always directives in a playbook.
Contents
- How Can You Handle Errors In Ansible
- Specifying Task Failure Conditions
- Managing Changed Status
- Using Ansible Blocks
- Using Ansible Blocks With Rescue and Always Statement
How Can Error Handling Be Done In Ansible
Ansible plays and tasks are executed in the order in which they are defined in a playbook. By default, if a task fails on a host, the remaining tasks are not executed on that host. However, this behavior can be changed with the “ignore_errors: true” keyword.
This keyword can be added to a play or to a task. If it is added to a play, errors in all the tasks of that play are ignored. If it is added to a task, only errors in that task are ignored.
We learnt about handlers in one of our previous lessons, so what about handlers? Error handling applies to them too. If a task fails on a host, handlers that were notified earlier in the play are not executed on that host. This behavior can be changed with the “force_handlers: yes” keyword.
As usual, let’s understand better with examples.
If we were to install the httpd package and the autofs package using a playbook, the name argument of the yum module would be set to “httpd” in one task and “autofs” in the other.
Now, we are going to write a playbook and intentionally make an error by setting the name argument of the autofs task to the misspelled value “autos”.
1. Create a playbook
[lisa@drdev1 ~]$ vim playbook7.yml
- name: Install basic package
  hosts: hqdev1.tekneed.com
  tasks:
    - name: install autofs
      yum:
        name: autos
        state: present
      ignore_errors: true

    - name: install httpd
      yum:
        name: httpd
        state: present
2. Run the playbook
[lisa@drdev1 ~]$ ansible-playbook playbook7.yml
PLAY [Install basic package] ***********************************************************
........
Without ignore_errors, the first task would fail and the second task would not run. Because we used the “ignore_errors” keyword, Ansible still reports the error in the first task but marks it as ignored and continues, so the second task runs.
Also, remember that you can choose to use the “ignore_errors” keyword at a play level or at a task level. In our case, it was used at the task level.
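As a sketch of the play-level option, the playbook below moves ignore_errors from the task to the play, so an error in any task of the play is ignored. It reuses the host and packages from playbook7.yml and is only an illustration: ignoring every error in a play can hide real problems, so the task level is usually the safer choice.

```yaml
# Sketch: ignore_errors set at the play level is inherited by every task in the play.
- name: Install basic package
  hosts: hqdev1.tekneed.com
  ignore_errors: true          # applies to all tasks below
  tasks:
    - name: install autofs
      yum:
        name: autos            # deliberately misspelled, as in playbook7.yml
        state: present

    - name: install httpd
      yum:
        name: httpd
        state: present
```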
Next, let’s see an example of how to use the force_handlers keyword to force notified handlers to run.
1. Create a playbook
- name: restarting httpd using handlers
  hosts: hqdev1.tekneed.com
  force_handlers: yes
  tasks:
    - name: restart httpd
      service:
        name: httpd
        state: restarted
      notify: restart httpd

  handlers:
    - name: restart httpd
      service:
        name: httpd
        state: restarted
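The force_handlers directive makes handlers that have been notified run even if a later task fails on that host. It does not run handlers that were never notified. In a playbook, it is set at the play level; it can also be enabled with the --force-handlers command-line option or with force_handlers = True in ansible.cfg.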
2. Run the playbook
[lisa@drdev1 ~]$ ansible-playbook playbook3.yml
PLAY [restarting httpd using handlers] ********************************
......
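Because the restart httpd task always reports a changed status, it notifies the handler, and because force_handlers is set to yes, the handler would still run even if a later task in this play failed on the host.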
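The playbook above never fails, so it does not really show what force_handlers changes. The sketch below does: the first task edits the httpd configuration and, if it reports a change, notifies the handler; the second task fails on purpose. With force_handlers: yes, the notified handler still runs on that host; without it, the handler would be skipped. The ServerTokens line and the /bin/false failure are illustrative assumptions, not part of the original lesson.

```yaml
- name: force handlers after a failure
  hosts: hqdev1.tekneed.com
  force_handlers: yes
  tasks:
    # Illustrative change (assumption): reports "changed" only if the line
    # is added or modified, and only then notifies the handler.
    - name: set ServerTokens in httpd.conf
      lineinfile:
        path: /etc/httpd/conf/httpd.conf
        regexp: '^ServerTokens'
        line: 'ServerTokens Prod'
      notify: restart httpd

    # Deliberate failure: /bin/false exits with a non-zero code.
    - name: simulate a later failure
      command: /bin/false

  handlers:
    - name: restart httpd
      service:
        name: httpd
        state: restarted
```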
Specifying Task Failure Conditions
Ansible may run a task or command successfully, yet the result may still be a failure from the point of view of what the user wants to achieve. In such cases, you can specify the conditions under which a task fails; in other words, you are at liberty to determine what a failure is.
Let’s see how this can be done with the “failed_when” directive. As its name suggests, failed_when marks a task as failed when a given condition is met.
Let’s use the playbook below as an example:
[lisa@drdev1 ~]$ vim playbook5.yml
- name: Web page fetcher
  hosts: hqdev1.tekneed.com
  tasks:
    - name: connect to website
      uri:
        url: https://tekneed.com
        return_content: true
      register: output

    - name: verify content
      debug:
        msg: "verifying content"
      failed_when:
        - '"this content" not in output.content'
        - '"other content" not in output.content'
Here is what this playbook does. The uri module interacts with the web server and fetches the page https://tekneed.com.
Because return_content is set to true, the body of the response is returned in the content field of the result, and the register directive captures the whole result in the output variable.
In the second task, the debug module prints “verifying content” through its msg argument. Because the failed_when conditions are given as a list, they are joined with an implicit and, so the task fails only when neither “this content” nor “other content” appears in the captured output.
Run the playbook
[lisa@drdev1 ~]$ ansible-playbook playbook5.yml
PLAY [Web page fetcher] ************************************************************
.......
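If the task should instead fail when either string is missing, the conditions can be written as one string with an explicit or. A minimal sketch of that variant follows; only the failed_when line differs from playbook5.yml.

```yaml
- name: Web page fetcher
  hosts: hqdev1.tekneed.com
  tasks:
    - name: connect to website
      uri:
        url: https://tekneed.com
        return_content: true
      register: output

    - name: verify content
      debug:
        msg: "verifying content"
      # One string with "or": the task fails as soon as either expected string is absent.
      failed_when: '"this content" not in output.content or "other content" not in output.content'
```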
Alternatively, the “fail” module can be used to make a task fail. This module is mainly useful together with the “when” keyword, which specifies the exact failure condition; without when, a fail task always fails.
Let’s use the same example we used above, but this time use the “fail” module and the “when” keyword.
[lisa@drdev1 ~]$ vim playbook5.yml
- name: Web page fetcher
  hosts: hqdev1.tekneed.com
  tasks:
    - name: connect to website
      uri:
        url: https://tekneed.com
        return_content: true
      register: output

    - name: verify content
      fail:
        msg: "verifying content"
      when:
        - '"this content" not in output.content'
        - '"other content" not in output.content'
Run the playbook
[lisa@drdev1 ~]$ ansible-playbook playbook5.yml
PLAY [Web page fetcher] ************************************************************
.......
Managing Changed Status
Managing the changed status can help you avoid unexpected results while running a playbook. Some tasks report a changed status even though nothing has actually changed on the managed host; for example, the task may only have retrieved information.
In some cases, you may not want a task to report a changed status. In this case, use the “changed_when: false” keyword. The task will then only report an “ok” or “failed” status and will never report a changed status.
An example of such a task can be seen below.
- name: copy in nginx conf
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf

- name: validate nginx conf
  command: nginx -t
  changed_when: false
You can also specify a condition under which a task reports a changed status. An example of such a task is shown in the playbook below.
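- name: Web page fetcher
  hosts: hqdev1.tekneed.com
  tasks:
    - name: connect to website
      uri:
        url: https://tekneed.com
        return_content: true
      register: output

    # The uri result has no stdout field, so the page body (output.content) is checked.
    # debug is used because a fail task without a when condition would always fail.
    - name: verify content
      debug:
        msg: "verifying content"
      changed_when: "'success' in output.content"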
Using Ansible Blocks
Blocks are used to group related tasks, and they can be very useful with conditional statements.
When a condition is applied to a block, it applies to every task in the block: if the condition is true, all the tasks are executed, and if it is false, they are all skipped. You should also know that block is a directive in Ansible and not a module, hence the block directive and the when directive are at the same indentation level.
Let’s see how blocks can be used with examples.
Create a playbook
[lisa@drdev1 ~]$ vim playbook8.yml
- name: setting up httpd
  hosts: localhost
  tasks:
    - name: Install start and enable httpd
      block:
        - name: install httpd
          yum:
            name: httpd
            state: present

        - name: start and enable httpd
          service:
            name: httpd
            state: started
            enabled: true
      # the ansible_distribution fact is "RedHat" (no space) on Red Hat Enterprise Linux
      when: ansible_distribution == "RedHat"
This playbook groups two tasks in a block named “Install start and enable httpd”. The first task is “install httpd” and the second is “start and enable httpd”. Both tasks execute only if the condition is true, that is, only if the managed host runs Red Hat Enterprise Linux.
Run the playbook
[lisa@drdev1 ~]$ ansible-playbook playbook8.yml
PLAY [setting up httpd] ********************************************************
.......
With this kind of playbook, if a task in the block fails, the remaining tasks are not executed. Let’s see how we can use block with rescue and always to handle such failures.
Using Ansible Blocks With Rescue and Always Statement
Apart from grouping tasks, blocks can also be used specifically for error handling with the rescue keyword.
It works like this: if a task defined in the block section fails, the tasks defined in the rescue section are executed, and if they succeed, the play continues. In that respect, rescue is similar to the ignore_errors keyword, except that it also lets you run tasks that react to the failure.
There is also an always section, whose tasks always run, whether or not a task in the block fails.
Let’s understand better with examples.
Create a playbook.
[lisa@drdev1 ~]$ vim playbook9.yml
- name: setting up httpd
  hosts: localhost
  tasks:
    - name: Install the latest httpd and restart
      block:
        - name: install httpd
          yum:
            name: htt
            state: latest
      rescue:
        - name: restart httpd
          service:
            name: httpd
            state: started

        - name: Install autofs
          yum:
            name: autofs
      always:
        - name: restart autofs
          service:
            name: autofs
            state: started
This playbook defines four tasks under the block named “Install the latest httpd and restart”: one in the block section, two in the rescue section, and one in the always section.
The task in the block section fails because the package name (htt) is incorrect. The second and third tasks then run because they are in the rescue section, and the fourth task runs because it is in the always section, which runs whether or not the block fails.
Note that you can have as many tasks as you want in the block, rescue, or always section.
Run the playbook
[lisa@drdev1 ~]$ ansible-playbook playbook9.yml
PLAY [setting up httpd] *********************************************************
......
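A common use of the always section is cleanup that must happen whether or not the block succeeds. The sketch below assumes a throwaway temporary file created with the tempfile module; the failing command is deliberate, the rescue task reports the failure, and the always task removes the file either way.

```yaml
- name: block, rescue and always with cleanup
  hosts: localhost
  tasks:
    - name: Work in a temporary file and always clean it up
      block:
        - name: create a temporary file
          tempfile:
            state: file
            suffix: .tmp
          register: tmp          # tmp.path holds the created file's path

        - name: a step that fails on purpose
          command: /bin/false

      rescue:
        - name: report the failure
          debug:
            msg: "A task in the block failed; the rescue section ran"

      always:
        # Runs after the block (and rescue, if triggered), success or failure.
        - name: remove the temporary file
          file:
            path: "{{ tmp.path }}"
            state: absent
          when: tmp is defined and tmp.path is defined
```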
Class Activity
Create a playbook that contains one play with tasks that use the block, rescue, and always statements.
Topics
- Error Handling In Playbooks
  - Ignoring Failed Commands
  - Resetting Unreachable Hosts
  - Handlers and Failure
  - Controlling What Defines Failure
  - Overriding The Changed Result
  - Aborting the play
  - Using blocks
Ansible normally has defaults that make sure to check the return codes of commands and modules, and it fails fast, forcing an error to be dealt with unless you decide otherwise.
Sometimes a command that returns a non-zero exit code isn’t an error. Sometimes a command might not always need to report that it ‘changed’ the remote system. This section describes how to change the default behavior of Ansible for certain tasks so that output and error handling behavior is as desired.
Ignoring Failed Commands
Generally, playbooks will stop executing any more steps on a host that has a task fail. Sometimes, though, you want to continue on. To do so, write a task that looks like this:
- name: this will not be counted as a failure
  command: /bin/false
  ignore_errors: yes
Note that the above system only governs the return value of failure of the particular task, so if you use an undefined variable or have a syntax error, it will still raise an error that users will need to address. Note that this will not prevent failures on connection or execution issues. This feature only works when the task is able to run and return a value of ‘failed’.
Resetting Unreachable Hosts
New in version 2.2.
Connection failures set hosts as ‘UNREACHABLE’, which will remove them from the list of active hosts for the run. To recover from these issues, you can use meta: clear_host_errors to have all currently flagged hosts reactivated, so subsequent tasks can try to use them again.
Handlers and Failure
When a task fails on a host, handlers which were previously notified will not be run on that host. This can lead to cases where an unrelated failure can leave a host in an unexpected state. For example, a task could update a configuration file and notify a handler to restart some service. If a task later on in the same play fails, the service will not be restarted despite the configuration change.
You can change this behavior with the --force-handlers command-line option, or by including force_handlers: True in a play, or force_handlers = True in ansible.cfg. When handlers are forced, they will run when notified even if a task fails on that host. (Note that certain errors could still prevent the handler from running, such as a host becoming unreachable.)
Controlling What Defines Failure
Ansible lets you define what “failure” means in each task using the failed_when conditional. As with all conditionals in Ansible, lists of multiple failed_when conditions are joined with an implicit and, meaning the task only fails when all conditions are met. If you want to trigger a failure when any of the conditions is met, you must define the conditions in a string with an explicit or operator.
You may check for failure by searching for a word or phrase in the output of a command:
- name: Fail task when the command error output prints FAILED
  command: /usr/bin/example-command -x -y -z
  register: command_result
  failed_when: "'FAILED' in command_result.stderr"
or based on the return code:
- name: Fail task when both files are identical
  raw: diff foo/file1 bar/file2
  register: diff_cmd
  failed_when: diff_cmd.rc == 0 or diff_cmd.rc >= 2
In previous versions of Ansible, the same result was achieved as follows, and this approach still works:
- name: this command prints FAILED when it fails
  command: /usr/bin/example-command -x -y -z
  register: command_result
  ignore_errors: True

- name: fail the play if the previous command did not succeed
  fail:
    msg: "the command failed"
  when: "'FAILED' in command_result.stderr"
You can also combine multiple conditions for failure. This task will fail if both conditions are true:
- name: Check if a file exists in temp and fail task if it does
  command: ls /tmp/this_should_not_be_here
  register: result
  failed_when:
    - result.rc == 0
    - '"No such" not in result.stdout'
If you want the task to fail when either condition is satisfied, change the failed_when definition to:
failed_when: result.rc == 0 or "No such" not in result.stdout
If you have too many conditions to fit neatly into one line, you can split them into a multi-line YAML value with the > (folded) block style:
- name: example of many failed_when conditions with OR
  shell: "./myBinary"
  register: ret
  failed_when: >
    ("No such file or directory" in ret.stdout) or
    (ret.stderr != '') or
    (ret.rc == 10)
Overriding The Changed Result
When a shell/command or other module runs, it will typically report “changed” status based on whether it thinks it affected machine state.
Sometimes you will know, based on the return code or output, that it did not make any changes, and wish to override the “changed” result such that it does not appear in report output or does not cause handlers to fire:
tasks:
  - shell: /usr/bin/billybass --mode="take me to the river"
    register: bass_result
    changed_when: "bass_result.rc != 2"

  # this will never report 'changed' status
  - shell: wall 'beep'
    changed_when: False
You can also combine multiple conditions to override “changed” result:
- command: /bin/fake_command
  register: result
  ignore_errors: True
  changed_when:
    - '"ERROR" in result.stderr'
    - result.rc == 2
Aborting the play
Sometimes it’s desirable to abort the entire play on failure, not just skip remaining tasks for a host.
The any_errors_fatal option will end the play and prevent any subsequent plays from running. When an error is encountered, all hosts in the current batch are given the opportunity to finish the fatal task and then the execution of the play stops. any_errors_fatal can be set at the play or block level:
- hosts: somehosts
  any_errors_fatal: true
  roles:
    - myrole

- hosts: somehosts
  tasks:
    - block:
        - include_tasks: mytasks.yml
      any_errors_fatal: true
For finer-grained control, max_fail_percentage can be used to abort the run after a given percentage of hosts has failed.
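As a sketch of max_fail_percentage, the play below processes hosts in batches of 10 (using serial) and aborts the rest of the play if more than 30 percent of the hosts in a batch fail; the percentage must be exceeded, not merely reached. The webservers group and the yum task are hypothetical placeholders.

```yaml
- hosts: webservers            # hypothetical inventory group
  serial: 10                   # run the play on 10 hosts at a time
  max_fail_percentage: 30      # abort if more than 3 of the 10 hosts in a batch fail
  tasks:
    - name: make sure httpd is installed
      yum:
        name: httpd
        state: present
```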
Using blocks
Most of what you can apply to a single task (with the exception of loops) can be applied at the Blocks level, which also makes it much easier to set data or directives common to the tasks.
Blocks also introduce the ability to handle errors in a way similar to exceptions in most programming languages.
Blocks only deal with ‘failed’ status of a task. A bad task definition or an unreachable host are not ‘rescuable’ errors:
tasks:
  - name: Handle the error
    block:
      - debug:
          msg: 'I execute normally'
      - name: i force a failure
        command: /bin/false
      - debug:
          msg: 'I never execute, due to the above task failing, :-('
    rescue:
      - debug:
          msg: 'I caught an error, can do stuff here to fix it, :-)'
This will ‘revert’ the failed status of the outer block task for the run, and the play will continue as if it had succeeded.
See Blocks error handling for more examples.
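Inside a rescue section, Ansible also exposes the ansible_failed_task and ansible_failed_result variables, which describe the task that triggered the rescue. A minimal sketch, adapted from the example above and assumed to run against localhost:

```yaml
- hosts: localhost
  tasks:
    - name: Handle the error
      block:
        - name: i force a failure
          command: /bin/false
      rescue:
        # ansible_failed_task is the task that failed; ansible_failed_result is its result
        - debug:
            msg: "Task '{{ ansible_failed_task.name }}' failed with rc={{ ansible_failed_result.rc }}"
```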
Hello readers, in this blog we will be looking at how to handle errors in Ansible playbooks. There are multiple ways of doing this, and we will look at each of them and how to use it in a playbook.
By default, Ansible checks the return codes of commands and modules, and it fails fast. This means that we are forced to deal with these failures unless we decide otherwise.
Let us start by looking at how to change the default behaviour of Ansible for certain tasks so that error handling behaviour is as per our requirements.
How to Ignore Failed Commands?
Ansible playbooks stop the execution of any more tasks on a host which has encountered a failure. But in some cases, even after a failure, we might want to continue executing tasks on that host. To do so, we can write a task like the one below:
- name: some task
  command: /bin/false
  ignore_errors: yes
Above all, we have to keep in mind that this feature only works when the task is able to run and return a value associated with failure. So, if we have any undefined variables or syntax errors, we will still get an error which we will have to address. Also, it will not prevent failures caused by connection or execution issues.
How to Reset Unreachable Hosts?
Whenever an Ansible playbook encounters a connection failure with a host, it marks the host as ‘UNREACHABLE’. By doing this, Ansible removes the host from the list of active hosts for the run. To reset this, we can use meta: clear_host_errors to reactivate all the hosts associated with the play, so that subsequent tasks can try to use them again. We can use it as shown below:
- hosts: all
  tasks:
    - set_fact:
        was_accessible: "up"

    - meta: clear_host_errors

    - debug:
        msg: "Hello"

    - when:
        - was_accessible is defined
      debug:
        msg: "Hello again, I am up."
Running Handlers Despite Failures
Handlers notified by earlier tasks will not run on a host on which a task has failed. As a result, a host can be left in an unexpected state even though the failure is unrelated to the change that notified the handler.
To tackle this problem, we can use the following options:
1. Using the --force-handlers command-line option
2. Including force_handlers: True in a play
3. Setting force_handlers = True in the ansible.cfg configuration file
- hosts: all
  force_handlers: true
When we force handlers to run, the handlers will run when notified even if a task has failed on the host.
How to Define Failures?
Ansible provides the failed_when conditional to allow us to define what “failure” means. Multiple failed_when conditions given as a list are joined with an implicit and, which means that a task is marked as failed only when all the conditions are met. To register a failure when any one of multiple conditions is met, we can use the or operator.
- name: Web page fetcher
  hosts: all
  tasks:
    - name: Fetch webpage
      uri:
        url: https://somewebsite.com
        return_content: true
      register: output

    - name: Check Content
      debug:
        msg: "Checking content..."
      failed_when:
        - '"Some Content" not in output.content'
        - '"Some other content" not in output.content'
To fail the task when either string is missing, replace the list with a single condition that uses or:
failed_when: '"Some Content" not in output.content or "Some other content" not in output.content'
How to Abort a Play?
When there are failures in a play, sometimes it is essential to abort the entire play instead of just skipping the remaining tasks on a host. In this scenario, we can use the any_errors_fatal option. This option ends the play and prevents any subsequent plays from running. In the case of a failure, hosts in the current batch are given the opportunity to finish the fatal task, and after that the execution of the play is stopped.
We can use this option in the way given below:
- hosts: somehosts
  any_errors_fatal: true
Conclusion
We have seen throughout this blog that there are multiple ways to handle errors in Ansible playbooks. We also saw that we can define what “failure” means in our playbooks and which actions we can take when we encounter failures.