Return code is not 0: how to fix it - Fixing errors and finding optimal solutions to problems

Stuck with Non-Zero return code: Ansible error? We can help you.

By default, Ansible marks a command or shell task as failed if the command exits with any non-zero status.

As part of our Server Management Services, we assist our customers with several Ansible queries.

Today, let us see how we can fix this error.

Non-Zero return code: Ansible

Generally, the error looks like this:

TASK [Non-Zero return] **********************************************************************************
fatal: [server1.lab.com]: FAILED! => {"changed": true, "cmd": "ls | grep wp-config.php", "delta": "0:00:00.021103", "end": "2021-06-29 12:53:49.222176", "msg": "non-zero return code", "rc": 1, "start": "2021-06-29 12:53:49.201073", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
fatal: [server2.lab.com]: FAILED! => {"changed": true, "cmd": "ls | grep wp-config.php", "delta": "0:00:00.021412", "end": "2021-06-29 12:53:50.697567", "msg": "non-zero return code", "rc": 1, "start": "2021-06-29 12:53:50.676155", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
fatal: [server3.lab.com]: FAILED! => {"changed": true, "cmd": "ls | grep wp-config.php", "delta": "0:00:00.015554", "end": "2021-06-29 12:53:50.075555", "msg": "non-zero return code", "rc": 1, "start": "2021-06-29 12:53:50.060001", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

In general, a command that exits with a zero status has run successfully.

On the other hand, any non-zero exit status of the command indicates an error.

For example,

$ date
Tuesday 29 June 2021 05:21:28 PM IST
$ echo $?
0

Here, we can see the successful execution of the shell command “date”. Hence, the exit status of the command is 0.

A non-zero exit status indicates failure. For example,

$ date yesterday
date: invalid date ‘yesterday’
$ echo $?
1

Here, the argument for the ‘date’ command, “yesterday”, is invalid. Hence, the exit status is 1, indicating the command ended in error.

However, some commands return a non-zero status even when they run correctly.

$ ls | grep wp-config.php
$ echo $?
1

Here, the wp-config.php file doesn’t exist in that directory. Even though the command executes without error, the exit status is 1.

By default, Ansible will report it as failed.
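
The distinction between "no match" and a real error can be checked directly in a shell. The following is a minimal sketch, assuming a POSIX shell with mktemp available; the scratch directory and the file names in it are made up for illustration. grep exits with 0 when it selects a line, 1 when it selects nothing, and a status greater than 1 when something actually went wrong.

```shell
# Sketch: observe grep's three kinds of exit status in a scratch directory.
workdir=$(mktemp -d)
touch "$workdir/index.php"

# No match: grep ran correctly but selected no lines, so it exits with 1.
rc_nomatch=0
ls "$workdir" | grep wp-config.php >/dev/null || rc_nomatch=$?

# Match: at least one line was selected, so grep exits with 0.
rc_match=0
ls "$workdir" | grep index.php >/dev/null || rc_match=$?

# Real error: the input file does not exist, so grep exits with a status above 1.
rc_error=0
grep wp-config.php "$workdir/does-not-exist" >/dev/null 2>&1 || rc_error=$?

echo "no match: $rc_nomatch, match: $rc_match, error: $rc_error"
rm -rf "$workdir"
```

Only the third case is a failure in the usual sense, which is why treating every non-zero status as fatal is too strict for this command.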

How to resolve the problem?

The best way to avoid this problem is usually not to use the shell module in the playbook at all.

In many cases an Ansible module already performs the same operation.

Here, for example, the built-in find module can locate files without going through the shell.

Alternatively, we can define the condition for a failure at the task level with the help of failed_when.

For example,

---
- hosts: all
  tasks:
    - name: Non-Zero return
      shell: "ls | grep wp-config.php"
      register: wp
      failed_when: "wp.rc not in [ 0, 1 ]"

TASK [Non-Zero return] ***********************************************************************************************************
changed: [server1.lab.com]
changed: [server2.lab.com]
changed: [server3.lab.com]

Although the exit status is not zero, the task is no longer reported as failed, and the play continues on each server.

Here, the task result is registered in a variable and then checked against the condition. The task is reported as failed only if the return code is something other than 0 or 1.
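
To see which return codes such a condition accepts, the same decision can be sketched in plain shell. This is only an analogy, not how Ansible evaluates failed_when internally; the helper name check_rc is hypothetical.

```shell
# Hypothetical helper: run a command line and fail only when
# its exit status is something other than 0 or 1.
check_rc() {
    rc=0
    sh -c "$1" >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0|1) return 0 ;;       # match or no match: both are acceptable
        *)   return "$rc" ;;   # anything else is a genuine failure
    esac
}

check_rc 'exit 0' && echo "rc 0 accepted"
check_rc 'exit 1' && echo "rc 1 accepted"
check_rc 'exit 2' || echo "rc 2 rejected"
```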

On the other hand, we can ignore the errors altogether.

For that, we use ignore_errors in the task to ignore any failure during the task.

---
- hosts: all
  tasks:
    - name: Non-Zero return
      shell: "ls | grep wp-config.php"
      ignore_errors: true

TASK [Non-Zero return] ***********************************************************************************************************
fatal: [server1.lab.com]: FAILED! => {"changed": true, "cmd": "ls | grep wp-config.php", "delta": "0:00:00.004055", "end": "2021-06-29 13:09:20.631570", "msg": "non-zero return code", "rc": 1, "start": "2021-06-29 13:09:20.627515", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
...ignoring
fatal: [server2.lab.com]: FAILED! => {"changed": true, "cmd": "ls | grep wp-config.php", "delta": "0:00:00.006745", "end": "2021-06-29 13:09:22.110059", "msg": "non-zero return code", "rc": 1, "start": "2021-06-29 13:09:22.103314", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
...ignoring
fatal: [server3.lab.com]: FAILED! => {"changed": true, "cmd": "ls | grep wp-config.php", "delta": "0:00:00.004957", "end": "2021-06-29 13:09:21.465326", "msg": "non-zero return code", "rc": 1, "start": "2021-06-29 13:09:21.460369", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
...ignoring

By default, Ansible treats a command task as successful only if its exit status is zero. Otherwise, the task fails.

We can change how the exit status is interpreted by registering the task result in a variable and then using a failed_when condition to decide whether the task fails or succeeds.

To continue the playbook, in spite of the failure, we can use the ignore_errors option on the task.


Conclusion

In short, we saw how our Support Techs fix the Ansible error for our customers.


Source

Ansible doesn’t seem to be able to handle the result ‘0’ for shell commands. This

- name: Check if swap exists
  shell: "swapon -s | grep -ci dev"
  register: swap_exists

Returns an error

"msg": "non-zero return code"

But when I replace "dev" with "type", which actually always occurs and gives a count of at least 1, then the command is successful and no error is thrown.

I also tried command: instead of shell:. It doesn’t give an error, but then the command is also not executed.

asked May 21, 2018 at 0:15

Since you want to run a sequence of commands that involves a pipe, Ansible states you should use shell and not command, as you are doing.

So, the problem is that grep returns 1 (it didn't find a match in the swapon output), and Ansible considers this a failure. Since you are sure there is no issue, just add ignore_errors: true and be done with it.

- name: Check if swap exists
  shell: "swapon -s | grep -ci non_existent_string"
  register: swap_exists
  ignore_errors: true

OR:

If you want to narrow it down to return codes 0 and 1, instruct Ansible not to treat those two return codes as failures:

- name: Check if swap exists
  shell: "swapon -s | grep -ci non_existent_string"
  register: swap_exists
  # ignore_errors: true
  failed_when: swap_exists.rc != 1 and swap_exists.rc != 0

answered May 21, 2018 at 0:47

ilias-sp

I found a better way. If you only need to know the number of matching records, this works:

- name: Check if swap exists
  shell: "swapon -s | grep -i dev|wc -l"
  register: swap_exists

Another way is to always use cat at the end of the pipe. See Ansible shell module returns error when grep results are empty

    - name: Check if swap exists
      shell: "swapon -s | grep -i dev|cat"
      register: swap_exists
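
Both variants work because a pipeline reports the exit status of its last command. The sketch below illustrates this, assuming a POSIX shell in which pipefail is not enabled; the sample input line stands in for swapon -s output on a host without swap.

```shell
# grep alone: no line matches, so the exit status is 1.
rc_grep=0
printf 'Filename Type Size\n' | grep -i dev >/dev/null || rc_grep=$?

# grep followed by cat: the pipeline reports cat's status, which is 0.
rc_cat=0
printf 'Filename Type Size\n' | grep -i dev | cat >/dev/null || rc_cat=$?

# grep followed by wc -l: status 0, and the count is still available on stdout.
count=$(printf 'Filename Type Size\n' | grep -i dev | wc -l)
count=$((count))   # normalise any padding that some wc implementations add

echo "grep: $rc_grep, grep|cat: $rc_cat, matches: $count"
```

The trade-off is that a real grep error (status above 1) is hidden as well, since the status of the last command is the only one reported.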

Calum Halpin


answered Feb 23, 2020 at 9:20

robin

You can also pass the grep count to awk and print your own output. This avoids the need for ignore_errors.

- name: Check if swap exists
  shell: swapon -s | grep -ci dev | awk '{ r = $0 == 0 ? "false":"true"; print r }'
  register: swap_exists
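
The awk step can be tried outside Ansible. In this sketch the helper name count_to_bool and the two sample input lines are made up for illustration; grep -c prints the count even when it is 0, and awk supplies the final exit status of the pipeline.

```shell
# Hypothetical helper: turn a match count read from stdin into "true" or "false".
count_to_bool() {
    awk '{ r = ($0 == 0) ? "false" : "true"; print r }'
}

# No line contains "dev": grep -c prints 0 and exits 1, awk prints "false" and exits 0.
no_swap=$(printf 'Filename Type\n' | grep -ci dev | count_to_bool)

# One line contains "dev": grep -c prints 1, awk prints "true".
has_swap=$(printf '/dev/sda2 partition\n' | grep -ci dev | count_to_bool)

echo "$no_swap $has_swap"
```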

answered Jun 17, 2020 at 5:15

Aravinthan K

Source

Error handling in playbooks

When Ansible receives a non-zero return code from a command or a failure from a module, by default it stops executing on that host and continues on other hosts. However, in some circumstances you may want different behavior. Sometimes a non-zero return code indicates success. Sometimes you want a failure on one host to stop execution on all hosts. Ansible provides tools and settings to handle these situations and help you get the behavior, output, and reporting you want.

Ignoring failed commands
Ignoring unreachable host errors
Resetting unreachable hosts
Handlers and failure
Defining failure
Defining "changed"
Ensuring success for command and shell
Aborting a play on all hosts
- Aborting on the first error: any_errors_fatal
- Setting a maximum failure percentage
Controlling errors in blocks

Ignoring failed commands

By default Ansible stops executing tasks on a host when a task fails on that host. You can use ignore_errors to continue in spite of the failure:

- name: Do not count this as a failure
  ansible.builtin.command: /bin/false
  ignore_errors: yes

The ignore_errors directive only works when the task is able to run and returns a value of "failed". It does not make Ansible ignore undefined variable errors, connection failures, execution issues (for example, missing packages), or syntax errors.

Ignoring unreachable host errors

New in version 2.7.

You can ignore a task failure due to the host instance being "UNREACHABLE" with the ignore_unreachable keyword. Ansible ignores the task errors, but continues to execute future tasks against the unreachable host. For example, at the task level:

- name: This executes, fails, and the failure is ignored
  ansible.builtin.command: /bin/true
  ignore_unreachable: yes

- name: This executes, fails, and ends the play for this host
  ansible.builtin.command: /bin/true

And at the playbook level:

- hosts: all
  ignore_unreachable: yes
  tasks:
  - name: This executes, fails, and the failure is ignored
    ansible.builtin.command: /bin/true

  - name: This executes, fails, and ends the play for this host
    ansible.builtin.command: /bin/true
    ignore_unreachable: no

Resetting unreachable hosts

If Ansible cannot connect to a host, it marks that host as "UNREACHABLE" and removes it from the list of active hosts for the run. You can use meta: clear_host_errors to reactivate all hosts, so subsequent tasks can try to reach them again.

Handlers and failure

Ansible runs handlers at the end of each play. If a task notifies a handler but another task fails later in the play, by default the handler does not run on that host, which may leave the host in an unexpected state. For example, a task could update a configuration file and notify a handler to restart some service. If a task later in the same play fails, the configuration file might be changed but the service will not be restarted.

You can change this behavior with the --force-handlers command-line option, by including force_handlers: True in a play, or by adding force_handlers = True to ansible.cfg. When handlers are forced, Ansible will run all notified handlers on all hosts, even hosts with failed tasks. (Note that certain errors could still prevent the handler from running, such as a host becoming unreachable.)

Defining failure

Ansible lets you define what "failure" means in each task using the failed_when conditional. As with all conditionals in Ansible, lists of multiple failed_when conditions are joined with an implicit and, meaning the task only fails when all conditions are met. If you want to trigger a failure when any of the conditions is met, you must define the conditions in a string with an explicit or operator.

You may check for failure by searching for a word or phrase in the output of a command:

- name: Fail task when the command error output prints FAILED
  ansible.builtin.command: /usr/bin/example-command -x -y -z
  register: command_result
  failed_when: "'FAILED' in command_result.stderr"

or based on the return code:

- name: Fail task when both files are identical
  ansible.builtin.raw: diff foo/file1 bar/file2
  register: diff_cmd
  failed_when: diff_cmd.rc == 0 or diff_cmd.rc >= 2

You can also combine multiple conditions for failure. This task will fail if both conditions are true:

- name: Check if a file exists in temp and fail task if it does
  ansible.builtin.command: ls /tmp/this_should_not_be_here
  register: result
  failed_when:
    - result.rc == 0
    - '"No such" not in result.stdout'

If you want the task to fail when only one condition is satisfied, change the failed_when definition to:

failed_when: result.rc == 0 or "No such" not in result.stdout

If you have too many conditions to fit neatly into one line, you can split it into a multi-line YAML value with >:

- name: example of many failed_when conditions with OR
  ansible.builtin.shell: "./myBinary"
  register: ret
  failed_when: >
    ("No such file or directory" in ret.stdout) or
    (ret.stderr != '') or
    (ret.rc == 10)

Defining “changed”

Ansible lets you define when a particular task has "changed" a remote node using the changed_when conditional. This lets you determine, based on return codes or output, whether a change should be reported in Ansible statistics and whether a handler should be triggered or not. As with all conditionals in Ansible, lists of multiple changed_when conditions are joined with an implicit and, meaning the task only reports a change when all conditions are met. If you want to report a change when any of the conditions is met, you must define the conditions in a string with an explicit or operator. For example:

tasks:

  - name: Report 'changed' when the return code is not equal to 2
    ansible.builtin.shell: /usr/bin/billybass --mode="take me to the river"
    register: bass_result
    changed_when: "bass_result.rc != 2"

  - name: This will never report 'changed' status
    ansible.builtin.shell: wall 'beep'
    changed_when: False

You can also combine multiple conditions to override the "changed" result:

- name: Combine multiple conditions to override 'changed' result
  ansible.builtin.command: /bin/fake_command
  register: result
  ignore_errors: True
  changed_when:
    - '"ERROR" in result.stderr'
    - result.rc == 2

See Defining failure for more conditional syntax examples.

Ensuring success for command and shell

The command and shell modules care about return codes, so if you have a command whose successful exit code is not zero, you can do this:

tasks:
  - name: Run this command and ignore the result
    ansible.builtin.shell: /usr/bin/somecommand || /bin/true

Aborting a play on all hosts

Sometimes you want a failure on a single host, or failures on a certain percentage of hosts, to abort the entire play on all hosts. You can stop play execution after the first failure happens with any_errors_fatal. For finer-grained control, you can use max_fail_percentage to abort the run after a given percentage of hosts has failed.

Aborting on the first error: any_errors_fatal

If you set any_errors_fatal and a task returns an error, Ansible finishes the fatal task on all hosts in the current batch, then stops executing the play on all hosts. Subsequent tasks and plays are not executed. You can recover from fatal errors by adding a rescue section to the block. You can set any_errors_fatal at the play or block level:

- hosts: somehosts
  any_errors_fatal: true
  roles:
    - myrole

- hosts: somehosts
  tasks:
    - block:
        - include_tasks: mytasks.yml
      any_errors_fatal: true

You can use this feature when all tasks must be 100% successful to continue playbook execution. For example, if you run a service on machines in multiple data centers with load balancers to pass traffic from users to the service, you want all load balancers to be disabled before you stop the service for maintenance. To ensure that any failure in the task that disables the load balancers will stop all other tasks:

---
- hosts: load_balancers_dc_a
  any_errors_fatal: true

  tasks:
    - name: Shut down datacenter 'A'
      ansible.builtin.command: /usr/bin/disable-dc

- hosts: frontends_dc_a

  tasks:
    - name: Stop service
      ansible.builtin.command: /usr/bin/stop-software

    - name: Update software
      ansible.builtin.command: /usr/bin/upgrade-software

- hosts: load_balancers_dc_a

  tasks:
    - name: Start datacenter 'A'
      ansible.builtin.command: /usr/bin/enable-dc

In this example Ansible starts the software upgrade on the front ends only if all of the load balancers are successfully disabled.

Setting a maximum failure percentage

By default, Ansible continues to execute tasks as long as there are hosts that have not yet failed. In some situations, such as when executing a rolling update, you may want to abort the play when a certain threshold of failures has been reached. To achieve this, you can set a maximum failure percentage on a play:

---
- hosts: webservers
  max_fail_percentage: 30
  serial: 10

The max_fail_percentage setting applies to each batch when you use it with serial. In the example above, if more than 3 of the 10 servers in the first (or any) batch of servers failed, the rest of the play would be aborted.

Note

The percentage set must be exceeded, not equaled. For example, if serial were set to 4 and you wanted the task to abort the play when 2 of the systems failed, set the max_fail_percentage at 49 rather than 50.

Controlling errors in blocks

You can also use blocks to define responses to task errors. This approach is similar to exception handling in many programming languages. See Handling errors with blocks for details and examples.


Source

I have a framework written in python, and for testing purposes I basically want to do a subprocess (aka shell call) ... that should simply come back with a RC != 0. I tried to invoke some non-existing executable; or to run "exit 1"; but those are for some reason translated to a FileNotFoundError.

So, what else could I do to trigger a return code != 0 (in a "reliable" way; meaning the command should not suddenly return 0 at a future point in time).

I thought to "search" for a binary called exit, but well:

> /usr/bin/env exit
/usr/bin/env: exit: No such file or directory

asked Apr 10, 2015 at 13:07

If you’re looking for a system command that always returns a non-zero exit code, then /bin/false seems like it should work for you. From man false:

NAME
       false - do nothing, unsuccessfully

SYNOPSIS
       false [ignored command line arguments]
       false OPTION

DESCRIPTION
       Exit with a status code indicating failure.

answered Apr 10, 2015 at 13:58

steeldriver

You can create a new return code with the command bash -c "exit RETURNCODE", replacing "RETURNCODE" with any number. Note that it will be reduced to an 8-bit unsigned integer (0 to 255), that is, RETURNCODE mod 256.

You can check the return code of the last shell command inside the terminal(!) by executing echo $?. The "$?" variable contains the most recent return code and "echo" prints it to the standard output.
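
A short sketch of both points, assuming bash is installed. Each child shell exits with the requested status, and the parent reads it from $?; the variable names are arbitrary.

```shell
# Statuses above 255 wrap around modulo 256.
rc_3=0;   bash -c "exit 3"   || rc_3=$?
rc_256=0; bash -c "exit 256" || rc_256=$?
rc_257=0; bash -c "exit 257" || rc_257=$?

# false does nothing and always reports failure.
rc_false=0; false || rc_false=$?

echo "exit 3 -> $rc_3, exit 256 -> $rc_256, exit 257 -> $rc_257, false -> $rc_false"
```

Because 256 wraps to 0, a test that needs a guaranteed failure should stick to a value between 1 and 255.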

answered Apr 10, 2015 at 13:42

Byte Commander

After some more testing, I found that my problem was not on the "Linux" side.

Python has a module shlex, which should be used to "split" command strings. When I changed my subprocess call to use the output of shlex.split(), invoking "bash exit 1" gives me what I need.

answered Apr 10, 2015 at 13:25

GhostCat

Source

When Ansible receives a non-zero return code from a command or a failure from a module, by default it stops executing on that host and continues on other hosts. However, in some circumstances you may want different behavior. Sometimes a non-zero return code indicates success. Sometimes you want a failure on one host to stop execution on all hosts. Ansible provides tools and settings to handle these situations and help you get the behavior, output, and reporting you want.

Ignoring failed commands

By default Ansible stops executing tasks on a host when a task fails on that host. You can use ignore_errors to continue on in spite of the failure:

- name: Do not count this as a failure
  ansible.builtin.command: /bin/false
  ignore_errors: yes

The ignore_errors directive only works when the task is able to run and returns a value of ‘failed’. It does not make Ansible ignore undefined variable errors, connection failures, execution issues (for example, missing packages), or syntax errors.

Ignoring unreachable host errors

You can ignore a task failure due to the host instance being ‘UNREACHABLE’ with the ignore_unreachable keyword. Ansible ignores the task errors, but continues to execute future tasks against the unreachable host. For example, at the task level:

- name: This executes, fails, and the failure is ignored
  ansible.builtin.command: /bin/true
  ignore_unreachable: yes

- name: This executes, fails, and ends the play for this host
  ansible.builtin.command: /bin/true

And at the playbook level:

- hosts: all
  ignore_unreachable: yes
  tasks:
  - name: This executes, fails, and the failure is ignored
    ansible.builtin.command: /bin/true

  - name: This executes, fails, and ends the play for this host
    ansible.builtin.command: /bin/true
    ignore_unreachable: no

Resetting unreachable hosts

If Ansible cannot connect to a host, it marks that host as ‘UNREACHABLE’ and removes it from the list of active hosts for the run. You can use meta: clear_host_errors to reactivate all hosts, so subsequent tasks can try to reach them again.

Handlers and failure

Ansible runs handlers at the end of each play. If a task notifies a handler but another task fails later in the play, by default the handler does not run on that host, which may leave the host in an unexpected state. For example, a task could update a configuration file and notify a handler to restart some service. If a task later in the same play fails, the configuration file might be changed but the service will not be restarted.

You can change this behavior with the --force-handlers command-line option, by including force_handlers: True in a play, or by adding force_handlers = True to ansible.cfg. When handlers are forced, Ansible will run all notified handlers on all hosts, even hosts with failed tasks. (Note that certain errors could still prevent the handler from running, such as a host becoming unreachable.)

Defining failure

Ansible lets you define what "failure" means in each task using the failed_when conditional. As with all conditionals in Ansible, lists of multiple failed_when conditions are joined with an implicit and, meaning the task only fails when all conditions are met. If you want to trigger a failure when any of the conditions is met, you must define the conditions in a string with an explicit or operator.

You may check for failure by searching for a word or phrase in the output of a command:

- name: Fail task when the command error output prints FAILED
  ansible.builtin.command: /usr/bin/example-command -x -y -z
  register: command_result
  failed_when: "'FAILED' in command_result.stderr"

or based on the return code:

- name: Fail task when both files are identical
  ansible.builtin.raw: diff foo/file1 bar/file2
  register: diff_cmd
  failed_when: diff_cmd.rc == 0 or diff_cmd.rc >= 2
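
The condition relies on the exit statuses of diff: 0 when the files are identical, 1 when they differ, and a status above 1 when diff itself ran into trouble. A minimal sketch that reproduces all three, assuming a POSIX shell with mktemp available; the scratch files are made up for illustration.

```shell
workdir=$(mktemp -d)
printf 'a\n' > "$workdir/file1"
printf 'a\n' > "$workdir/same"
printf 'b\n' > "$workdir/other"

# Identical files: exit status 0.
rc_same=0
diff "$workdir/file1" "$workdir/same" >/dev/null 2>&1 || rc_same=$?

# Different files: exit status 1.
rc_differ=0
diff "$workdir/file1" "$workdir/other" >/dev/null 2>&1 || rc_differ=$?

# Missing file: diff reports trouble with a status above 1.
rc_trouble=0
diff "$workdir/file1" "$workdir/missing" >/dev/null 2>&1 || rc_trouble=$?

echo "same: $rc_same, differ: $rc_differ, trouble: $rc_trouble"
rm -rf "$workdir"
```

With the failed_when expression above, only the "differ" case (status 1) lets the task pass.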

You can also combine multiple conditions for failure. This task will fail if both conditions are true:

- name: Check if a file exists in temp and fail task if it does
  ansible.builtin.command: ls /tmp/this_should_not_be_here
  register: result
  failed_when:
    - result.rc == 0
    - '"No such" not in result.stdout'

If you want the task to fail when only one condition is satisfied, change the failed_when definition to:

failed_when: result.rc == 0 or "No such" not in result.stdout

If you have too many conditions to fit neatly into one line, you can split it into a multi-line yaml value with >:

- name: example of many failed_when conditions with OR
  ansible.builtin.shell: "./myBinary"
  register: ret
  failed_when: >
    ("No such file or directory" in ret.stdout) or
    (ret.stderr != '') or
    (ret.rc == 10)

Defining "changed"

Ansible lets you define when a particular task has "changed" a remote node using the changed_when conditional. This lets you determine, based on return codes or output, whether a change should be reported in Ansible statistics and whether a handler should be triggered or not. As with all conditionals in Ansible, lists of multiple changed_when conditions are joined with an implicit and, meaning the task only reports a change when all conditions are met. If you want to report a change when any of the conditions is met, you must define the conditions in a string with an explicit or operator. For example:

- name: Report 'changed' when the return code is not equal to 2
  ansible.builtin.shell: /usr/bin/billybass --mode="take me to the river"
  register: bass_result
  changed_when: "bass_result.rc != 2"

- name: This will never report 'changed' status
  ansible.builtin.shell: wall 'beep'
  changed_when: False

You can also combine multiple conditions to override the "changed" result:

- name: Combine multiple conditions to override 'changed' result
  ansible.builtin.command: /bin/fake_command
  register: result
  ignore_errors: True
  changed_when:
    - '"ERROR" in result.stderr'
    - result.rc == 2

Ensuring success for command and shell

The command and shell modules care about return codes, so if you have a command whose successful exit code is not zero, you can do this:

tasks:
  - name: Run this command and ignore the result
    ansible.builtin.shell: /usr/bin/somecommand || /bin/true
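
The effect of the trailing || /bin/true can be seen in a plain shell. In this sketch, somecommand is a stand-in function that returns 3, and the shell's true is used in place of /bin/true; both are assumptions made for illustration.

```shell
# Stand-in for a command whose "successful" exit status is not zero.
somecommand() { return 3; }

# Without a guard, the non-zero status is what the caller sees.
rc_plain=0
somecommand || rc_plain=$?

# With "|| true", the non-zero status is swallowed and the line as a whole succeeds.
rc_guarded=0
{ somecommand || true; } || rc_guarded=$?

echo "plain: $rc_plain, guarded: $rc_guarded"
```

This discards every non-zero status, including genuine errors, so a failed_when condition on specific return codes is the more selective option.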

Aborting a play on all hosts

Sometimes you want a failure on a single host, or failures on a certain percentage of hosts, to abort the entire play on all hosts. You can stop play execution after the first failure happens with any_errors_fatal. For finer-grained control, you can use max_fail_percentage to abort the run after a given percentage of hosts has failed.

Aborting on the first error: any_errors_fatal

If you set any_errors_fatal and a task returns an error, Ansible finishes the fatal task on all hosts in the current batch, then stops executing the play on all hosts. Subsequent tasks and plays are not executed. You can recover from fatal errors by adding a rescue section to the block. You can set any_errors_fatal at the play or block level:

- hosts: somehosts
  any_errors_fatal: true
  roles:
    - myrole

- hosts: somehosts
  tasks:
    - block:
        - include_tasks: mytasks.yml
      any_errors_fatal: true

You can use this feature when all tasks must be 100% successful to continue playbook execution. For example, if you run a service on machines in multiple data centers with load balancers to pass traffic from users to the service, you want all load balancers to be disabled before you stop the service for maintenance. To ensure that any failure in the task that disables the load balancers will stop all other tasks:

---
- hosts: load_balancers_dc_a
  any_errors_fatal: true

  tasks:
    - name: Shut down datacenter 'A'
      ansible.builtin.command: /usr/bin/disable-dc

- hosts: frontends_dc_a

  tasks:
    - name: Stop service
      ansible.builtin.command: /usr/bin/stop-software

    - name: Update software
      ansible.builtin.command: /usr/bin/upgrade-software

- hosts: load_balancers_dc_a

  tasks:
    - name: Start datacenter 'A'
      ansible.builtin.command: /usr/bin/enable-dc

In this example Ansible starts the software upgrade on the front ends only if all of the load balancers are successfully disabled.

Setting a maximum failure percentage

By default, Ansible continues to execute tasks as long as there are hosts that have not yet failed. In some situations, such as when executing a rolling update, you may want to abort the play when a certain threshold of failures has been reached. To achieve this, you can set a maximum failure percentage on a play:

---
- hosts: webservers
  max_fail_percentage: 30
  serial: 10

The max_fail_percentage setting applies to each batch when you use it with serial. In the example above, if more than 3 of the 10 servers in the first (or any) batch of servers failed, the rest of the play would be aborted.

Note
The percentage set must be exceeded, not equaled. For example, if serial were set to 4 and you wanted the task to abort the play when 2 of the systems failed, set the max_fail_percentage at 49 rather than 50.
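
The arithmetic behind this note can be checked with a few lines of shell. This is a sketch of the rule as the note describes it (a strict "greater than" comparison), not Ansible's actual implementation; the helper name exceeds_limit is hypothetical.

```shell
# Hypothetical helper: succeed when the failure percentage strictly exceeds the limit.
# Arguments: failed hosts, hosts in the batch, max_fail_percentage.
exceeds_limit() {
    failed=$1 total=$2 limit=$3
    # Compare failed/total > limit/100 without division to stay in integer arithmetic.
    [ $((failed * 100)) -gt $((limit * total)) ]
}

exceeds_limit 2 4 49  && echo "2 of 4 with limit 49: abort"
exceeds_limit 2 4 50  || echo "2 of 4 with limit 50: continue"
exceeds_limit 3 10 30 || echo "3 of 10 with limit 30: continue"
exceeds_limit 4 10 30 && echo "4 of 10 with limit 30: abort"
```

Two failures out of four is exactly 50%, which exceeds 49 but does not exceed 50.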

Controlling errors in blocks

You can also use blocks to define responses to task errors. This approach is similar to exception handling in many programming languages.

Blocks create logical groups of tasks. Blocks also offer ways to handle task errors, similar to exception handling in many programming languages.

Grouping tasks with blocks
Handling errors with blocks

Grouping tasks with blocks

All tasks in a block inherit directives applied at the block level. Most of what you can apply to a single task (with the exception of loops) can be applied at the block level, so blocks make it much easier to set data or directives common to the tasks. The directive does not affect the block itself, it is only inherited by the tasks enclosed by a block. For example, a when statement is applied to the tasks within a block, not to the block itself.

Block example with named tasks inside the block

tasks:
  - name: Install, configure, and start Apache
    block:
      - name: Install httpd and memcached
        ansible.builtin.yum:
          name:
          - httpd
          - memcached
          state: present

      - name: Apply the foo config template
        ansible.builtin.template:
          src: templates/src.j2
          dest: /etc/foo.conf

      - name: Start service bar and enable it
        ansible.builtin.service:
          name: bar
          state: started
          enabled: True
    when: ansible_facts['distribution'] == 'CentOS'
    become: true
    become_user: root
    ignore_errors: yes

In the example above, the ‘when’ condition will be evaluated before Ansible runs each of the three tasks in the block. All three tasks also inherit the privilege escalation directives, running as the root user. Finally, ignore_errors: yes ensures that Ansible continues to execute the playbook even if some of the tasks fail.

Names for blocks have been available since Ansible 2.3. We recommend using names in all tasks, within blocks or elsewhere, for better visibility into the tasks being executed when you run the playbook.

Handling errors with blocks

You can control how Ansible responds to task errors using blocks with rescue and always sections.

Rescue blocks specify tasks to run when an earlier task in a block fails. This approach is similar to exception handling in many programming languages. Ansible only runs rescue blocks after a task returns a ‘failed’ state. Bad task definitions and unreachable hosts will not trigger the rescue block.

Block error handling example

tasks:
  - name: Handle the error
    block:
      - name: Print a message
        ansible.builtin.debug:
          msg: 'I execute normally'

      - name: Force a failure
        ansible.builtin.command: /bin/false

      - name: Never print this
        ansible.builtin.debug:
          msg: 'I never execute, due to the above task failing, :-('
    rescue:
      - name: Print when errors
        ansible.builtin.debug:
          msg: 'I caught an error, can do stuff here to fix it, :-)'

You can also add an always section to a block. Tasks in the always section run no matter what the task status of the previous block is.

Block with always section

- name: Always do X
  block:
    - name: Print a message
      ansible.builtin.debug:
        msg: 'I execute normally'

    - name: Force a failure
      ansible.builtin.command: /bin/false

    - name: Never print this
      ansible.builtin.debug:
        msg: 'I never execute :-('
  always:
    - name: Always do this
      ansible.builtin.debug:
        msg: "This always executes, :-)"

Together, these elements offer complex error handling.

Block with all sections

- name: Attempt and graceful roll back demo
  block:
    - name: Print a message
      ansible.builtin.debug:
        msg: 'I execute normally'

    - name: Force a failure
      ansible.builtin.command: /bin/false

    - name: Never print this
      ansible.builtin.debug:
        msg: 'I never execute, due to the above task failing, :-('
  rescue:
    - name: Print when errors
      ansible.builtin.debug:
        msg: 'I caught an error'

    - name: Force a failure in middle of recovery! >:-)
      ansible.builtin.command: /bin/false

    - name: Never print this
      ansible.builtin.debug:
        msg: 'I also never execute :-('
  always:
    - name: Always do this
      ansible.builtin.debug:
        msg: "This always executes"

The tasks in the block execute normally. If any tasks in the block return failed, the rescue section executes tasks to recover from the error. The always section runs regardless of the results of the block and rescue sections.

If an error occurs in the block and the rescue task succeeds, Ansible reverts the failed status of the original task for the run and continues to run the play as if the original task had succeeded. The rescued task is considered successful, and does not trigger max_fail_percentage or any_errors_fatal configurations. However, Ansible still reports a failure in the playbook statistics.

You can use blocks with flush_handlers in a rescue task to ensure that all handlers run even if an error occurs:

Block run handlers in error handling

tasks:
  - name: Attempt and graceful roll back demo
    block:
      - name: Print a message
        ansible.builtin.debug:
          msg: 'I execute normally'
        changed_when: yes
        notify: Run me even after an error

      - name: Force a failure
        ansible.builtin.command: /bin/false
    rescue:
      - name: Make sure all handlers run
        meta: flush_handlers
handlers:
  - name: Run me even after an error
    ansible.builtin.debug:
      msg: 'This handler runs even on error'

Ansible provides a couple of variables for tasks in the rescue portion of a block:

ansible_failed_task

The task that returned ‘failed’ and triggered the rescue. For example, to get the name use ansible_failed_task.name.

ansible_failed_result

The captured return result of the failed task that triggered the rescue. This would equate to having used this var in the register keyword.


Ignoring failed commands
Ignoring unreachable host errors
Resetting unreachable hosts
Handlers and failure
Defining failure
Defining “changed”
Ensuring success for command and shell
Aborting a play on all hosts
- Aborting on the first error: any_errors_fatal
- Setting a maximum failure percentage
Controlling errors in blocks

Ignoring failed commands

By default Ansible stops executing tasks on a host when a task fails on that host. You can use ignore_errors to continue on in spite of the failure.

- name: Do not count this as a failure
  ansible.builtin.command: /bin/false
  ignore_errors: true

The ignore_errors directive only works when the task is able to run and returns a value of ‘failed’. It does not make Ansible ignore undefined variable errors, connection failures, execution issues (for example, missing packages), or syntax errors.

Ignoring unreachable host errors

New in version 2.7.

You can ignore a task failure due to the host instance being ‘UNREACHABLE’ with the ignore_unreachable keyword. Ansible ignores the task errors, but continues to execute future tasks against the unreachable host. For example, at the task level:

- name: This executes, fails, and the failure is ignored
  ansible.builtin.command: /bin/true
  ignore_unreachable: true

- name: This executes, fails, and ends the play for this host
  ansible.builtin.command: /bin/true

And at the playbook level:

- hosts: all
  ignore_unreachable: true
  tasks:
  - name: This executes, fails, and the failure is ignored
    ansible.builtin.command: /bin/true

  - name: This executes, fails, and ends the play for this host
    ansible.builtin.command: /bin/true
    ignore_unreachable: false

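Resetting unreachable hosts

If Ansible cannot connect to a host, it marks that host as 'UNREACHABLE' and removes it from the list of active hosts for the run. You can use meta: clear_host_errors to reactivate all hosts, so subsequent tasks can try to reach them again.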

Handlers and failure

Ansible runs handlers at the end of each play. If a task notifies a handler but
another task fails later in the play, by default the handler does not run on that host,
which may leave the host in an unexpected state. For example, a task could update
a configuration file and notify a handler to restart some service. If a
task later in the same play fails, the configuration file might be changed but
the service will not be restarted.

You can change this behavior with the --force-handlers command-line option,
by including force_handlers: True in a play, or by adding force_handlers = True
to ansible.cfg. When handlers are forced, Ansible will run all notified handlers on
all hosts, even hosts with failed tasks. (Note that certain errors could still prevent
the handler from running, such as a host becoming unreachable.)

Defining failure

Ansible lets you define what “failure” means in each task using the failed_when conditional. As with all conditionals in Ansible, lists of multiple failed_when conditions are joined with an implicit and, meaning the task only fails when all conditions are met. If you want to trigger a failure when any of the conditions is met, you must define the conditions in a string with an explicit or operator.

You may check for failure by searching for a word or phrase in the output of a command

- name: Fail task when the command error output prints FAILED
  ansible.builtin.command: /usr/bin/example-command -x -y -z
  register: command_result
  failed_when: "'FAILED' in command_result.stderr"

or based on the return code

- name: Fail task when both files are identical
  ansible.builtin.raw: diff foo/file1 bar/file2
  register: diff_cmd
  failed_when: diff_cmd.rc == 0 or diff_cmd.rc >= 2
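The return-code condition above depends on how diff reports its result: 0 when the files are identical, 1 when they differ, and a value greater than 1 when the comparison itself could not be carried out. A minimal sketch of that convention in a plain shell, using hypothetical scratch files created only for the demonstration:

```shell
# Scratch directory; the file names below are made up for this demo.
tmpdir=$(mktemp -d)
printf 'alpha\n' > "$tmpdir/file1"
printf 'alpha\n' > "$tmpdir/same"
printf 'beta\n'  > "$tmpdir/other"

diff "$tmpdir/file1" "$tmpdir/same" >/dev/null 2>&1
rc_identical=$?    # 0: no differences found

diff "$tmpdir/file1" "$tmpdir/other" >/dev/null 2>&1
rc_different=$?    # 1: the files differ

diff "$tmpdir/file1" "$tmpdir/missing" >/dev/null 2>&1
rc_trouble=$?      # greater than 1: diff could not compare (file is missing)

echo "identical=$rc_identical different=$rc_different trouble=$rc_trouble"
rm -rf "$tmpdir"
```

Read this way, the failed_when expression treats only rc 1 (files differ) as success and marks both "identical" and "trouble" as failures.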

You can also combine multiple conditions for failure. This task will fail if both conditions are true:

- name: Check if a file exists in temp and fail task if it does
  ansible.builtin.command: ls /tmp/this_should_not_be_here
  register: result
  failed_when:
    - result.rc == 0
    - '"No such" not in result.stdout'

If you want the task to fail when only one condition is satisfied, change the failed_when definition to

failed_when: result.rc == 0 or "No such" not in result.stdout

If you have too many conditions to fit neatly into one line, you can split it into a multi-line YAML value with >.

- name: example of many failed_when conditions with OR
  ansible.builtin.shell: "./myBinary"
  register: ret
  failed_when: >
    ("No such file or directory" in ret.stdout) or
    (ret.stderr != '') or
    (ret.rc == 10)

Defining “changed”

Ansible lets you define when a particular task has “changed” a remote node using the changed_when conditional. This lets you determine, based on return codes or output, whether a change should be reported in Ansible statistics and whether a handler should be triggered or not. As with all conditionals in Ansible, lists of multiple changed_when conditions are joined with an implicit and, meaning the task only reports a change when all conditions are met. If you want to report a change when any of the conditions is met, you must define the conditions in a string with an explicit or operator. For example:

tasks:

  - name: Report 'changed' when the return code is not equal to 2
    ansible.builtin.shell: /usr/bin/billybass --mode="take me to the river"
    register: bass_result
    changed_when: "bass_result.rc != 2"

  - name: This will never report 'changed' status
    ansible.builtin.shell: wall 'beep'
    changed_when: False

You can also combine multiple conditions to override “changed” result.

- name: Combine multiple conditions to override 'changed' result
  ansible.builtin.command: /bin/fake_command
  register: result
  ignore_errors: True
  changed_when:
    - '"ERROR" in result.stderr'
    - result.rc == 2

Note

Just like when, these two conditionals (failed_when and changed_when) do not require templating delimiters ({{ }}), because they are implied.

See Defining failure for more conditional syntax examples.

Ensuring success for command and shell

The command and shell modules care about return codes, so if you have a command whose successful exit code is not zero, you can do this:

tasks:
  - name: Run this command and ignore the result
    ansible.builtin.shell: /usr/bin/somecommand || /bin/true
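The effect of the || /bin/true idiom can be checked in a local shell before putting it into a playbook. The sketch below uses grep as a stand-in for somecommand (an assumption made for illustration), since grep exits with 1 when it finds no match:

```shell
# grep exits with 1 when nothing matches; Ansible would report that as
# "non-zero return code".
printf 'index.php\n' | grep -q 'wp-config.php'
rc_plain=$?

# Appending "|| true" makes the command line as a whole exit with 0.
printf 'index.php\n' | grep -q 'wp-config.php' || true
rc_masked=$?

echo "plain=$rc_plain masked=$rc_masked"
```

This prints plain=1 masked=0. Note that the idiom hides every non-zero status, including genuine errors, so a failed_when condition on the registered rc may be the more selective choice when only some codes are acceptable.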

Aborting a play on all hosts

Sometimes you want a failure on a single host, or failures on a certain percentage of hosts, to abort the entire play on all hosts. You can stop play execution after the first failure happens with any_errors_fatal. For finer-grained control, you can use max_fail_percentage to abort the run after a given percentage of hosts has failed.

Aborting on the first error: any_errors_fatal

If you set any_errors_fatal and a task returns an error, Ansible finishes the fatal task on all hosts in the current batch, then stops executing the play on all hosts. Subsequent tasks and plays are not executed. You can recover from fatal errors by adding a rescue section to the block. You can set any_errors_fatal at the play or block level.

- hosts: somehosts
  any_errors_fatal: true
  roles:
    - myrole

- hosts: somehosts
  tasks:
    - block:
        - include_tasks: mytasks.yml
      any_errors_fatal: true

---
- hosts: load_balancers_dc_a
  any_errors_fatal: true

  tasks:
    - name: Shut down datacenter 'A'
      ansible.builtin.command: /usr/bin/disable-dc

- hosts: frontends_dc_a

  tasks:
    - name: Stop service
      ansible.builtin.command: /usr/bin/stop-software

    - name: Update software
      ansible.builtin.command: /usr/bin/upgrade-software

- hosts: load_balancers_dc_a

  tasks:
    - name: Start datacenter 'A'
      ansible.builtin.command: /usr/bin/enable-dc

In this example Ansible starts the software upgrade on the front ends only if all of the load balancers are successfully disabled.

Setting a maximum failure percentage

---
- hosts: webservers
  max_fail_percentage: 30
  serial: 10

The max_fail_percentage setting applies to each batch when you use it with serial. In the example above, if more than 3 of the 10 servers in the first (or any) batch of servers failed, the rest of the play would be aborted.

Note

The percentage set must be exceeded, not equaled. For example, if serial were set to 4 and you wanted the task to abort the play when 2 of the systems failed, set the max_fail_percentage at 49 rather than 50.
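The "exceeded, not equaled" rule comes down to a strict greater-than comparison. The sketch below only illustrates the arithmetic; would_abort is a hypothetical helper and is not how Ansible implements the check:

```shell
# would_abort FAILED TOTAL MAX_PERCENT
# Hypothetical helper (not part of Ansible): exits 0 when the failed share of
# a batch is strictly greater than MAX_PERCENT, and 1 otherwise.
# Integer form of: FAILED / TOTAL * 100 > MAX_PERCENT
would_abort() {
    [ $(( $1 * 100 )) -gt $(( $2 * $3 )) ]
}

for max in 50 49; do
    if would_abort 2 4 "$max"; then
        echo "max_fail_percentage=$max: 2 of 4 failed, play aborts"
    else
        echo "max_fail_percentage=$max: 2 of 4 failed, play continues"
    fi
done
```

With 2 of 4 hosts failed the share is exactly 50%, so a limit of 50 is not exceeded and the play continues, while a limit of 49 is exceeded and the play aborts.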

Controlling errors in blocks

You can also use blocks to define responses to task errors. This approach is similar to exception handling in many programming languages. See Handling errors with blocks for details and examples.


The main issue in your code is that $? is expanded before ssh is called. This is due to quoting. All expansions in a double-quoted string are expanded before the string is used. In addition to that, the double-quoted string that you are using with ssh contains other double-quoted sections. These sections would be unquoted, just like the substring abc is unquoted in "123"abc"456".
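The early expansion can be reproduced without a remote host. In this sketch sh -c stands in for ssh (an assumption made so the example runs locally); the quoting behaviour in the calling shell is the same:

```shell
true   # the local shell's $? is now 0

# Double quotes: the local shell substitutes its own $? (0) into the string
# before the child shell starts, so the child runs "false; echo 0".
early=$(sh -c "false; echo $?")

# Single quotes: the child shell receives a literal $? and expands it itself,
# after its own "false" has run.
late=$(sh -c 'false; echo $?')

echo "early=$early late=$late"
```

This prints early=0 late=1: the double-quoted version reports the stale local status rather than the status of the command that actually ran on the other side.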

Instead of trying to execute a complicated command on the remote host, just let the ssh command cat the passwd file, then grep that:

if ssh -n "sandeep@$ipaddress" cat /etc/passwd | grep -q -F -e "$userid"
then
    echo "User exists"
else
    echo "User does not exist"
fi >>"/tmp/userfind_$DATE.txt"

Also, consider reading from the user and server list using a while loop instead:

while IFS= read -r userid; do
   # ...
done </home/sandeep/Project_finduser01/userslist

You may also redirect the outermost loop to your output file instead of redirecting every single echo:

while ...; do
    while ...; do
       # stuff
    done <userlist
done <serverlist  >"/tmp/userfind_$DATE.txt"

If your user list is long, you may want to only get the passwd from the remote host once, and then query that several times

while ...; do
    scp "sandeep@$ipaddress:/etc/passwd" passwd.tmp
    while ...; do
       if grep -q -F -e "$userid" passwd.tmp; then
          # exists
       fi
    done <userlist
done <serverlist  >"/tmp/userfind_$DATE.txt"

Even more efficient would be to read the user list into an awk array and then match the usernames from the passwd file against them. That would get rid of the innermost loop entirely.
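A rough sketch of the awk approach, assuming a user list with one name per line and a local copy of the passwd file. The sample data and the file names userlist and passwd.tmp under a scratch directory are made up for the demonstration:

```shell
workdir=$(mktemp -d)
printf '%s\n' alice marc > "$workdir/userlist"
printf '%s\n' \
    'alice:x:1000:1000::/home/alice:/bin/sh' \
    'marco:x:1001:1001::/home/marco:/bin/sh' > "$workdir/passwd.tmp"

report=$(awk -F: '
    NR == FNR { wanted[$1] = 1; next }   # first file: remember requested names
    $1 in wanted { found[$1] = 1 }       # second file: exact match on field 1
    END {
        for (u in wanted)
            print u, ((u in found) ? "exists" : "missing")
    }
' "$workdir/userlist" "$workdir/passwd.tmp" | sort)

echo "$report"
rm -rf "$workdir"
```

For the sample data this reports "alice exists" and "marc missing". Both files are read once, and because the lookup compares the whole first field, marc is not confused with marco.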

The username is found in a particular field in the passwd file. With your approach, you would match both marc and marco if you searched for marc. To match a bit more carefully, consider using a pattern such as "^$userid:" instead of matching against the whole line (and drop the -F that I introduced above if you’re still using grep to do this).
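The difference between the two patterns can be seen on a one-line sample. The passwd entry below is invented for the demonstration, and the anchored pattern assumes the user name contains no regular-expression metacharacters:

```shell
passwd_sample=$(mktemp)
printf '%s\n' 'marco:x:1001:1001::/home/marco:/bin/sh' > "$passwd_sample"
userid=marc   # this user is NOT in the sample file

# Fixed-string search anywhere in the line: "marc" is found inside "marco".
if grep -q -F -e "$userid" "$passwd_sample"; then loose=match; else loose=nomatch; fi

# Anchored to the start of the line and to the field separator.
if grep -q -e "^$userid:" "$passwd_sample"; then anchored=match; else anchored=nomatch; fi

echo "loose=$loose anchored=$anchored"
rm -f "$passwd_sample"
```

This prints loose=match anchored=nomatch, so only the anchored pattern correctly reports that marc does not exist.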

You may also avoid the parsing of the passwd file completely with

getent passwd "$userid" >/dev/null

This returns a zero exit code (success) if the user exists and non-zero otherwise.

I.e.,

if ssh -n "sandeep@$ipaddress" getent passwd "$userid" >/dev/null
then
    # exists
else
    # does not exist
fi

This would do one ssh call against the remote host per user though. This could be made a bit more efficient by not closing the connection between each call (the below would keep the connection open for one minute):

if ssh -n -o ControlMaster=auto -o ControlPersist=1m "sandeep@$ipaddress" getent passwd "$userid" >/dev/null
then
    # exists
else
    # does not exist
fi

