
Is there any decent way to require that all hosts a set of tasks is supposed to execute on are actually reachable before anything runs?

I'm currently trying to handle an update that could be painful if all the relevant nodes are not updated in sync.

Pierre Andersson

4 Answers


I was about to post a question when I saw this one. The answer Duncan suggested does not work, at least in my case, when a host is unreachable. All my playbooks specify a max_fail_percentage of 0.

But Ansible will happily execute all the tasks on the hosts that it is able to reach. What I really wanted was: if any of the hosts is unreachable, don't run any of the tasks.

What I found is a simple, but arguably hacky, solution; I'm open to better answers.

As the first step of running a playbook, Ansible gathers facts for all the hosts; for an unreachable host it cannot. So I write a simple play at the very beginning of my playbook that uses a fact. If a host is unreachable, that task fails with an "undefined variable" error. The task is just a dummy and will always pass if all hosts are reachable.

See below my example:

- name: Check Ansible connectivity to all hosts
  hosts: host_all
  user: "{{ remote_user }}"
  sudo: "{{ sudo_required }}"  # pre-2.0 syntax; on newer Ansible use 'become'
  sudo_user: root              # and 'become_user'
  connection: ssh # or paramiko
  max_fail_percentage: 0
  tasks:
    - name: check connectivity to hosts (dummy task)
      shell: echo "{{ hostvars[item]['ansible_hostname'] }}"
      # quote the expression; the bare form 'with_items: groups[...]' is deprecated
      with_items: "{{ groups['host_all'] }}"
      register: cmd_output

    - name: debug ...
      debug: var=cmd_output

In case a host is unreachable you will get an error as below:

TASK: [c.. ***************************************************** 
fatal: [172.22.191.160] => One or more undefined variables: 'dict object'    has no attribute 'ansible_hostname' 
fatal: [172.22.191.162] => One or more undefined variables: 'dict object' has no attribute 'ansible_hostname'

FATAL: all hosts have already failed -- aborting

Note: If your host group is not called host_all, you must change the dummy task to reflect that name.
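On newer Ansible the hard-coded group name can be avoided by iterating over the special variable `ansible_play_hosts_all` (available from 2.2), which lists every host targeted by the play, including ones that became unreachable; note that `ansible_play_hosts` (without `_all`) excludes unreachable hosts and would defeat the check. A hedged sketch of the same dummy task:

```yaml
    - name: check connectivity to hosts (group-agnostic dummy task)
      shell: echo "{{ hostvars[item]['ansible_hostname'] }}"
      with_items: "{{ ansible_play_hosts_all }}"
```

Verify the variable against your Ansible version before relying on it.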

honk
Zoro_77
  • Thanks, I ended up using this as a pretask [See Gist](https://gist.github.com/JakeDEvans/00aaaab051a9c234de7f32da1bb2b8c2) – Jacob Evans Jun 08 '16 at 16:37

You can combine any_errors_fatal: true or max_fail_percentage: 0 with gather_facts: false, and then run a task that will fail if the host is offline. Something like this at the top of the playbook should do what you need:

- hosts: all
  gather_facts: false
  max_fail_percentage: 0
  tasks:
    - ping:  # equivalent to the older '- action: ping' syntax

A bonus is that this also works with the -l SUBSET option for limiting matching hosts.

wilkystyle
  • why is the gather_facts necessary? – hbogert May 27 '16 at 12:30
  • By default, Ansible will only operate on hosts that are reachable, and it determines that when gathering facts. A subsequent `ping` will always succeed, because Ansible is only attempting to run the playbook on hosts that it knows are already up. – wilkystyle May 28 '16 at 17:55
  • The running behaviour of ansible has changed in such a way in 2.0, that this does not work anymore. – hbogert Jun 11 '16 at 13:30
  • Actually, only sure about 2.1, not 2.0 (couldn't edit previous comment anymore) – hbogert Jun 11 '16 at 13:37
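For Ansible 2.x, where the behaviour described above changed, a hedged sketch of the same idea uses `any_errors_fatal` so a single failure aborts the play for every host (`all` is a placeholder host pattern; whether unreachable hosts count toward `any_errors_fatal` has varied across releases, so verify against your version):

```yaml
- hosts: all
  gather_facts: false
  any_errors_fatal: true  # one failed host aborts the play for all hosts
  tasks:
    - name: fail fast if any host is unreachable
      ping:
```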

You can add max_fail_percentage into your playbook - something like this:

- hosts: all_boxes
  max_fail_percentage: 0
  pre_tasks:  # pre_tasks run before roles regardless of order in the file
    - include: roles/common/tasks/start-time.yml
    - include: roles/common/tasks/debug.yml
  roles:
    - common

This way you can decide how much failure you want to tolerate. Here is the relevant section from the Ansible Documentation:

By default, Ansible will continue executing actions as long as there are hosts in the group that have not yet failed. In some situations, such as with the rolling updates described above, it may be desirable to abort the play when a certain threshold of failures have been reached. To achieve this, as of version 1.3 you can set a maximum failure percentage on a play as follows:

- hosts: webservers
  max_fail_percentage: 30
  serial: 10

In the above example, if more than 3 of the 10 servers in the group were to fail, the rest of the play would be aborted.

Note: The percentage set must be exceeded, not equaled. For example, if serial were set to 4 and you wanted the task to abort when 2 of the systems failed, the percentage should be set at 49 rather than 50.
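To illustrate that note with a hypothetical play (group and role names are placeholders): with serial: 4, two failures is exactly 50% of the batch, so the threshold must be set below 50 for the second failure to abort the play:

```yaml
- hosts: webservers
  serial: 4
  max_fail_percentage: 49  # 2 of 4 failed = 50%, which exceeds 49%, so the play aborts
  roles:
    - upgrade
```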

Duncan Lock

Inspired by other questions/answers: https://stackoverflow.com/a/55219490/457589

Using ansible-playbook 2.7.8.

Checking whether there are any ansible_facts for each required host feels more explicit to me.

# my-playbook.yml
- hosts: myservers
  tasks:
    - name: Check ALL hosts are reachable before doing the release
      fail:
        msg: >
          [REQUIRED] ALL hosts to be reachable, so flagging {{ inventory_hostname }} as failed,
          because host {{ item }} has no facts, meaning it is UNREACHABLE.
      when: "hostvars[item].ansible_facts|list|length == 0"
      with_items: "{{ groups.myservers }}"

    - debug:
        msg: "Will only run if all hosts are reachable"
$ ansible-playbook -i my-inventory.yml my-playbook.yml

PLAY [myservers] *************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] *********************************************************************************************************************************************************************************************************
fatal: [my-host-03]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname my-host-03: Name or service not known", "unreachable": true}
fatal: [my-host-04]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname my-host-04: Name or service not known", "unreachable": true}
ok: [my-host-02]
ok: [my-host-01]

TASK [Check ALL hosts are reachable before doing the release] ********************************************************************************************************************************************************************************************************************
failed: [my-host-01] (item=my-host-03) => {"changed": false, "item": "my-host-03", "msg": "[REQUIRED] ALL hosts to be reachable, so flagging my-host-01 as failed, because host my-host-03 has no facts, meaning it is UNREACHABLE."}
failed: [my-host-01] (item=my-host-04) => {"changed": false, "item": "my-host-04", "msg": "[REQUIRED] ALL hosts to be reachable, so flagging my-host-01 as failed, because host my-host-04 has no facts, meaning it is UNREACHABLE."}
failed: [my-host-02] (item=my-host-03) => {"changed": false, "item": "my-host-03", "msg": "[REQUIRED] ALL hosts to be reachable, so flagging my-host-02 as failed, because host my-host-03 has no facts, meaning it is UNREACHABLE."}
failed: [my-host-02] (item=my-host-04) => {"changed": false, "item": "my-host-04", "msg": "[REQUIRED] ALL hosts to be reachable, so flagging my-host-02 as failed, because host my-host-04 has no facts, meaning it is UNREACHABLE."}
skipping: [my-host-01] => (item=my-host-01)
skipping: [my-host-01] => (item=my-host-02)
skipping: [my-host-02] => (item=my-host-01)
skipping: [my-host-02] => (item=my-host-02)
        to retry, use: --limit @./my-playbook.retry

PLAY RECAP *********************************************************************************************************************************************************************************************************************
my-host-01 : ok=1    changed=0    unreachable=0    failed=1
my-host-02 : ok=1    changed=0    unreachable=0    failed=1
my-host-03 : ok=0    changed=0    unreachable=1    failed=0
my-host-04 : ok=0    changed=0    unreachable=1    failed=0
Julien