70

In my Ansible play I restart a database and then try to run some operations on it. The restart command returns as soon as the restart has started, not when the database is up. The next command tries to connect to the database, so it may fail while the database is still down.

I want to retry my second command a few times. If the last retry fails, I want the play to fail.

When I add retries as follows:

retries: 3
delay: 5

then the retries are not executed at all, because the first failed execution of the command fails the whole play. I could add ignore_errors: yes, but then the play would pass even if all retries failed. Is there an easy way to retry on failure until the command succeeds, but fail the play if the last retry also fails?

Flair
Bartosz Bilicki
  • Please post the whole task. I don't understand your concern -- first execution should not fail the play if you write it correctly. – techraf May 23 '17 at 12:32
  • The concern seems clear to me. I see the same behavior. When the first attempt fails, Ansible fails the whole playbook. It shouldn't be like that, but it is. Perhaps the `until` command is necessary? – falsePockets Mar 03 '19 at 23:41
  • Worth noting to others who find this is that `retries` is part of a `loop` and needs an `until` to work. Without `until` it will silently fail. https://github.com/ansible/ansible/issues/20802 – Elijah Lynn Apr 29 '19 at 16:48

4 Answers

126

I don't understand your claim that the "first command execution fails whole play". It wouldn't make sense if Ansible behaved this way.

The following task:

- command: /usr/bin/false
  retries: 3
  delay: 3
  register: result
  until: result.rc == 0

produces:

TASK [command] ******************************************************************************************
FAILED - RETRYING: command (3 retries left).
FAILED - RETRYING: command (2 retries left).
FAILED - RETRYING: command (1 retries left).
fatal: [localhost]: FAILED! => {"attempts": 3, "changed": true, "cmd": ["/usr/bin/false"], "delta": "0:00:00.003883", "end": "2017-05-23 21:39:51.669623", "failed": true, "rc": 1, "start": "2017-05-23 21:39:51.665740", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

which seems to be exactly what you want.
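
As a hedged sketch of how this pattern might map onto the restart scenario from the question: the service name `mysql` and the `mysqladmin ping` check below are assumptions chosen for illustration, not details from the question, so substitute whatever your database provides.

```yaml
# Sketch only: the service name and the check command are placeholders.
- name: Restart the database (returns before the server is ready)
  service:
    name: mysql              # assumed service name
    state: restarted

- name: Retry the connection check until it succeeds
  command: mysqladmin ping   # assumed check; exits non-zero while the server is down
  register: db_check
  retries: 5
  delay: 5
  until: db_check.rc == 0
  changed_when: false        # a read-only check should not report a change
```

If every attempt fails, the task fails after the last retry and the play stops, as in the output above.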

techraf
  • That's right. My task had the wrong condition in the 'until' section. – Bartosz Bilicki May 24 '17 at 17:59
  • As pointed out in the answer below, a small improvement would be `until: result is not failed` – Alex Jul 30 '19 at 09:12
  • FAILED! => {"msg": "The conditional check 'result.rc == 0' failed. The error was: error while evaluating conditional (result.rc == 0): 'dict object' has no attribute 'rc'" – user1325696 Sep 03 '20 at 16:04
  • It seems to me that 'result.rc == 0' way doesn't work any more, at least for me it is not working and I am using Ansible 2.11.2 – kwick Aug 17 '23 at 16:42
28

Not sure if this is specific to Ansible Tower, but I am using:

- command: /usr/bin/false
  register: result
  retries: 3
  delay: 10
  until: result is not failed
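
The `is not failed` test does not assume that the registered result contains an `rc` key, which is likely why it avoids the "'dict object' has no attribute 'rc'" error quoted in the comments on the first answer. If an explicit return-code check is still wanted, one possible guard (an assumption on my part, not something this answer proposes) is Jinja2's `default` filter, which treats a missing `rc` as a failure:

```yaml
# Sketch only: a missing rc is treated as 1, so the task keeps retrying.
- command: /usr/bin/false
  register: result
  retries: 3
  delay: 10
  until: result.rc | default(1) == 0
```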
SerialEnabler
19

Consider using the wait_for module. It waits for a condition before continuing, for example for a port to become open or closed, for a file to exist or not, or for some content to appear in a file.

Without seeing the rest of your playbook, consider the following example:

- name: Wait for db server to restart
  wait_for:
    host: 192.168.50.4
    port: 3306
    delay: 1
    timeout: 300
  # Run the check from the control node, as local_action would.
  delegate_to: localhost

You can also adapt it as a handler and obviously change this snippet to suit your use-case.
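
A rough sketch of the handler variant, under stated assumptions: the host group `db`, the template and its destination path, the service name `mysql`, and the `restart db` topic are all hypothetical placeholders. Both handlers use `listen` on the same topic, and handlers run in the order in which they are defined, so the wait follows the restart.

```yaml
# Sketch only: host group, template paths and service name are placeholders.
- hosts: db
  become: yes
  tasks:
    - name: Deploy database configuration
      template:
        src: my.cnf.j2           # hypothetical template
        dest: /etc/mysql/my.cnf  # hypothetical path
      notify: restart db

    - name: Run pending handlers now so later tasks see a running database
      meta: flush_handlers

  handlers:
    # Both handlers listen on the same topic and run in the order defined here.
    - name: Restart database
      service:
        name: mysql              # assumed service name
        state: restarted
      listen: restart db

    - name: Wait for the database port to accept connections
      wait_for:
        port: 3306
        delay: 1
        timeout: 300
      listen: restart db
```

Here wait_for runs on the database host itself and checks the local port, rather than being delegated to the control node as in the task above.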

slhck
Mxx
1

For the following play:

- hosts: all
  become: yes
  tasks:
    - name: create the 'myusername' user
      user: name=myusername append=yes state=present createhome=yes shell=/bin/bash

I was not sure whether the remote was ready yet (because it was a newly spun-up node), so I tried the retries and delay settings, unfortunately with no luck. For now I ended up writing a wrapper in my bash script to get the behavior I needed.

#!/bin/bash
# Re-run the playbook until it succeeds, giving up after 5 attempts.
# HOSTS_FILE is expected to hold the inventory path (it is not set in this script).

STATUS_CODE=1
TRY=1
while [ "$STATUS_CODE" -ne 0 ]
do
  if [ "$TRY" -gt 5 ]
  then
    echo "Retried to connect to node 5 times and failed. Exiting"
    exit 1
  fi

  ansible-playbook -i "$HOSTS_FILE" user.yml
  STATUS_CODE=$?
  TRY=$(( TRY + 1 ))

  if [ "$STATUS_CODE" -ne 0 ]
  then
    echo "Retry to connect to node in 5 seconds"
    sleep 5
  fi
done

I still hope to find a cleaner way to do this in the ansible-playbook YAML. Does anyone have suggestions?
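
One possible in-playbook alternative, offered as a sketch and not as a tested replacement for the wrapper: Ansible's wait_for_connection module waits until the target host is reachable and usable. The `delay` and `timeout` values below are arbitrary assumptions. Fact gathering is turned off at the play level because it would otherwise run, and fail, before the wait task.

```yaml
# Sketch only: the delay and timeout values are illustrative.
- hosts: all
  become: yes
  gather_facts: no          # fact gathering would fail before the node is reachable
  tasks:
    - name: Wait for the node to become reachable
      wait_for_connection:
        delay: 5            # seconds to wait before the first check
        timeout: 300        # give up (and fail) after this many seconds

    - name: Gather facts now that the connection works
      setup:

    - name: create the 'myusername' user
      user: name=myusername append=yes state=present createhome=yes shell=/bin/bash
```

If the node never becomes reachable within the timeout, the wait task fails and the play stops, which matches the wrapper's "give up after several tries" behavior.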

Oleksii Zymovets