I am implementing a role in Ansible where I need to:

  1. Start an application (retrying at most 3 times). If the app starts successfully, it returns "OK"; otherwise it returns "NOK".
  2. If the app did not start successfully (NOK), delete the xpto directory (at most 3 times, once after each NOK status). If the app still has not started after these 3 attempts, the role should fail and abort execution.
  3. If the app starts OK, there is no need to clean the directory, and we are ready to run the app.

We need to be aware of this:

  • Each time I try to start the app and get status "NOK", I must run a task to delete the xpto directory.
  • We can retry up to 3 times to start the app (and to delete the directory).
  • If any attempt to start the app returns status OK (app started successfully), we don't want to run the task that deletes the directory. In that case we should move straight to the last task and run the app.
  • The role has only these 3 tasks (start app, delete directory, run the app).

For now I only have this, which is missing many of the features mentioned above:

---    
- name: Start app
  command: start app
  register: result
  tags: myrole

- name: Delete directory xpto if app did not start OK
  command: rm -rf xpto
  when:
    - result.stdout is search("NOK")
  tags: myrole

- name: Run the application
  command: app run xpto
  # "NOK" also contains "OK", so test for the absence of "NOK" instead
  when:
    - result.stdout is not search("NOK")
  tags: myrole

I have been pointed to another question with an answer that lets me implement the 3 retries with the abort option.

I am still missing a way to proceed directly to running the app (task3) when the app starts OK (task1), without going through task2, and I don't know where to start.

Zeitounator
fr0zt
  • I don't see any roles here, I see tags though. So if all of your tasks are tagged with `role1` then they'll all try to run if you provide that tag. – Makoto Dec 14 '22 at 16:38
  • The role is role1, which consists of task1, 2, and 3 in the given example – fr0zt Dec 14 '22 at 16:43
  • Are you looking for [something like this](https://stackoverflow.com/questions/74728404/looping-or-repeating-a-group-of-tasks-until-success/74730893#74730893)? – Zeitounator Dec 14 '22 at 17:03
  • @Zeitounator that seems to do what I need. With this code it fails after 7 retries, which is OK. But let's say task1 succeeds: then I don't need it to retry 3 times. Instead it should go directly to task3. How can I do that using the code at that URL? – fr0zt Dec 15 '22 at 14:42
  • To be clearer: if task1 gives me the result "Action Terminated", I want it to stop the block (task_file.yml) and move to task3. How can I achieve that? – fr0zt Dec 15 '22 at 14:55
  • Since I think the problem description is oversimplified, can you provide more details about "the scripts": what they contain and what they try to achieve? I ask because Ansible is more [Configuration Management](https://www.ansible.com/use-cases/configuration-management) than [Coding Language](https://www.redhat.com/sysadmin/ansible-coding-programming), and I would like to understand cause and effect better. – U880D Dec 15 '22 at 16:55
  • I believe the scripts are not the issue in this case. I needed some help to implement the solution because, although all the answers helped on some points, I still don't have one that fully answers my question. I am not experienced with Ansible, so I am struggling with this. – fr0zt Dec 15 '22 at 17:06

3 Answers

2

This builds on the answer you were pointed to (which was itself inspired by a blog post on dev.to), adapted to your use case and with some good practices added.

start_run_app.yml would contain the needed tasks that can be retried:

---
- name: Group of tasks to start and run the application which could fail
  block:
    - name: Increment attempts counter
      ansible.builtin.set_fact:
        attempt_number: "{{ attempt_number | d(0) | int + 1 }}"
        
    - name: Start the application
      ansible.builtin.command: start app
      register: result_start
      failed_when: result_start.rc != 0 or result_start.stdout is search('NOK')

    # If we get here then application started ok above
    # register run result to disambiguate possible failure
    # in the rescue section
    - name: Run the application
      ansible.builtin.command: app run xpto
      register: result_run

  rescue:
    - name: Fail/stop here if application started but did not run
      ansible.builtin.fail:
        msg: "Application started but did not run. Exiting"
      when:
        - result_run is defined
        - result_run is failed

    - name: Delete directory xpto since app didn't start ok
      ansible.builtin.command: rm -rf xpto

    - name: "Fail if we reached the max of {{ max_attempts | d(3) }} attempts"
      # Default will be 3 attempts if max_attempts is not passed as a parameter
      ansible.builtin.fail:
        msg: Maximum number of attempts reached
      when: attempt_number | int >= max_attempts | d(3) | int

    - name: Show number of attempts
      ansible.builtin.debug:
        msg: "group of tasks failed on attempt {{ attempt_number }}. Retrying"

    - name: Add delay if configured
      # no delay if retry_delay is not passed as parameter
      ansible.builtin.wait_for:
        timeout: "{{ retry_delay | int }}"
      when: retry_delay is defined

    - name: Include ourselves to retry.
      ansible.builtin.include_tasks: start_run_app.yml

And you can include this file like so (example for a full playbook, adapt to your exact need).

---
- name: Start and run my application
  hosts: my_hosts
  
  tasks:
    - name: Include retry-able set of tasks to start and run application
      ansible.builtin.include_tasks: start_run_app.yml
      vars:
        max_attempts: 6
        retry_delay: 5
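
Since the question asks for a role rather than a playbook, the same include could sit in the role's task file. This is a minimal sketch under assumptions that are not part of the original answer: a role named myrole (taken from the tag in the question), with start_run_app.yml placed next to main.yml in the role's tasks directory.

```yaml
---
# roles/myrole/tasks/main.yml (hypothetical layout, not from the original answer)
# Assumes start_run_app.yml is stored in the same tasks/ directory of the role,
# so the relative include below (and the self-include inside that file) resolves.
- name: Include retry-able set of tasks to start and run application
  ansible.builtin.include_tasks: start_run_app.yml
  vars:
    max_attempts: 3  # the three attempts asked for in the question
```

Leaving out retry_delay here means the "Add delay if configured" task is simply skipped on each retry.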
Zeitounator
1
  • None of the tasks shown has any error handling, which suggests that the scripts all return exit code 0. It would be better if the logic that prints "action terminated" also set a custom exit code, which the Ansible logic can then check:
 # changes in the shell script
 ...
 echo "action terminated"
 exit 130
 ...

That way, task 2 can be conditioned on that exit code:

- name: Task2
  command: "{{ home }}/script2.sh"
  when:
    - result.rc == 130
  tags: role1
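
One thing to keep in mind with this approach: the command module treats any non-zero exit code as a failure by default, so a script exiting with 130 would stop the play for that host before Task2 is reached. A minimal sketch of how Task1 might be declared to avoid that, assuming exit code 130 is reserved for "action terminated" as proposed above (Task1 itself is not shown in this variant of the answer, so its exact form here is an assumption):

```yaml
- name: Task1
  command: "{{ home }}/script1.sh"
  register: result
  # Assumption: rc 130 means "action terminated" and is handled by later tasks,
  # so only other non-zero exit codes are treated as real failures here.
  failed_when: result.rc not in [0, 130]
  tags: role1
```

With this in place, result.rc is still available to the when conditions of the following tasks.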
  • After task 2 has run, include an additional task that retries task 1:
- name: Task2.5
  command: "{{ home }}/script1.sh"
  register: result2
  until: "result2.rc != 130"
  ignore_errors: yes
  retries: 3
  delay: 5
  when:
    - result.rc == 130
  tags: role1

- name: Fail if script1 failed after all the attempts
  fail:
    msg: "script1 could not be completed"
  when:
     - result.rc == 130
     - result2.failed
  tags: role1

Note that the when condition checks whether the first attempt failed. The retried task registers its status in a different variable (result2), and that variable is the one evaluated by until. The fail task runs only if both the first attempt and all the retries were unsuccessful.

EDIT

If changing the exit code is not possible, replace that condition with a search on the output text:

- name: Task1
  command: "{{ home }}/script1.sh"
  register: result
  tags: role1

- name: Task2
  command: "{{ home }}/script2.sh"
  when:
    - result.stdout is search("action terminated")
  tags: role1

- name: Task2.5
  command: "{{ home }}/script1.sh"
  register: result2
  until: "'action terminated' not in result2.stdout"
  ignore_errors: yes
  retries: 3
  delay: 5
  when:
    - result.stdout is search("action terminated")
  tags: role1

- name: Exit the execution if script1 failed after all the attempts
  fail:
    msg: "script1 could not be completed"
  when:
     - result.stdout is search("action terminated")
     - result2.failed
  tags: role1

- name: Task3
  command: "{{ home }}/script3.sh"
  tags: role1
Carlos Monroy Nieblas
  • I think this will not work, because I need it to run task2 every time task1 returns "Action terminated". If task1 gives another result (different from "Action terminated"), then it should skip task2 and run task3 directly. I believe your example will retry task1 up to 3 times (but it will not run task2) – fr0zt Dec 14 '22 at 22:08
  • The proposal, if possible, is to use exit codes in the right way instead of looking for the output "Action terminated". If that is not possible, just change the condition to look for that text in the output of the command instead of the exit code. – Carlos Monroy Nieblas Dec 14 '22 at 23:04
  • the when condition will ensure that the task2 will only be executed if the first execution failed – Carlos Monroy Nieblas Dec 14 '22 at 23:05
  • the retry up to 3 times will only happen if the first one returned that exit code; note that the execution of "Task2.5" will not interfere with the execution of "Task2", actually, both of them depend on the result of "Task1", that is why they have the same "when" condition – Carlos Monroy Nieblas Dec 14 '22 at 23:08
  • OK, but if I get exit 130, it will run task2. Then it will run task 2.5 three times. I need it to run task 2 as well each time task 2.5 returns code 130. Do you understand? – fr0zt Dec 15 '22 at 00:16
  • Hello. From your example I think we are running task1 (action terminated) -> task2 -> task 2.5 (action terminated) -> task 2.5 (action terminated) -> task 2.5 (action terminated) -> exit the execution -> task3. Is this correct? – fr0zt Dec 15 '22 at 09:22
  • What I need is: task1 (action terminated) -> task2 -> task 2.5 (action terminated) -> task2 -> task 2.5 (action terminated) -> task2 -> task 2.5 (action terminated) -> task2 -> exit the execution. (Every time task1/2.5 ends with "action terminated", it should run task 2 as well.) If "action terminated" is not present, it should move to task3 – fr0zt Dec 15 '22 at 09:40
1

When a task fails on a host, Ansible does not trigger it again from a later task. The failed host is removed from the current batch, and the remaining tasks are not run on that host. If you want to attempt a task up to n times, you need to use until together with retries.

- name: Task1
  command: "{{ home }}/script1.sh"
  register: result
  tags: role1

- name: Task2
  command: "{{ home }}/script2.sh"
  when:
    - result.stdout is search("action terminated")
  tags: role1

- name: Task1 again
  command: "{{ home }}/script1.sh"
  register: result2
  tags: role1
  # Retry until the output no longer reports "action terminated"
  until: result2.stdout is not search("action terminated")
  retries: 3

- name: Task2 Again
  command: "{{ home }}/script2.sh"
  when:
    - result2.stdout is not search("not action terminated")
    - result2.stdout is search("action terminated")
  tags: role1

- name: Task3
  command: "{{ home }}/script3.sh"
  tags: role1
  when:
    - result.stdout is search("not action terminated")

A task can only be triggered from another task by using a handler. Otherwise the playbook runs its tasks from top to bottom and never goes back to a previous task.

Zeitounator
idriss Eliguene
  • Is this running task1 -> task2 -> task1 again (retried 3 times) -> task2 again -> task3? I need it to run task2 after each time I run task1. For example: task1 -> task2 -> task1 -> task2 -> task1 -> task2 -> task3. – fr0zt Dec 15 '22 at 10:19
  • I will try, but I don't think it's possible – idriss Eliguene Dec 15 '22 at 20:48