When running all the tasks in one go, Ansible dropped the SSH session partway through a task that takes 26 hours to complete; the session was disconnected after about 6 hours of execution. The target server's SSH configuration for keeping the session alive is as follows:

ClientAliveInterval 172000
ClientAliveCountMax 10

Ansible task:

- name: Executing script
  remote_user: "{{admin_user}}"
  become: yes
  shell: sudo -u test bash ./customscript.sh  > /log_dir/customscript.log 2>&1
  args:
    chdir: "deployment_source/common"
  tags:
     - custom-test

The error log is below:

22:11:44 TASK [role-deployment : Executing script] ************
22:11:44 fatal: [x.x.x.x]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Shared connection to x.x.x.x closed.\r\n", "unreachable": true}
22:11:44 
22:11:44 NO MORE HOSTS LEFT *************************************************************
22:11:44  to retry, use: --limit @/opt/ansible/test/deployment.retry
22:11:44 
22:11:44 PLAY RECAP *********************************************************************
22:11:44 x.x.x.x : ok=6    changed=2    unreachable=1    failed=0

What is causing the disconnection, and how can I solve it?

Ifti

1 Answer

You should never expect a network connection to stay stable for that long.

Ansible has an async mechanism for working with long-running jobs.

Refactor your task as follows:

- name: Executing script
  remote_user: "{{admin_user}}"
  become: yes
  shell: sudo -u test bash ./customscript.sh  > /log_dir/customscript.log 2>&1
  args:
    chdir: "deployment_source/common"
  async: 180000
  poll: 60
  tags:
     - custom-test

This allows the task to run for up to 50 hours (180000 seconds) and checks for completion every 60 seconds.
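A possible variant, not part of the original answer: the same async mechanism can be used in a fire-and-forget style, where `poll: 0` starts the job and returns immediately, and a separate `async_status` task checks on it. The sketch below assumes the same script and paths as in the question; the variable names `script_job` and `job_result` are hypothetical, and the `retries` value is simply chosen so that the total wait matches the 180000-second limit.

```yaml
- name: Start the script in the background
  remote_user: "{{admin_user}}"
  become: yes
  shell: sudo -u test bash ./customscript.sh  > /log_dir/customscript.log 2>&1
  args:
    chdir: "deployment_source/common"
  async: 180000          # upper bound on runtime, in seconds (50 hours)
  poll: 0                # do not wait; return immediately with a job id
  register: script_job   # hypothetical variable name
  tags:
    - custom-test

- name: Wait for the script to finish
  remote_user: "{{admin_user}}"
  become: yes            # same user as the task that started the job
  async_status:
    jid: "{{ script_job.ansible_job_id }}"
  register: job_result   # hypothetical variable name
  until: job_result.finished
  retries: 3000          # 3000 checks x 60 s delay = 180000 s
  delay: 60
  tags:
    - custom-test
```

Splitting the start and the wait this way also lets other tasks run between the two, if the play does not need the script's result right away.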

Konstantin Suvorov
    I had an Ansible task that was failing only on some servers and some clients with the error `UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Shared connection to [host] closed.", "unreachable": true}`. Using the `async` option stopped the failures immediately, saving me from having to dig through countless SSH configs. It will be my go-to for tasks that take more than a minute or two from now on. Thank you! – tfwright Jun 08 '20 at 20:06