I trigger multiple Tomcat startup scripts and then need to check, in the shortest time possible, that every process is listening on its specific port across multiple hosts.

As a test case, I wrote three scripts that run on a single host and listen on ports 4443, 4445 and 4447 respectively, as shown below.

/tmp/startapp1.sh

while test 1 # infinite loop
sleep 10
do
    nc -l localhost 4443 > /tmp/app1.log
done

/tmp/startapp2.sh

while test 1 # infinite loop
sleep 30
do
    nc -l localhost 4445 > /tmp/app2.log
done

/tmp/startapp3.sh

while test 1 # infinite loop
sleep 20
do
    nc -l localhost 4447 > /tmp/app3.log
done

Below is my code to trigger the script and check if the telnet is successful:

main.yml

- include_tasks: "internal.yml"
  loop:
    - /tmp/startapp1.sh 4443
    - /tmp/startapp2.sh 4445
    - /tmp/startapp3.sh 4447

internal.yml

- shell: "{{ item.split()[0] }}"
  async: 600
  poll: 0

- name: DEBUG CHECK TELNET
  shell: "telnet {{ item.split()[1] }}"
  delegate_to: localhost
  register: telnetcheck
  until: telnetcheck.rc == 0
  async: 600
  poll: 0
  delay: 6
  retries: 10

- name: Result of TELNET
  async_status:
    jid: "{{ item.ansible_job_id }}"
  register: _jobs
  until: _jobs.finished
  delay: 6
  retries: 10
  with_items: "{{ telnetcheck.results }}"

To run: ansible-playbook main.yml

Requirement: the three scripts above should start, and the telnet checks should complete, in about 30 seconds.

So the basic check needed here is telnet with until: telnetcheck.rc == 0, but because the task runs with async, the registered result of the shell module has no rc entry, and I get the error below:

"msg": "The conditional check 'telnetcheck.rc == 0' failed. The error was: error while evaluating conditional (telnetcheck.rc == 0): 'dict object' has no attribute 'rc'"

In the above code, where and how can I check that telnet has succeeded (i.e. telnetcheck.rc == 0) and make sure the requirement is met?

β.εηοιτ.βε
Ashar

2 Answers

I am not currently aware of a solution that starts a shell script and waits for its status in a single task. You could change the shell script so that it performs its own checks and returns meaningful exit codes. Alternatively, you could implement two or more tasks, where one executes the shell script and the others check for specific conditions afterwards.

Regarding your requirement

wait until telnet localhost 8076 is LISTENING (successful).

you may have a look at the wait_for module.

---
- hosts: localhost
  become: false
  gather_facts: false

  tasks:

  - name: "Test connection to local port"
    wait_for:
      host: localhost
      port: 8076
      delay: 0
      timeout: 3
      active_connection_states: SYN_RECV
    check_mode: false # because remote module (wait_for) does not support it
    register: result

  - name: Show result
    debug:
      msg: "{{ result }}"

Another approach, which tests from the Control Node whether there is a LISTENER on localhost of the Remote Node, could be

---
- hosts: test.example.com
  become: true
  gather_facts: false

  vars:

    PORT: "8076"

  tasks:

  - name: "Check for LISTENER on remote localhost"
    shell:
      cmd: "lsof -Pi TCP:{{ PORT }} -sTCP:LISTEN"
    changed_when: false
    check_mode: false
    register: result
    failed_when: result.rc != 0 and result.rc != 1

  - name: Report missing LISTENER
    debug:
      msg: "No LISTENER on PORT {{ PORT }}"
    when: result.rc == 1
U880D
  • If each process takes 10 minutes, i.e. 6 start scripts x 10 = 60 minutes to start, will your solution work async, i.e. take 10 minutes to check all the telnets in parallel, or does it work serially, i.e. check the next telnet only after the result of the first telnet is received, so that it takes 60 minutes? – Ashar Jun 20 '22 at 10:22
  • @Ashar, regarding your comment "_if each process takes 10 minutes, i.e 6 start script x 10 min = 60 minutes to start_" and your provided description "_I run a script across multiple hosts_", I do not understand this question at all. Are you running your script across multiple hosts in parallel? Or are you running your script serially, one host after another? – U880D Jun 20 '22 at 10:29
  • Multiple scripts on one host as well as across multiple hosts. I'm able to start all of them `async` using the code I shared, but the telnet check should also be `async`, as the telnet check should not delay the startup process. – Ashar Jun 20 '22 at 10:32
  • @Ashar, regarding "_multiple scripts on one host as well as across multiple hosts_" and "_the telnet check should also be async as the telnet check should not delay the startup process_", I recommend providing much more information, description and code about your application, startup process and services used. – U880D Jun 20 '22 at 11:39
  • It's a Tomcat start script. For a test case you can simply write `sleep 50` in a shell script and execute it. I'm updating the original post with a test case. – Ashar Jun 20 '22 at 12:22
  • @Ashar, "_for test case you can simply write sleep 50 in a shell script and execute_", I am still not sure what you are trying to achieve. Ansible is a simple Configuration Management tool where one can declare a final state, idempotently. In all my enterprise cases I just need to check if there is a LISTENER up and running, no need for other complex things. – U880D Jun 20 '22 at 12:45
  • I have updated the original post now with the exact testcase & requirement – Ashar Jun 20 '22 at 12:46
  • your solution does not work – Ashar Jun 20 '22 at 15:35

Using an asynchronous action and an until in the same task makes almost no sense.

As for your requirement to have the answer in the quickest time possible, you will have to think it through again. In your three-port case, if you want all of them to be open before you move on to the next task, the check will always be as slow as the slowest port to open, no matter what. Even if the first port we probe is indeed the slowest, the other two will then be probed in no time, so trying to optimise this with async is, in my view, an unnecessary optimisation.
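
To make that reasoning concrete, here is a small illustrative sketch in shell. The function name sequential_finish is hypothetical and not part of any tool; the open times (10, 30 and 20 seconds) are taken from the sleep values in the question's test scripts, and the retry interval between probes is ignored for simplicity.

```shell
#!/bin/sh
# Illustration only: given the time (in seconds) at which each port opens,
# compute when a sequential "wait for each port in turn" check finishes.
sequential_finish() {
    now=0
    for opens_at in "$@"; do
        # Waiting on a port that is already open costs (almost) nothing;
        # otherwise the check blocks until that port opens.
        if [ "$opens_at" -gt "$now" ]; then
            now=$opens_at
        fi
    done
    echo "$now"
}

# Open times taken from the sleep values in the question's test scripts.
sequential_finish 10 30 20
```

This prints 30, the open time of the slowest port, and the result is the same whatever order the ports are probed in.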

Either you use until, in which case each port probe blocks until its port answers, or you run the probes asynchronously, in which case async_status will catch the return code as it should, provided you wrap the telnet in a shell until loop.

In your until condition, the issue is that the return code is not set until the command actually returns, so you just have to check first that the rc key of the dictionary is defined.

Note that for all the examples below, I am opening the ports manually with nc -l -p <port>, which is why they open gradually.


With until:

- shell: "telnet localhost {{ item.split()[1] }}"
  delegate_to: localhost
  register: telnetcheck
  until:
    - telnetcheck.rc is defined
    - telnetcheck.rc == 0
  delay: 6
  retries: 10

This will yield:

TASK [shell] *****************************************************************
FAILED - RETRYING: [localhost]: shell (10 retries left).
changed: [localhost] => (item=/tmp/startapp1.sh 4443)
FAILED - RETRYING: [localhost]: shell (10 retries left).
changed: [localhost] => (item=/tmp/startapp2.sh 4445)
FAILED - RETRYING: [localhost]: shell (10 retries left).
changed: [localhost] => (item=/tmp/startapp3.sh 4447)

With async:

- shell: "until telnet 127.0.0.1 {{ item.split()[1] }}; do sleep 2; done"
  delegate_to: localhost
  register: telnetcheck
  async: 600
  poll: 0

- async_status:
    jid: "{{ item.ansible_job_id }}"
  register: _jobs
  until: _jobs.finished
  delay: 6
  retries: 10
  loop: "{{ telnetcheck.results }}"
  loop_control:
    label: "{{ item.item }}"

This will yield:

TASK [shell] *****************************************************************
changed: [localhost] => (item=/tmp/startapp1.sh 4443)
changed: [localhost] => (item=/tmp/startapp2.sh 4445)
changed: [localhost] => (item=/tmp/startapp3.sh 4447)

TASK [async_status] **********************************************************
FAILED - RETRYING: [localhost]: async_status (10 retries left).
changed: [localhost] => (item=/tmp/startapp1.sh 4443)
FAILED - RETRYING: [localhost]: async_status (10 retries left).
changed: [localhost] => (item=/tmp/startapp2.sh 4445)
FAILED - RETRYING: [localhost]: async_status (10 retries left).
changed: [localhost] => (item=/tmp/startapp3.sh 4447)
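
The shell until loop in the async variant keeps retrying until the async job itself is stopped. If you would rather have the probe give up on its own with a non-zero exit code, a bounded version could look roughly like the sketch below. The helper name wait_for_port and its arguments are hypothetical, and it assumes an nc build that supports -z (connect without sending data) and -w (connection timeout), which is not true of every netcat variant.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ansible or of the answer above):
# poll a TCP port until it accepts a connection or a time budget runs out.
# Assumes an nc build that supports -z and -w.
wait_for_port() {
    host=$1
    port=$2
    timeout=${3:-60}     # approximate total seconds to keep trying
    interval=${4:-2}     # seconds to sleep between probes
    elapsed=0            # counts sleep time only, not time spent in nc
    while [ "$elapsed" -lt "$timeout" ]; do
        if nc -z -w 1 "$host" "$port" >/dev/null 2>&1; then
            return 0     # port is accepting connections
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 1             # gave up: the port did not open in time
}
```

It could then replace the inline loop, for example as wait_for_port 127.0.0.1 4443 60 2, with the exit code picked up by async_status in the same way as before.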

That said, you should seriously consider @U880D's answer, as wait_for is the more native approach in Ansible:

- wait_for:
    host: localhost
    port: "{{ item.split()[1] }}"
    delay: 6
    timeout: 60

This will yield:

TASK [wait_for] **************************************************************
ok: [localhost] => (item=/tmp/startapp1.sh 4443)
ok: [localhost] => (item=/tmp/startapp2.sh 4445)
ok: [localhost] => (item=/tmp/startapp3.sh 4447)
β.εηοιτ.βε