
When I run a playbook that simply copies a directory from one place to another, Ansible throws

ERROR! A worker was found in a dead state

After some googling, it looks like this is caused by the oom-killer killing the Ansible process (but I'm not exactly sure that this is the case). My memory is:

              total        used        free      shared  buff/cache   available
Mem:            991         372         448           1         170         467
Swap:           511         365         146
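One way to confirm whether the oom-killer is actually responsible (which is only an assumption at this point) is to check the kernel log right after the failure:

```shell
# Look for oom-killer activity in the kernel log; if Ansible's worker
# process was killed, a line like "Out of memory: Kill process ..." appears.
dmesg | grep -iE 'out of memory|oom-killer'

# On systemd-based systems the same information is in the journal:
journalctl -k | grep -i oom
```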

I don't have a clue how to fix it. I should mention that I only had the RAM when I first ran the playbook, and it failed because of low memory. After that, I added the swap. Not sure if it's related, but note that it's a swap file, not a separate partition.
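For reference, a swap file like that is typically set up along these lines (a sketch only; the path and size are assumptions, and every command needs root):

```shell
# Allocate a 1 GiB file, lock down its permissions, format it as swap, enable it.
fallocate -l 1G /swapfile   # or: dd if=/dev/zero of=/swapfile bs=1M count=1024
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile

# Verify the swap file is active:
swapon --show
```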

I've watched memory usage while the playbook runs: free swap drops very fast once it reaches that task, and the error is thrown when it hits 0.


I'm running the following playbook.

---
- hosts: localhost
  become: true
  become_method: sudo
  become_user: root

  vars:
    portals:
      - mysite
    contentPath: "/var/www/"
    backupPath: "/home/dataFiles/backups/"

  tasks:

    - name: backup content
      copy:
        src: "{{ contentPath }}/{{ item }}"
        dest: "{{ backupPath }}/{{ item }}/{{ ansible_date_time.date }}/"
      with_items:
        - "{{ portals }}"
...

The error I've given above is the only info I get out of ansible. Even running the playbook verbosely doesn't give anything additional for that.

Sudh33ra
  • Update: added 2 GB of swap. It still runs out when the command is run. – Sudh33ra Mar 16 '17 at 13:53
  • You should add the task which throws the error and the complete error message returned by Ansible. – Henrik Pingel Mar 16 '17 at 14:10
  • @HenrikPingel Thanks for the response. Added the info. – Sudh33ra Mar 17 '17 at 06:37
  • Look at your logs. Read up on [ansible debugging](https://www.google.co.uk/search?q=ansible+debugging). Provide more information. – user9517 Mar 17 '17 at 06:56
  • Yes, you need to run your `ansible-playbook` command with `-vvv`. You should also add your ansible version with `ansible --version` – Henrik Pingel Mar 17 '17 at 07:05
  • If there are many files under `portals` you might want to check if the [synchronize](http://docs.ansible.com/ansible/synchronize_module.html) module works better for you. It's a wrapper around rsync. – Henrik Pingel Mar 17 '17 at 07:09

4 Answers


There is a note in the copy module documentation:

The “copy” module recursively copy facility does not scale to lots (>hundreds) of files. For alternative, see synchronize module, which is a wrapper around rsync.

Assuming this is the case here, you should consider using the synchronize module.
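Using the questioner's variables, the failing copy task could be rewritten with synchronize roughly like this (an untested sketch; check the module's options, since synchronize does not behave identically to copy for things like ownership and permissions):

```yaml
    - name: backup content
      synchronize:
        src: "{{ contentPath }}/{{ item }}"
        dest: "{{ backupPath }}/{{ item }}/{{ ansible_date_time.date }}/"
      with_items:
        - "{{ portals }}"
```

Because synchronize delegates to rsync, it streams files instead of loading the whole tree through Ansible's worker processes, which is what makes it scale to large directories.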

Henrik Pingel

Something like this works for me:

env no_proxy='*' ansible-playbook collect-facts.yml

You can also add export no_proxy="*" to your .bashrc or .zshrc, so you don't need to type it every time.

More detail on this: https://www.whatan00b.com/posts/debugging-a-segfault-from-ansible/

(Credit: https://github.com/ansible/ansible/issues/32554#issuecomment-572985680)

FrankyFred
  • The relationship with the question is unclear. A segfault is not an oom-killer event. – reinierpost May 01 '23 at 13:56

I had the same error, but I was not using the copy module in any task.

In my case the problem was that the machine running the Ansible task had run out of memory when the error appeared.

I verified this by launching the playbook and then, in another terminal, running htop to watch the RAM and swap fill up as the playbook ran.
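If htop is not available, the same monitoring can be done with plain shell tools (a sketch):

```shell
# Print memory and swap usage every 2 seconds while the playbook runs
# in another terminal; watch the "free" columns shrink.
watch -n 2 free -m

# Or, without watch:
while true; do free -m; sleep 2; done
```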

I solved it by increasing the RAM of the machine which was failing.

(Increasing the swap would not be as effective, since swap is disk-backed rather than random-access memory; the performance gained from 1 GB of swap is far less than that from 1 GB of RAM.)

Tms91

This solved my issue (this environment variable is specific to macOS):

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
Aminovic