My CI pipeline fails at the final destroy stage of running molecule test because the default timeout for closing a Docker container is not long enough.

Here is the error I get:

msg: 'Error removing container c6fff0374c2d8dc2b20ed991152ce8db5bbdf05a635c26648ce3c0a82c491eb2: UnixHTTPConnectionPool(host=''localhost'', port=None): Read timed out. (read timeout=60)'

It seems that my containers are too big and/or my CI runner machine is not powerful enough for this to complete within the default 60-second timeout.

Here is the advice I found on this topic:

  • restart the Docker service: systemctl restart docker
  • change the timeout using environment variables:
export DOCKER_CLIENT_TIMEOUT=120
export COMPOSE_HTTP_TIMEOUT=120

Restarting docker doesn't solve my issue and is not convenient on my CI runner anyway.

I tried adding environment variables like this in molecule.yml:

provisioner:
  name: ansible
  env:
    MOLECULE_NO_LOG: "false"
    DOCKER_CLIENT_TIMEOUT: "240"
    COMPOSE_HTTP_TIMEOUT: "240"

But Docker doesn't seem to pick them up, since I still get the same error message specifying (read timeout=60).

I also tried, to no avail, to define them in the driver section of molecule.yml:

driver:
  name: docker
  env:
    DOCKER_CLIENT_TIMEOUT: "240"
    COMPOSE_HTTP_TIMEOUT: "240"

The only way I can get my job to end successfully is by running the tests against a single host at a time, which I guess reduces the resources my CI runner needs to close the containers within 60s. However, it is not an appropriate solution since it artificially complicates my job definitions.

Isn't there a way to provide environment variables to the Docker driver?

For the record, this is my setup:

  • Python 3.6.8
  • ansible 2.10.3
  • molecule 3.2.0 using python 3.6
    • ansible:2.10.3
    • delegated:3.2.0 from molecule
    • docker:0.2.4 from molecule_docker
  • Docker version 19.03.14, build 5eb3275d40
  • GitLab Community Edition 13.7.1
  • gitlab-runner 13.6.0
  • The problem is not closing the container in 60s. Have a look at `/usr/local/lib/python3.6/dist-packages/molecule_docker/playbooks/destroy.yml` (the default destroy playbook) and you'll see that molecule can wait up to 1500s for removal. What happens is that the docker client is trying to connect to the daemon on your unix socket and does not receive an answer within the given 60s default timeout. What you can try is to duplicate the destroy playbook in your scenario and add your environment to the `docker_container` task, and see if that changes anything (a sketch follows after these comments). – Zeitounator Jan 08 '21 at 12:15
  • Meanwhile, my 2 cents: as you already suggest, this almost surely indicates that your CI runner cannot handle the load you put on it. There are only 2 good solutions I know of to this problem: increase the runner resources or decrease the load (and parallelize if possible). – Zeitounator Jan 08 '21 at 12:20
  • @Zeitounator thanks for the suggestion, using a custom `destroy.yml` playbook does work for passing Docker a bigger "API response" timeout. For the record, I also had to duplicate the `filter_plugins/get_docker_network.py` file in order to stay as close as possible to the default destroy.yml. I have yet to find out what the new timeout should be; 240s still doesn't seem to be enough. – Gohier Francis Jan 12 '21 at 16:42
  • Since the `destroy` playbook is based on Ansible's `docker_container` module, the appropriate variable to set the timeout for Docker's API communication is actually `DOCKER_TIMEOUT`, as documented [here](https://docs.ansible.com/ansible/latest/collections/community/general/docker_container_module.html#parameter-timeout). Defining this variable in the `provisioner.env` section of `molecule.yml` enables me to increase the default 60s timeout (see the second sketch below). This way I don't need to duplicate and customize the `destroy.yml` playbook. – Gohier Francis Jan 14 '21 at 08:21
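
For reference, here is a minimal sketch of the custom destroy playbook suggested in the first comment. The real default playbook in molecule_docker also removes networks and polls removal asynchronously, so the safest route is to copy `molecule_docker/playbooks/destroy.yml` into the scenario directory and only add the `timeout` parameter; the 240 value here is illustrative:

---
# Simplified sketch of a custom destroy.yml for the scenario directory.
# The actual default playbook does more (network cleanup, async polling);
# copy it and add only the timeout parameter.
- name: Destroy
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Destroy molecule instance(s)
      docker_container:
        name: "{{ item.name }}"
        state: absent
        force_kill: true
        timeout: 240  # Docker API response timeout in seconds (default: 60)
      loop: "{{ molecule_yml.platforms }}"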
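
And a sketch of the working solution from the last comment: setting DOCKER_TIMEOUT in the provisioner environment of molecule.yml. The value is illustrative; pick one that suits your runner:

provisioner:
  name: ansible
  env:
    MOLECULE_NO_LOG: "false"
    # DOCKER_TIMEOUT is the documented environment fallback for the
    # docker_container module's timeout parameter, so no custom
    # destroy.yml is needed
    DOCKER_TIMEOUT: "300"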
