
I've been struggling with pulling a private image from my GitLab container registry when running a DockerOperator in Airflow 2.0.

My DockerOperator looks as follows:

python_mailer = DockerOperator(
    task_id='mailer',
    image='registry.gitlab.com/private422/mailer/image',
    docker_conn_id='gitlab-registry',
    api_version='auto',
    dag=dag
)

The gitlab-registry is defined in Airflow's connections with the username and password from a token that I created in GitLab:

[screenshot: GitLab deploy token]
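For reference, I believe the same connection could also be created from the Airflow CLI instead of the UI; something like this, with the token password as a placeholder:

airflow connections add 'gitlab-registry' \
    --conn-type 'docker' \
    --conn-host 'registry.gitlab.com' \
    --conn-login 'gitlab+deploy-token-938603' \
    --conn-password '<deploy-token-password>'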

However, when I try to run my DAG, I get the following error:

[2022-04-07 15:27:38,562] {base.py:74} INFO - Using connection to: id: gitlab-registry. Host: registry.gitlab.com, Port: None, Schema: , Login: gitlab+deploy-token-938603, Password: XXXXXXXX, extra: None
[2022-04-07 15:27:38,574] {taskinstance.py:1455} ERROR - Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/lib/python3.6/http/client.py", line 1291, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.6/http/client.py", line 1337, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.6/http/client.py", line 1286, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.6/http/client.py", line 1046, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.6/http/client.py", line 984, in send
    self.connect()
  File "/home/airflow/.local/lib/python3.6/site-packages/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory

Does anyone have a clue what this could be about?

Note: I run Airflow locally.

– Maarten
  • Can you show the output of running `docker info` where airflow is running? (ideally run this as the same unix user airflow uses) My guess is this would be caused by docker not running or the user airflow uses not having permission to `/var/run/docker.sock`. – sytech Apr 07 '22 at 17:33
  • Hi, thanks for helping me out. The output of ```docker info``` is: Client: Context: default Debug Mode: false Plugins: app: Docker App (Docker Inc., v0.9.1-beta3) buildx: Docker Buildx (Docker Inc., v0.8.1-docker) scan: Docker Scan (Docker Inc., v0.17.0) Server: ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running? errors pretty printing info /var/run/docker.sock is present in the container as I have this in docker-compose: volumes: - /Users/Shared/run/docker.sock:/var/run/docker.sock – Maarten Apr 07 '22 at 17:41
  • I see, so you're running airflow inside a docker container? Are you running the airflow container with the `--privileged` flag? (if not, you'll need to do so) Also double check the source for that volume is correct. Based on the output you provided, it seems that you are unable to connect to the docker daemon successfully (or it is not running) – sytech Apr 07 '22 at 17:47
  • Yes, I'm running airflow inside a docker container. I also made sure that docker is installed in that container and as far as I know the source for that volume is correct. How do I run docker-compose as privileged? – Maarten Apr 07 '22 at 17:50
  • In your compose file, specify the key `privileged: true` for the airflow service(s). See [compose reference](https://docs.docker.com/compose/compose-file/compose-file-v3/#domainname-hostname-ipc-mac_address-privileged-read_only-shm_size-stdin_open-tty-user-working_dir) for usage. – sytech Apr 07 '22 at 17:54
  • Thanks for your help so far, I got a bit further. ```docker info``` now gives me the expected output so that seems to work. However, I now stumble upon the following error, any ideas? HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localhost/v1.41/auth – Maarten Apr 07 '22 at 18:27
  • Do the versions of the docker client inside and outside the container match? As a stab in the dark, you might be able to get around this by using `host` networking for the container (see the compose reference for how to do this)... but really what you might want to do is setup a remote docker daemon instead... If you can edit your question to provide your current docker-compose file, I can work on an answer for you to that effect. – sytech Apr 07 '22 at 20:16

1 Answer


Note the error in the docker library:

  File "/home/airflow/.local/lib/python3.6/site-packages/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)

This means that the docker client is unable to connect to the docker daemon, which it expects by default at the unix socket /var/run/docker.sock.

The root cause of this issue is that you are running Airflow inside a docker container. In order for Airflow to invoke docker properly, it needs to communicate with a docker daemon, which won't be available/usable inside the container by default, even if you install docker in the container.

You'll notice that docker info fails inside the container:

docker exec -it airflow docker info
Client: Context: default 
...
Server: ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
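
If you want to probe the socket directly, independent of the docker CLI, curl can talk to it (this assumes curl is available in the container):

curl --unix-socket /var/run/docker.sock http://localhost/version

A reachable daemon returns a JSON version payload; a "No such file or directory" or "Connection refused" error points at a missing or broken socket mount.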

There are a couple of approaches you can take to solve this:

Use the host docker daemon

In order for the docker daemon on the host to be usable from inside another container, you need (at least) two things (see the compose sketch after this list):

  1. Mount /var/run/docker.sock into the container (-v /var/run/docker.sock:/var/run/docker.sock)
  2. Run the container in privileged mode (--privileged)
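
In docker-compose terms, that might look something like the following sketch (the service name and host-side socket path are assumptions; on Docker Desktop for Mac the host path can differ, as noted in the comments above):

services:
  airflow:
    # ... rest of your existing airflow service definition
    privileged: true
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock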

After doing this, docker info should correctly report the server information as the host daemon.

Use a remote docker daemon

Use docker's remote APIs to have Airflow connect. For example, you can have docker running on a remote system available over the network and connect to that daemon remotely. You'll want to do this in a secure manner, like using SSH to connect to the daemon.
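
As a sketch of what that looks like over plain SSH (supported by the docker CLI since 18.09; the user and host are placeholders):

export DOCKER_HOST=ssh://user@remote-host
docker info    # should now report the remote daemon's server information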

Setup a "remote" daemon locally in docker-compose

A way you can do this entirely locally is to add a docker:dind container to your compose file and then set DOCKER_HOST in the airflow container to point at the dind container. The DOCKER_HOST environment variable tells docker to use a remote daemon instead of the default.

This is not necessarily the most secure setup, but it should be the simplest to implement.

version: "3.8"

services:
  docker:
    image: docker:dind
    privileged: true
    environment:
      DOCKER_TLS_CERTDIR: ""
  airflow:
    # ... docker client should be installed in this image
    environment:
      DOCKER_HOST: "tcp://docker:2375"
    depends_on: [docker]
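
To sanity-check this setup, you should be able to run docker info from inside the airflow container and see server information coming from the dind daemon (assuming the docker client is installed in the airflow image, as noted above):

docker-compose exec airflow docker info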

In your DockerOperator invocation, you also need to provide the docker_url argument and set mount_tmp_dir to False (the operator's default temporary-directory mount assumes the daemon runs on the same host as Airflow, which is no longer the case here):

python_mailer = DockerOperator(
    docker_url="tcp://docker:2375",
    mount_tmp_dir=False,
    # ... other options
)
– sytech