I am developing an ETL process to be scheduled and orchestrated with Apache Airflow using the DockerOperator. I am working on a Windows laptop, so I can only run Apache Airflow inside a Docker container. I was able to mount a folder of config files on my Windows laptop (called configs below) into the Airflow container (named webserver below) using a volume specified in the docker-compose.yml file in my project root directory. The relevant part of docker-compose.yml is shown below:
    version: '2.1'
    services:
      webserver:
        build: ./docker-airflow
        restart: always
        privileged: true
        depends_on:
          - mongo
          - mongo-express
        environment:
          - LOAD_EX=n
          - EXECUTOR=Local
        volumes:
          - ./docker-airflow/dags:/usr/local/airflow/dags
          # Volume for source code
          - ./src:/src
          - ./docker-airflow/workdir:/home/workdir
          # configs folder as volume
          - ./configs:/configs
          # Mount the Docker socket from the host (currently my laptop) into the webserver
          # container so that the webserver container can create "sibling" containers
          - //var/run/docker.sock:/var/run/docker.sock  # the two "//" are needed on Windows
        ports:
          - 8081:8080
        command: webserver
        healthcheck:
          test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
          interval: 30s
          timeout: 30s
          retries: 3
        networks:
          - mynet
Now I want to pass this configs folder with all its contents on to the containers that are created by the DockerOperator. Although the configs folder was apparently mounted into the webserver container's file system, the configs folder inside the task containers is completely empty, and because of that my DAG fails. The code for the DockerOperator is as follows:
    # Import path for Airflow 1.10.x (assumed here; newer releases ship the
    # operator in the separate Docker provider package)
    from airflow.operators.docker_operator import DockerOperator

    cmd = "--config_filepath {} --data_object_name {}".format("/configs/dev.ini", some_data_object)
    staging_op = DockerOperator(
        command=cmd,
        task_id="my_task",
        image="{}/{}:{}".format(docker_hub_username, docker_hub_repo_name, image_name),
        api_version="auto",
        auto_remove=False,
        network_mode=docker_network,
        force_pull=True,
        volumes=["/configs:/configs"],  # "absolute_path_host:absolute_path_container"
    )
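As a rough illustration of the format used in the volumes argument above, the sketch below splits a "source:target[:mode]" string into its parts. The names BindSpec and split_bind_spec are hypothetical helpers written for this illustration; they are not part of Airflow or the Docker SDK, and the sketch assumes POSIX-style absolute paths only.

```python
from typing import NamedTuple


class BindSpec(NamedTuple):
    source: str  # left side: path on the machine whose Docker daemon creates the container
    target: str  # right side: path inside the newly created container
    mode: str    # "rw" (default) or "ro"


def split_bind_spec(spec: str) -> BindSpec:
    """Split a "source:target[:mode]" bind string into its parts.

    Hypothetical helper for illustration only; it does not handle
    Windows drive-letter paths.
    """
    parts = spec.split(":")
    if len(parts) == 2:
        source, target = parts
        mode = "rw"
    elif len(parts) == 3:
        source, target, mode = parts
    else:
        raise ValueError("expected 'source:target[:mode]', got {!r}".format(spec))
    if not source.startswith("/") or not target.startswith("/"):
        raise ValueError("both paths must be absolute: {!r}".format(spec))
    return BindSpec(source, target, mode)


spec = split_bind_spec("/configs:/configs")
print(spec.source, "->", spec.target, "({})".format(spec.mode))
```

Seen this way, the question is which machine the source path is looked up on when the container is created through the mounted Docker socket.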
According to the documentation, the left side of the volume must be an absolute path on the host, which (if I understood correctly) is the webserver container in this case, because it creates a separate container for every task. The right side of the volume is a directory inside the task's container, which is created by the DockerOperator. As mentioned above, the configs folder inside the task's container does exist but is completely empty. Does anyone know why this is the case and how to fix it?
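One way to narrow down where the contents go missing would be to print what each container actually sees at /configs. The sketch below is only a hypothetical diagnostic (list_dir_entries and report are names made up for this example, not Airflow functions); it could be run once inside the webserver container and once inside the task's image to compare the two results.

```python
import os
from typing import List, Optional


def list_dir_entries(path: str) -> Optional[List[str]]:
    """Return the sorted entry names in `path`, or None if it is not a directory."""
    if not os.path.isdir(path):
        return None
    return sorted(os.listdir(path))


def report(path: str) -> str:
    """Summarize whether `path` is missing, empty, or populated."""
    entries = list_dir_entries(path)
    if entries is None:
        return "{}: missing".format(path)
    if not entries:
        return "{}: exists but is empty".format(path)
    return "{}: {}".format(path, ", ".join(entries))


# /configs is the mount target used in both the compose file and the operator.
print(report("/configs"))
```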
Thank you very much for your help!