I have a Google Cloud Composer 1 environment (Airflow 2.1.2) where I want to run an Airflow DAG that utilizes the KubernetesPodOperator.

Cloud Composer makes available to all DAGs a shared file directory for storing application data. The files in the directory reside in a Google Cloud Storage bucket managed by Composer. Composer uses FUSE to map the directory to the path /home/airflow/gcs/data on all of its Airflow worker pods.

In my DAG I run several Kubernetes pods like so:

    from airflow.contrib.operators import kubernetes_pod_operator
    
    # ...

    splitter = kubernetes_pod_operator.KubernetesPodOperator(
        task_id='splitter',
        name='splitter',
        namespace='default',
        image='europe-west1-docker.pkg.dev/redacted/splitter:2.3',
        cmds=["dotnet", "splitter.dll"],
    )

The application code in all the pods that I run needs to read from and write to the /home/airflow/gcs/data directory. But when I run the DAG, my application code is unable to access the directory. This is likely because Composer maps the directory into its worker pods but does not extend this courtesy to my pods.

What do I need to do to give my pods r/w access to the /home/airflow/gcs/data directory?

urig

1 Answer

Cloud Composer uses FUSE to mount certain directories from Cloud Storage into the Airflow worker pods running in Kubernetes. It mounts them with default permissions that cannot be overridden, because Google Cloud Storage does not track that metadata. One possible solution is to run a bash operator at the beginning of your DAG that copies the files to a new directory. Another is to use a path that is not backed by Google Cloud Storage, such as a /pod path.
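
As a rough illustration of the first suggestion, the sketch below builds the shell command that such a bash operator could run. The helper `build_copy_command` and the staging directory `/tmp/splitter-data` are hypothetical names chosen for this example; only `/home/airflow/gcs/data` comes from the question. Whether the pods launched by `KubernetesPodOperator` can then see the new directory depends on how it is exposed to them, which is not settled here.

```python
import shlex

# Path where Composer mounts the bucket's data folder on its worker pods
# (taken from the question).
COMPOSER_DATA_DIR = "/home/airflow/gcs/data"

# Hypothetical staging directory; pick one that suits your environment.
STAGING_DIR = "/tmp/splitter-data"


def build_copy_command(src_dir: str, dst_dir: str) -> str:
    """Return a shell command that copies the contents of src_dir into dst_dir."""
    src = shlex.quote(src_dir.rstrip("/"))
    dst = shlex.quote(dst_dir.rstrip("/"))
    # "src/." copies the directory's contents (dotfiles included)
    # rather than nesting the directory itself inside dst.
    return f"mkdir -p {dst} && cp -r {src}/. {dst}/"


copy_command = build_copy_command(COMPOSER_DATA_DIR, STAGING_DIR)
print(copy_command)

# In the DAG file, the command could be passed to a BashOperator that runs
# before the pod task, for example:
#
#     from airflow.operators.bash import BashOperator
#
#     stage_data = BashOperator(task_id="stage_data", bash_command=copy_command)
#     stage_data >> splitter
```

Paths are passed through `shlex.quote` so that a directory name containing spaces does not break the command.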

Jose Gutierrez Paliza
  • Thank you. I was hoping for something akin to Docker's "bind mounts". Is there no equivalent concept in k8s? – urig Nov 05 '21 at 09:37
  • Maybe Anthos [1], which unifies the management of applications and infrastructure. [1] https://cloud.google.com/anthos – Jose Gutierrez Paliza Nov 05 '21 at 23:08
  • Thanks again. The volume mounted by Cloud Composer is called `gcsdir` and is of type `emptyDir`. Might it not be possible to change its type to `persistent disk` and then mount it from the pods I launch with `KubernetesPodOperator`? – urig Nov 11 '21 at 14:44
  • Why would you want to change gcsdir from emptyDir to a persistent disk? Anyway, you can create a GCE instance with a volume and a volumeClaim, copy the data into a pod, and then connect to the volume using the [KubernetesPodOperator](https://github.com/apache/airflow/blob/2bafc089ce549d7afb3000129cf5a8e3b0f36fac/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py#L56). – Jose Gutierrez Paliza Nov 23 '21 at 23:59