
I am attempting to use the git-sync option to load DAGs with the community Airflow Helm chart. The sync appears to succeed in the init container (dags-git-clone):

I0824 15:21:52.114912      14 main.go:473] "level"=0 "msg"="starting up" "pid"=14 "args"=["/git-sync"]
I0824 15:21:52.115089      14 main.go:923] "level"=0 "msg"="cloning repo" "origin"="git@gitlab.techopscloud.com:gulledc/airflow-helm.git" "path"="/dags"
I0824 15:21:57.442933      14 main.go:737] "level"=0 "msg"="syncing git" "rev"="HEAD" "hash"="5ff4ef7ff5ff337dc45c01b9a53638505bc683d1"
I0824 15:21:57.775800      14 main.go:772] "level"=0 "msg"="adding worktree" "path"="/dags/5ff4ef7ff5ff337dc45c01b9a53638505bc683d1" "branch"="origin/master"
I0824 15:21:57.780648      14 main.go:833] "level"=0 "msg"="reset worktree to hash" "path"="/dags/5ff4ef7ff5ff337dc45c01b9a53638505bc683d1" "hash"="5ff4ef7ff5ff337dc45c01b9a53638505bc683d1"
I0824 15:21:57.780680      14 main.go:838] "level"=0 "msg"="updating submodules"

All the pods are running, but when I check the webserver, the DAGs list is empty. I know syncing can take time, but I have let the pods sit for about 30 minutes without any result.
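One way to confirm that the synced files actually exist where Airflow is looking is to follow git-sync's link and list the Python files behind it. This is a minimal sketch, assuming (as the "adding worktree" log line suggests) that git-sync checks each commit out to `<root>/<hash>` and points a symlink, named by the `dest` setting, at the current checkout. The helper name `list_synced_dag_files` is hypothetical, and the `/dags` root and `git` link name are taken from the log and config above; the path at which the volume is mounted in the scheduler or webserver container may differ.

```python
from pathlib import Path


def list_synced_dag_files(root: str, link_name: str) -> list[str]:
    """Hypothetical helper: list .py files in the checkout that git-sync's link points to.

    Assumes a <root>/<commit-hash>/ worktree plus a <root>/<link_name> symlink
    pointing at the active one.
    """
    link = Path(root) / link_name
    if not link.is_symlink():
        raise FileNotFoundError(f"{link} is not a symlink; has git-sync completed a sync?")
    checkout = link.resolve()  # e.g. /dags/5ff4ef7f... for the log above
    # Paths relative to the checkout, sorted so the output is stable
    return sorted(str(p.relative_to(checkout)) for p in checkout.rglob("*.py"))
```

Called from inside a pod with the root the volume is mounted at there, a result that includes the test DAG's file suggests the sync is fine and the problem lies in the DAG code itself; an exception or an empty list points back at the sync or mount path.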

I have stored my id_rsa key as a Kubernetes secret, and the sync appears to be working, but the logs for the "dag-git-sync" container show only this:

I0824 15:23:21.477543      12 main.go:473] "level"=0 "msg"="starting up" "pid"=12 "args"=["/git-sync"]

Why is it stuck at "starting up" when the repo has already been cloned? I only have one test DAG in the GitLab folder.

Any help would be great. Thanks

For reference, my config looks like this in the community helm chart:

dags:
  ## the airflow dags folder
  path: /opt/HELM_AIRFLOW/airflow-helm/dags #CHANGE
  # not picking up this path, not sure why

  ## configs for the dags PVC
  ## [FAQ] https://github.com/airflow-helm/charts/blob/main/charts/airflow/docs/faq/dags/load-dag-definitions.md
  persistence:
    enabled: false
    #storageClassName: aws-efs

  ## configs for the git-sync sidecar
  ## [FAQ] https://github.com/airflow-helm/charts/blob/main/charts/airflow/docs/faq/dags/load-dag-definitions.md
  gitSync:
    enabled: true
    repo: "git@gitlab.techopscloud.com:gulledc/airflow-helm.git" #CHANGE, TRYING SSH
    branch: "master"
    revision: "HEAD"
    dest: "git"
    depth: 0
    maxFailures: 5
    #repoSubPath: dags
    sshSecret: "ssh-key-secret"
    sshSecretKey: "id_rsa"
    syncWait: 60 #CHANGE
    resources: #CHANGE
      requests:
        cpu: ".5"
        memory: 100Mi
ColeGulledge
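Helm silently ignores values keys that the chart never reads, so a misspelled key (for example `shhSecretKey` instead of `sshSecretKey`) produces no error. A minimal sketch of a typo check over the `gitSync` block, parsed into a plain dict, using the standard library's `difflib`. The `KNOWN_GITSYNC_KEYS` set is an assumption: it contains only the keys used in the config in this question (with `sshSecretKey` spelled correctly), not the chart's full values schema, and `find_suspect_keys` is a hypothetical helper.

```python
import difflib

# Assumption: only the gitSync keys that appear in the config in this question,
# NOT the chart's complete list of supported values.
KNOWN_GITSYNC_KEYS = sorted({
    "enabled", "repo", "branch", "revision", "dest", "depth", "maxFailures",
    "repoSubPath", "sshSecret", "sshSecretKey", "syncWait", "resources",
})


def find_suspect_keys(git_sync: dict) -> dict[str, list[str]]:
    """Map each unrecognised key to its closest known key (empty list if none is close)."""
    return {
        key: difflib.get_close_matches(key, KNOWN_GITSYNC_KEYS, n=1)
        for key in git_sync
        if key not in KNOWN_GITSYNC_KEYS
    }
```

With `{"repo": "...", "shhSecretKey": "id_rsa"}` as input, this reports `shhSecretKey` with `sshSecretKey` as its closest match; a correctly spelled block yields an empty dict.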

1 Answer


It turns out I needed to pass the DAG() object to the operator (dag=dag). Without it, the operator's DAG reference was null (None) instead of pointing to a DAG.

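Airflow identifies DAGs not by their file name (as I had thought) but by the DAG ID, which is the first argument passed to DAG(). The task name (task_id) is a separate identifier for the task inside that DAG.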

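from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def print_hello():
    return 'Hello world!'  # placeholder body; the original callable was not shown


dag = DAG('hello_world', description='Hello World DAG',
          schedule_interval='0 12 * * *',
          start_date=datetime(2022, 8, 24), catchup=False)

hello_operator = PythonOperator(task_id='hello_task', python_callable=print_hello, dag=dag)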

In this case, hello_world would be the name of the dag, not the file name.
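To see which name a file will show up under, a rough static check is to parse it and pull out the ID given to each DAG(...) call. This is a sketch with a hypothetical helper, `find_dag_ids`; it is only an approximation, because Airflow itself imports the file and collects the DAG objects it creates, so IDs that are built dynamically (f-strings, variables, loops) are not detected here.

```python
import ast


def find_dag_ids(source: str) -> list[str]:
    """Collect string literals passed as the first argument or dag_id= to DAG(...) calls."""
    ids = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Matches both `DAG(...)` and `airflow.DAG(...)`
        name = getattr(node.func, "id", None) or getattr(node.func, "attr", None)
        if name != "DAG":
            continue
        candidates = node.args[:1] + [kw.value for kw in node.keywords if kw.arg == "dag_id"]
        ids.extend(
            c.value for c in candidates
            if isinstance(c, ast.Constant) and isinstance(c.value, str)
        )
    return ids
```

Run on the DAG definition in this answer, it returns ['hello_world']: neither the file name nor the task ID hello_task.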

ColeGulledge