
I'm new to Airflow. I was able to follow a video and create the docker-compose.yml file, Dockerfile, and a DAG file. I can view my DAG and run it. In my script, I'm trying to open a text file (.txt), but I get the following error: FileNotFoundError: [Errno 2] No such file or directory.

I have the text file in the correct location, and the script runs fine in my local Python environment. I don't know why it raises this error when I run it in Airflow.

My docker-compose.yml, Dockerfile, and DAG files are shown below. I'd appreciate any sort of help! Thank you!

docker-compose.yml

version: '3.7'

services:
  postgres:
    image: postgres:9.6
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    logging:
      options:
        max-size: 10m
        max-file: "3"

  webserver:
    build: ./dockerfiles
    restart: always
    depends_on:
      - postgres
    environment:
      - LOAD_EX=n
      - EXECUTOR=Local
    logging:
      options:
        max-size: 10m
        max-file: "3"
    volumes:
      - ./dags:/usr/local/airflow/dags
      # - ./plugins:/usr/local/airflow/plugins
    ports:
      - "8080:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid]"]
      interval: 30s
      timeout: 30s
      retries: 3

Dockerfile

FROM puckel/docker-airflow:1.10.9
RUN pip install requests bs4 pandas xlrd openpyxl

DAG file

try:
    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator
    import requests
    from bs4 import BeautifulSoup
    import pandas as pd
    import smtplib
    from email.message import EmailMessage
    import os
    import sys
    import xlrd
    from openpyxl import load_workbook
    print("All DAG modules are ok.........")

except Exception as e:
    print("Error {}".format(e))

def craigslist_search_function():
    ***PYTHON CODE***

with DAG(
        dag_id="craigslist_dag",
        schedule_interval="*/30 * * * *",
        default_args={
            "owner": "airflow",
            "retries": 1,
            "retry_delay": timedelta(minutes=5),
            "start_date": datetime(2022, 1, 1),
        },
        catchup=False) as dag:

    craigslist_search_task = PythonOperator(
        task_id="craigslist_search_function",
        python_callable=craigslist_search_function)

I was expecting it to run the script with no issues. The script works perfectly fine in my local Python environment. I don't know why it does not work in Airflow.


1 Answer


Containers can't access files that aren't mounted into them. Airflow can see your DAG because you mounted the dags directory under the volumes key. Try adding the directory that contains your text file to volumes in the webserver service:

volumes:
  - ./dags:/usr/local/airflow/dags
  - local_directory_path:container_directory_path

And when you read the file from your DAG task, make sure you read it from container_directory_path, not from your local path.
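For example, here is a minimal sketch based on the host folder mentioned in the comment below (/Users/mohammedasaduzzaman/Documents/John, which holds data.txt). The container path /usr/local/airflow/data is an arbitrary choice for illustration, not something Airflow requires:

volumes:
  - ./dags:/usr/local/airflow/dags
  - /Users/mohammedasaduzzaman/Documents/John:/usr/local/airflow/data

After recreating the container (docker-compose up -d), the task would open the file through the container-side path:

# /usr/local/airflow/data is the assumed mount target from the snippet above
with open("/usr/local/airflow/data/data.txt") as fh:
    data = fh.read()

Note that the right-hand side of a volume mapping must be a path inside the container's filesystem; /var/lib/docker/volumes is Docker's internal storage area on the host and is not a meaningful mount target here.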

general_418
  • Hey. Thanks for reaching out and trying to help. So for the local directory path and container path I used: - '/Users/mohammedasaduzzaman/Documents/John/:/var/lib/docker/volumes' with no luck. I still get the same error as above. The "John" folder has the "data.txt" file. I'm using a MacOS so is that the correct volumes folder? If not, do you know where exactly I can find the volumes folder? I've tried adding the volume as './volumes:/Users/mohammedasaduzzaman/Documents/John/' with the same result. Please let me know if possible. Thank you again so much for your help! – Zaman Apr 09 '22 at 08:51