
I'm trying to use Airflow on Docker. My my_python.py file in the dags directory looks like:

from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta
import argparse
import psycopg2
import csv
import os
import sys
from google.cloud import bigquery
from google.oauth2 import service_account

def postgresql_database_connection(table_name, data_file):
...

def write_to_bigquery(dataset_name, table_name, data_file):
...

dag = DAG('my_python',
          default_args=default_args,
          schedule_interval='00-59/30 * * * *',
          catchup=False,
          max_active_runs=1)

task1 = PythonOperator(
    task_id='table_database_connection',
    python_callable=postgresql_database_connection,
    op_args=[TABLE_NAME,DATA_FILE],
    dag=dag)

task2 = PythonOperator(
    task_id='table_write_to_bigquery',
    python_callable=write_to_bigquery,
    op_args=[DATASET_NAME,TABLE_NAME,DATA_FILE,args.env],
    dag=dag)

task1 >> task2

My airflow-test.Dockerfile looks like:

FROM python:3.7

ARG AIRFLOW_USER_HOME=/usr/local/airflow

ENV AIRFLOW_HOME=${AIRFLOW_USER_HOME}

RUN ...
    && pip install apache-airflow==2.0.0 \
    && pip install psycopg2-binary \
    && pip install google-cloud \
    && pip install google-oauth

EXPOSE 8080 8793

USER airflow
WORKDIR ${AIRFLOW_USER_HOME}

I got an error on the Airflow webserver: (screenshot of a DAG import error in the web UI)

Folder structure:

dags/
    my_python.py
airflow-test.Dockerfile
docker-compose.yaml

Where is my mistake? Is it a Python or Airflow version problem, or a problem with my Dockerfile?

EEks
  • I see that, according to the message in your web UI, the DAG causing the issue is `product_content.py`. Could you share it within your question? Also, what are you using `from google.cloud import bigquery` for? – Alexandre Moraes Feb 12 '21 at 12:40
  • Yes, the my_python.py file is actually product_content.py. I'm using the bigquery package to load a CSV file into BigQuery. – EEks Feb 12 '21 at 15:44
  • Try `pip install apache-airflow-providers-google[amazon]`; refer to [airflow<>google](https://airflow.apache.org/docs/apache-airflow-providers-google/stable/index.html) – neilharia7 Feb 13 '21 at 07:08
  • @EEks, you need to install the BigQuery Client Library to your environment with `pip install --upgrade 'google-cloud-bigquery[bqstorage,pandas]'`, as per [documentation](https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas#install_the_client_libraries). Did it work for you? – Alexandre Moraes Feb 22 '21 at 07:45
  • `pip install apache-airflow-providers-google` works for me. – EEks Feb 23 '21 at 09:05

1 Answer


The error is related to missing packages in your environment. The `google-cloud` package on PyPI is a deprecated meta-package and no longer installs the individual client libraries such as `google-cloud-bigquery`, so `from google.cloud import bigquery` fails when Airflow parses the DAG.

As @neilharia7 mentioned, you can use `pip install apache-airflow-providers-google[amazon]` (or plain `pip install apache-airflow-providers-google`), which pulls in the BigQuery client library and `google-auth` as dependencies.
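As a sketch, the install lines in the Dockerfile could look like this (assuming Airflow 2.0.0 as in the question; the `[amazon]` extra is only needed if you also use the Amazon transfer operators):

```dockerfile
# Install Airflow plus the Google provider. The provider brings in
# google-cloud-bigquery and google-auth, which supply the
# `google.cloud.bigquery` and `google.oauth2` imports used in the DAG,
# replacing the deprecated `google-cloud` and `google-oauth` packages.
RUN pip install apache-airflow==2.0.0 \
    && pip install psycopg2-binary \
    && pip install apache-airflow-providers-google
```

After rebuilding the image, the DAG import error should disappear once the scheduler re-parses the dags folder.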

Alexandre Moraes