I'm working on an Airflow DAG that gets a file from SFTP and then attaches that file to an email.
The process:
- Copy the file from SFTP to `./my_path/my_file.txt` locally.
- Use that path to find the file and attach it to the email.
After further investigation, I found that both tasks need to run on the same worker, since I want to save the file locally rather than in GCS. Hence, I used `SubDagOperator` for this, based on this answer.
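One caveat on the premise: as far as I know from the Airflow 2.0 upgrade notes, `SubDagOperator` lost its `executor` argument and SubDAG tasks are scheduled by the main scheduler. With a distributed executor such as Celery, the two tasks may therefore still land on different workers. If keeping both steps on one machine is the only reason for the SubDAG, a different pattern is to do both steps inside a single task, so the file never has to be handed from one task to another. The sketch below assumes that in a real DAG this function would be the `python_callable` of a `PythonOperator`. In that setup, `download` would wrap `SFTPHook.retrieve_file` and `send` would wrap `airflow.utils.email.send_email`. Here both are injected as plain callables (hypothetical names) so the logic runs without Airflow.

```python
import os
import tempfile
from typing import Callable, List

# Hypothetical signatures, injected so the sketch runs without Airflow:
#   download(remote_path, local_path): could wrap SFTPHook.retrieve_file
#   send(to, subject, html_content, files): could wrap airflow.utils.email.send_email
Downloader = Callable[[str, str], None]
Sender = Callable[[List[str], str, str, List[str]], None]


def download_and_email(download: Downloader, send: Sender, remote_path: str, to: List[str]) -> str:
    """Fetch one remote file and email it as an attachment, all in one task/process."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Absolute path inside a private temp dir: no dependency on the working directory
        local_path = os.path.join(tmp_dir, os.path.basename(remote_path))
        download(remote_path, local_path)
        send(to, "Testing onlyyy", "blahblah falafel wooooo", [local_path])
    # The temp dir (and the file) is removed here; the path is returned only for logging
    return local_path
```

A single task like this would replace the whole SubDAG. The trade-off is that a retry after a failed email also re-downloads the file, which is usually cheap for one small file.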
The code ran fine until I added `EmailOperator`. The `send_email` task fails because it cannot find the file at that path.
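The traceback isn't shown, so this is an assumption, but a likely cause is that `./my_path/my_file.txt` is a relative path. Each task resolves a relative path against its own process's current working directory, and nothing guarantees that the two tasks start in the same directory. The stdlib-only sketch below uses a hypothetical helper, `resolve_from`, and two made-up working directories to show the same relative string pointing at two different files:

```python
import os

RELATIVE_PATH = "./my_path/my_file.txt"


def resolve_from(cwd: str, path: str) -> str:
    """Absolute location `path` refers to when the process's working directory is `cwd`."""
    # os.path.join discards `cwd` when `path` is already absolute
    return os.path.normpath(os.path.join(cwd, path))


# Two hypothetical working directories, e.g. of two separate task processes (POSIX paths)
download_side = resolve_from("/run_a", RELATIVE_PATH)
email_side = resolve_from("/run_b", RELATIVE_PATH)

print(download_side)  # /run_a/my_path/my_file.txt
print(email_side)     # /run_b/my_path/my_file.txt
print(resolve_from("/run_b", "/tmp/shared/my_file.txt"))  # /tmp/shared/my_file.txt
```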
```python
import os
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.sftp.operators.sftp import SFTPOperator
from airflow.providers.google.cloud.transfers.sftp_to_gcs import SFTPToGCSOperator
from airflow.operators.dummy import DummyOperator
from airflow.operators.subdag import SubDagOperator
from airflow.operators.email import EmailOperator

default_args = {
    "owner": "my_email@example.com",
    "start_date": datetime(2021, 11, 1),
    "retries": 1,
    "retry_delay": timedelta(seconds=60),
    "depends_on_past": False,
    "email": ["my_email@example.com"],
    "email_on_failure": True,
    "email_on_retry": False,
}

PARENT_DAG_NAME = "parent_dag_v3"
CHILD_DAG_NAME = "run_child_dag"

dag = DAG(
    PARENT_DAG_NAME,
    default_args=default_args,
    description="My awesome pipeline",
    schedule_interval="0 7 * * *",
    catchup=False,  # catchup is a DAG argument; it has no effect inside default_args
)


def child_dags(parent_dag_name, child_dag_name, args):
    # A SubDAG's dag_id must be "<parent_dag_id>.<SubDagOperator task_id>"
    child_dag = DAG(
        "{0}.{1}".format(parent_dag_name, child_dag_name),
        default_args=args,
        description="My child DAG",
        schedule_interval=None,
    )

    # Download the remote file to a local path on the worker
    copy_file = SFTPOperator(
        task_id="download_my_files",
        ssh_conn_id="my_conn",
        local_filepath="./my_path/my_file.txt",
        remote_filepath="/IN/my_file.txt",
        operation="get",
        create_intermediate_dirs=True,
        dag=child_dag,
    )

    # Attach the downloaded file to an email
    send_email = EmailOperator(
        to=["user_email@example.com"],
        task_id="send_email",
        subject="Testing onlyyy",
        html_content="blahblah falafel wooooo",
        files=["./my_path/my_file.txt"],  # STUCK HERE
        dag=child_dag,
    )

    copy_file >> send_email
    return child_dag


run_child_dag = SubDagOperator(
    task_id=CHILD_DAG_NAME,
    subdag=child_dags(
        parent_dag_name=PARENT_DAG_NAME,
        child_dag_name=CHILD_DAG_NAME,  # Must be the same as task ID
        args=dag.default_args,
    ),
    dag=dag,
)

start = DummyOperator(task_id="start", dag=dag)

start >> run_child_dag
```
Main question: how do I find the right path to use in `EmailOperator`?
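One way to take the working directory out of the picture is to compute one absolute path in the DAG file and pass that same constant to both `local_filepath` and `files`. This assumes both tasks really do run on the same machine. The stdlib sketch below simulates the two tasks starting in different working directories. `LOCAL_FILE` and `simulate_download` are hypothetical, and `simulate_download` stands in for the SFTP `get`.

```python
import os
import tempfile

# Hypothetical absolute location replacing "./my_path/my_file.txt".
# In the DAG: SFTPOperator(local_filepath=LOCAL_FILE, ...) and EmailOperator(files=[LOCAL_FILE], ...)
LOCAL_FILE = os.path.join(tempfile.gettempdir(), "my_path", "my_file.txt")


def simulate_download(path: str) -> None:
    """Stand-in for the SFTP 'get' step: create the file at `path`."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as fh:
        fh.write("report contents")


def demo() -> tuple:
    """Return (absolute path found?, relative path found?) as seen by the 'email' side."""
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
        try:
            os.chdir(dir_a)  # working directory of the "download" task
            simulate_download(LOCAL_FILE)
            simulate_download("./my_path/my_file.txt")
            os.chdir(dir_b)  # the "email" task starts somewhere else
            return os.path.isfile(LOCAL_FILE), os.path.isfile("./my_path/my_file.txt")
        finally:
            os.chdir(original_cwd)  # leave the temp dirs before they are deleted


print(demo())  # (True, False)
```

If the tasks can run on different workers, an absolute path alone is not enough, because the file exists only on the machine that downloaded it.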
If there is a better way of doing this, using GCS for example, I'm open to suggestions.
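With GCS, the bucket replaces the local disk as the hand-off point, so the two tasks no longer need to share a worker. One task uploads the file, for example with `SFTPToGCSOperator`, which is already imported above. The email task then downloads the object to a local path on its own worker right before attaching it. That download and the email still belong in the same task, since the downloaded copy exists only on that worker. The sketch assumes both tasks derive the same object name from the run date. `object_name`, `fetch_for_attachment`, and the `fetch` signature are hypothetical, with `fetch` standing in for something like `GCSHook().download(bucket_name=..., object_name=..., filename=...)`.

```python
import os
import tempfile
from datetime import date
from typing import Callable, Optional

# Hypothetical signature: fetch(bucket, object_name, local_path). In a real DAG this
# could wrap GCSHook().download(bucket_name=..., object_name=..., filename=...).
Fetcher = Callable[[str, str, str], None]


def object_name(run_date: date, filename: str = "my_file.txt", prefix: str = "my_path") -> str:
    """Object name shared by the upload and email tasks: one folder per run date."""
    return f"{prefix}/{run_date.isoformat()}/{filename}"


def fetch_for_attachment(fetch: Fetcher, bucket: str, name: str, work_dir: Optional[str] = None) -> str:
    """Download the object on whichever worker runs the email task; return its absolute path."""
    # mkdtemp() creates a fresh absolute directory; the caller should remove it after sending
    work_dir = work_dir or tempfile.mkdtemp()
    local_path = os.path.join(work_dir, os.path.basename(name))
    fetch(bucket, name, local_path)
    return local_path
```

For example, `object_name(date(2021, 11, 1))` returns `"my_path/2021-11-01/my_file.txt"`, so each daily run reads and writes its own object instead of overwriting a shared one.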