
I'm working on a DAG that gets a file from SFTP and then attaches that file to an email, using Airflow.

The process:

  1. Copy the file from SFTP to ./my_path/my_file.txt locally.
  2. Use that path to find the file and attach it to the email.

After further investigation, I need to make sure that both tasks run on the same worker, since I want to save the file locally rather than in GCS.

Hence, I used SubDagOperator for this, based on this answer.

The code was running fine until I added the EmailOperator. The send_email task fails because it cannot find the file at that path.

import os
from datetime import datetime, timedelta
from airflow import DAG
from airflow.providers.sftp.operators.sftp import SFTPOperator
from airflow.providers.google.cloud.transfers.sftp_to_gcs import SFTPToGCSOperator
from airflow.operators.dummy import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator
from airflow.operators.email import EmailOperator


default_args = {
    "owner": "my_email@example.com",
    "start_date": datetime(2021, 11, 1),
    "retries": 1,
    "retry_delay": timedelta(seconds=60),
    "depends_on_past": False,
    "email": ["my_email@example.com],
    "email_on_failure": True,
    "email_on_retry": False,
    "catchup": False
}

PARENT_DAG_NAME = "parent_dag_v3"
CHILD_DAG_NAME = "run_child_dag"

dag = DAG(
    PARENT_DAG_NAME,
    default_args=default_args,
    description="My awesome pipeline",
    schedule_interval="0 7 * * *"
)


def child_dags(parent_dag_name, child_dag_name, args):
    child_dag = DAG(
        "{0}.{1}".format(parent_dag_name, child_dag_name),
        default_args=args,
        description="My child DAG",
        schedule_interval=None,
    )

    copy_file = SFTPOperator(
        task_id="download_my_files",
        ssh_conn_id="=my_conn",
        local_filepath="./my_path/my_file.txt",
        remote_filepath="/IN/my_file.txt",
        operation="get",
        create_intermediate_dirs=True,
        dag=child_dag
    )


    send_email = EmailOperator(
        to=["user_email@example.com"],
        task_id="send_email",
        subject="Testing onlyyy",
        html_content="blahblah falafel wooooo",
        files=["./my_path/my_file.txt"],  # STUCK HERE
        dag=child_dag,
    )

    copy_file >> send_email

    return child_dag


run_child_dag = SubDagOperator(
    task_id=CHILD_DAG_NAME,
    subdag=child_dags(
        parent_dag_name=PARENT_DAG_NAME,
        child_dag_name=CHILD_DAG_NAME,  # Must be the same as the task_id
        args=dag.default_args,
    ),
    dag=dag,
)


start = DummyOperator(task_id="start", dag=dag)

start >> run_child_dag

The main question is: how do I find the right path to use in the EmailOperator?
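I suspect the relative path resolves against whatever working directory the worker process happens to use (and with the KubernetesExecutor each task may run in its own pod), so one thing I plan to try is logging that from a task. Just a debugging sketch:

import os
from airflow.operators.python import PythonOperator

def log_paths():
    # Show where "./" resolves on this worker and whether the downloaded
    # file is visible from here (with KubernetesExecutor it may not be,
    # since each task can run in its own pod).
    print("cwd:", os.getcwd())
    print("resolved:", os.path.abspath("./my_path/my_file.txt"))
    print("exists:", os.path.exists("./my_path/my_file.txt"))

debug_paths = PythonOperator(
    task_id="debug_paths",
    python_callable=log_paths,
    dag=dag,
)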

If there is a better way of doing this, using GCS for example, I'm open to suggestions.
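For what it's worth, one workaround I'm considering is collapsing both steps into a single task, so the download and the email happen in the same process and the local path can't go stale between tasks. A rough, untested sketch (I'm assuming SFTPHook.retrieve_file and airflow.utils.email.send_email behave the way I expect, and /tmp is just an arbitrary writable directory):

import os
from airflow.operators.python import PythonOperator
from airflow.providers.sftp.hooks.sftp import SFTPHook
from airflow.utils.email import send_email

LOCAL_PATH = "/tmp/my_path/my_file.txt"  # absolute path instead of "./"

def fetch_and_email():
    # Download and send inside one task, so the file never has to survive
    # across workers/pods.
    os.makedirs(os.path.dirname(LOCAL_PATH), exist_ok=True)
    # Depending on the sftp provider version, the connection kwarg may be
    # ftp_conn_id instead of ssh_conn_id.
    SFTPHook(ssh_conn_id="my_conn").retrieve_file(
        remote_full_path="/IN/my_file.txt",
        local_full_path=LOCAL_PATH,
    )
    send_email(
        to=["user_email@example.com"],
        subject="Testing onlyyy",
        html_content="blahblah falafel wooooo",
        files=[LOCAL_PATH],
    )

fetch_and_send = PythonOperator(
    task_id="fetch_and_send_email",
    python_callable=fetch_and_email,
    dag=dag,
)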

  • What `Executor` do you use ? If `Kubernetes-`, the downloaded file will only be `local` and accessible until task is complete. Read about `volumes` to share files between tasks. If `Local-`, confirm it got effectively saved with `bash cat()` – yan-hic Nov 04 '21 at 15:19
  • It is `Kubernetes`. In this case I should `get` the file and mount to volume right? Can this be done directly? Or do I need to copy the local to volume first? – user6308605 Nov 05 '21 at 01:28
  • Read [here](https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html) on how to create a yaml template. The volume will be mounted automatically when pod is created hence available to all tasks. If file is not too big, you could zip and pass on through xcom – yan-hic Nov 05 '21 at 16:41
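
Edit: based on the last comment, this is the XCom variant I understand is being suggested (the file is small, so passing its content through XCom should be fine). Again just a sketch, not tested, with the same hook/util assumptions as above:

import base64
from airflow.operators.python import PythonOperator
from airflow.providers.sftp.hooks.sftp import SFTPHook
from airflow.utils.email import send_email

def download_and_push():
    # Download inside this task and return the (small) file content
    # base64-encoded; the return value is pushed to XCom, so it travels
    # through the metadata DB instead of the pod's local disk.
    local_path = "/tmp/my_file.txt"
    SFTPHook(ssh_conn_id="my_conn").retrieve_file("/IN/my_file.txt", local_path)
    with open(local_path, "rb") as f:
        return base64.b64encode(f.read()).decode()

def email_attachment(ti):
    # Rebuild the file from XCom in this pod, then attach it.
    payload = ti.xcom_pull(task_ids="download_and_push")
    local_path = "/tmp/my_file.txt"
    with open(local_path, "wb") as f:
        f.write(base64.b64decode(payload))
    send_email(
        to=["user_email@example.com"],
        subject="Testing onlyyy",
        html_content="blahblah falafel wooooo",
        files=[local_path],
    )

download_task = PythonOperator(task_id="download_and_push", python_callable=download_and_push, dag=dag)
email_task = PythonOperator(task_id="email_attachment", python_callable=email_attachment, dag=dag)
download_task >> email_task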
