
I am new to Airflow and am trying to get everything set up.

I am currently working through a tutorial that gives a basic overview of DAGs and Airflow functionality. I have Airflow set up on a Linux server and my tutorial DAG shows up in the user interface. When I test it in the interface, though, I keep getting an error stating that my DAG has failed. When I look at the log to see why it failed, it doesn't populate with an error message. Here is the code from the tutorial that I used:

from datetime import timedelta
import airflow 
from airflow import DAG 
from airflow.operators.bash import BashOperator
import pendulum

default_args = {
    'owner' : 'curtis',
    'start_date' : pendulum.today('UTC').add(days= -2),
    'depends_on_past' : False, # if True, a run will not execute unless the previous run of this task succeeded; if False, it runs regardless
    'email' :  [''], 
    'email_on_failure' : True, 
    'email_on_retry' : True, 
    'retries' : 1, 
    'retry_delay' : timedelta(minutes = 5)  
}

dag = DAG(
    'bigquery_test', 
    default_args = default_args, 
    description = 'test for bigquery connection', 
    #schedule once per day
    schedule = '@daily'
    # @once schedule once and only once
    # @hourly run once an hour at the beginning of the hour
    # @daily run once a day at midnight
    # @weekly run once a week at midnight on Sunday morning
    # @monthly run once a month at midnight on the first day of the month
    # @yearly run once a year at midnight on January 1st
)

#t1, t2 and t3 are examples of tasks created by instantiating operators 
t1 = BashOperator(
    task_id = 'print_date', 
    bash_command = 'date', 
    dag = dag 
)

t1.doc_md = """\
#### Task Documentation
You can document your task using the attributes `doc_md` (markdown),
`doc` (plain text), `doc_rst`, `doc_json`, `doc_yaml` which gets
rendered in the UI's Task Instance Details page.
![img](http://montcs.bloomu.edu/~bobmon/Semesters/2012-01/491/import%20soul.png)
"""

dag.doc_md = __doc__

t2 = BashOperator(
    task_id = 'sleep', 
    depends_on_past = False, 
    bash_command = 'sleep 5', 
    dag = dag
)

templated_command = """
{% for i in range(5) %}
    echo "{{ ds }} "
    echo "{{ macros.ds.add(ds,7)}}"
    echo "{{ params.my_param }}"
{% endfor %}
"""

t3 = BashOperator(
    task_id = 'templated', 
    depends_on_past = False, 
    bash_command = templated_command, 
    params = {'my_param': 'parameter i passed in'},
    dag = dag
)

#setting up the dependencies
#this means that t2 will depend on t1
#running successfully in order to run
#t1.set_downstream(t2)

#similar to above, except here t3 will depend on t1
#t3.set_upstream(t1)

#the bit shift operator can also be used to chain operations: 
#t1>>t2

#and the upstream dependency with the 
#bit shift operator
#t2<<t1

#a list of tasks can also be set as
#dependencies. These operations
#all have the same effect: 

#t1.set_downstream([t2,t3])
t1 >> [t2,t3]
#[t2,t3] << t1

1 Answer


I just ran your code and it worked; only the t3 task failed, with:

jinja2.exceptions.UndefinedError: 'module object' has no attribute 'ds'.
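
That error comes from templated_command: the template calls macros.ds.add, but the built-in macro is macros.ds_add(ds, days), so Jinja can't resolve the attribute. A minimal sketch of the corrected template (only that one echo line changes):

templated_command = """
{% for i in range(5) %}
    echo "{{ ds }}"
    echo "{{ macros.ds_add(ds, 7) }}"
    echo "{{ params.my_param }}"
{% endfor %}
"""

With that change the templated task should render, since ds and macros.ds_add are part of the standard template context.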

But the DAG itself looks good and is able to run, so I'd say the failure you're seeing is something to do with your Python runtime or how you set up the instance. I'd recommend following the official Getting Started guide to configure the project, or using the Astro CLI, which is even easier. You can also set it up using Docker Compose.

Also, I suppose the tutorial begins by teaching the most basic way of setting up a DAG, and that's okay. Just FYI - there are clearer and more modern ways of doing it, for example using the TaskFlow API (:
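
Here is a minimal sketch of roughly the same pipeline written with the TaskFlow decorators, assuming Airflow 2.x (the dag_id and the plain-Python task are just illustrative, not from your tutorial):

from datetime import timedelta

import pendulum
from airflow.decorators import dag, task
from airflow.operators.bash import BashOperator

@dag(
    dag_id='bigquery_test_taskflow',  # illustrative id so it doesn't clash with your DAG
    schedule='@daily',
    start_date=pendulum.today('UTC').add(days=-2),
    catchup=False,
    default_args={'retries': 1, 'retry_delay': timedelta(minutes=5)},
)
def bigquery_test_taskflow():
    # classic operators still work inside a decorated DAG
    print_date = BashOperator(task_id='print_date', bash_command='date')

    @task
    def show_logical_date(ds=None):
        # ds is injected from the task context, no Jinja templating needed here
        print(ds)

    # dependencies read the same way as with the bitshift operators
    print_date >> show_logical_date()

bigquery_test_taskflow()

The decorated function call at the bottom registers the DAG, and passing data between @task functions becomes plain function arguments and return values instead of explicit XCom calls.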
