
I'm trying to find a general way to pass parameters to a DAG when triggering it from the UI. I know I should pass them as key/value pairs, but I don't know how to parse those parameters in the DAG script itself.

For example, if I pass the conf pair {"dir_of_project":"root/home/project"}

what should I do inside the DAG script so that it has a variable called dir_of_project that equals the path above, which I can then use further in my code? I tried the following inside the script, but obviously I'm doing it wrong:

dir_of_project = "{{ dag_run.conf['file_name'] }}"

For example, let's say I did this:

def func(**kwargs):
    file_name = kwargs['dag_run'].conf.get('dir_of_project')
    return file_name
op = PythonOperator(
    task_id='task',
    python_callable=func,
    dag=dag,
    provide_context=True
)


sampling_out_parent_dir="/root/home/myfolder/sampled-vids"
file_path=os.path.join(sampling_out_parent_dir,file_name)

This results in an undefined variable error, because file_name is defined only in the scope of the function and the operator and I can't use it outside them like this. How can I make it work?

omar

2 Answers


I believe your issue is because you are using Jinja somewhere that isn't being templated.

For an input of {"dir_of_project":"root/home/project"}, whether you trigger the DAG manually in the UI or run it from the CLI:

airflow trigger_dag your_dag_id --conf '{"dir_of_project":"root/home/project"}'

you can extract the value in a templated field with:

{{ dag_run.conf['dir_of_project'] }}

or, if you use a PythonOperator, with:

kwargs['dag_run'].conf['dir_of_project']

Example:

def func(**kwargs):
    file_name = kwargs['dag_run'].conf.get('dir_of_project')
    # Rest of the code

op = PythonOperator(
    task_id='task',
    python_callable=func,
    dag=dag,
    provide_context=True # Remove this if you are using Airflow >=2.0
)
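As a minimal sketch of how the file_path example from the question could be handled, the path can be built inside the callable, where dag_run.conf is available, instead of at module level. The function name build_file_path and the "default-project" fallback are hypothetical, and the SimpleNamespace object only stands in for the dag_run that Airflow passes in at run time:

```python
import os
from types import SimpleNamespace

# Parent directory taken from the question.
SAMPLING_OUT_PARENT_DIR = "/root/home/myfolder/sampled-vids"


def build_file_path(**kwargs):
    """Build the output path from the conf passed when the DAG was triggered."""
    # conf may be empty or None when the run is triggered without parameters.
    conf = kwargs["dag_run"].conf or {}
    # "default-project" is a hypothetical fallback, not something from the question.
    sub_dir = conf.get("dir_of_project", "default-project")
    return os.path.join(SAMPLING_OUT_PARENT_DIR, sub_dir)


# Stand-in for the dag_run object, used here only to exercise the function.
fake_run = SimpleNamespace(conf={"dir_of_project": "project-a"})
result = build_file_path(dag_run=fake_run)

empty_run = SimpleNamespace(conf=None)
fallback = build_file_path(dag_run=empty_run)
```

Passing build_file_path as python_callable to the PythonOperator above keeps everything that depends on the conf inside the task, which is the only place the value exists.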
Elad Kalif
  • OK, this worked indeed and the parameter is passed to file_name, but my goal is to be able to use that variable further in my code (outside the function and the operator itself). Is that possible by any means? – omar Feb 04 '21 at 17:26
  • Please check the post above again, I edited it to clarify what I mean – omar Feb 04 '21 at 17:54
  • @omar please explain why you want to do this – Elad Kalif Feb 04 '21 at 17:57
  • I have a directory of directories, each of which contains some files for further processing. Instead of having to change the sub-directory in the script each time, upload it to the server again and run the Docker container to apply the change, I want to pass that sub-dir as a param when triggering the DAG through the Airflow UI. It's not just that sub-dir either: there are multiple variables in my script that I'd like to handle that way – omar Feb 04 '21 at 18:25
  • I suggest you open a new question about it. This is a "how to design my ETL" question. My recommendation is to split it: a first DAG receives the user input, processes it and stores it to a shared disk; when it finishes, it triggers a second DAG which reads the file from disk and starts processing it. – Elad Kalif Feb 04 '21 at 18:33
  • Oh okay, I get your point. I'll try what you suggested and update if it works. Thank you for your help, much appreciated. – omar Feb 04 '21 at 18:40
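
The hand-off suggested in the comments can be sketched roughly as follows, assuming the shared disk is an ordinary directory both DAGs can reach. The function names and the run_conf.json file name are hypothetical, and a temporary directory stands in for the shared disk:

```python
import json
import os
import tempfile


def store_run_conf(conf, shared_dir):
    """First DAG: persist the user input so a later DAG can read it."""
    path = os.path.join(shared_dir, "run_conf.json")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(conf, fh)
    return path


def load_run_conf(path):
    """Second DAG: read the stored input back before processing."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# A temporary directory stands in for the shared disk in this sketch.
with tempfile.TemporaryDirectory() as shared_dir:
    saved = store_run_conf({"dir_of_project": "root/home/project"}, shared_dir)
    loaded = load_run_conf(saved)
```

Each function would be called from a task in its own DAG; how the second DAG gets triggered is left out here.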

I wanted to update this, as I solved the issue a while back but forgot to post it. After much research I found that it's actually impossible to extract a variable's value from an Airflow operator and use it outside of it. However, I found a workaround using template fields.

Airflow has something called XCom variables, which can be pushed and pulled between operators (basically a communication system between them). That pretty much gets the job done if you want to pass a variable from one operator to another. Furthermore, if you want to build new ordinary variables that depend on an XCom one, you can do that by using the XCom one as a template field, and even build further ones on top of it, as long as you eventually use those ordinary variables inside an operator, which will then be able to render the template.

For example, if I push an XCom variable in one of my tasks and that variable has the integer value 100, I can get it with newvar="{{ task_instance.xcom_pull(task_ids='%s') }}"%task_id, where task_id is the name of the task that pushed the variable. I can then use newvar outside the operator: it carries the template around as a string until it is passed to an operator, which renders the template field and replaces it with the original value 100.

For more info, refer to the Airflow XCom documentation.
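
To illustrate what "carrying the template around as a string" means, here is a small sketch that runs without Airflow. The task name push_task and the derived command string are hypothetical. At parse time newvar is only placeholder text, so it can be concatenated into other strings but not used as the value 100:

```python
# Hypothetical id of the task that pushed the XCom value.
task_id = "push_task"

# At DAG parse time this is plain text, not the pushed value.
newvar = "{{ task_instance.xcom_pull(task_ids='%s') }}" % task_id

# Further strings can be built on top of it; they stay unrendered as well.
derived = "echo value is " + newvar

# Treating the placeholder as the real value fails outside an operator.
try:
    int(newvar)
    usable_at_parse_time = True
except ValueError:
    usable_at_parse_time = False
```

Only when a string such as derived is passed to a templated operator argument (for example bash_command on a BashOperator) does Airflow render the placeholder at run time.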

omar