Lately I have been playing around with Airflow and PySpark. I saw that Airflow exposes a number of variables. My aim is to read one of those variables and pass it into my PySpark script. So far I have managed to echo the value of the variable (that worked), but I couldn't find a way to get it into PySpark (I want to assign the value of that variable to a variable inside my PySpark script). I am attaching my code below (job_id is the variable I am talking about).
test_bash = """
export un_id={{ti.job_id}}
echo $un_id
"""
bash_task = BashOperator(
task_id='test',
bash_command=test_bash,
xcom_push=True,
provide_context=True,
dag=dag)
def pull_function(**kwargs):
ti = kwargs['ti']
rt = ti.xcom_pull(task_ids='test')
print(rt)
pull_task = PythonOperator(
task_id='pull_task',
python_callable=pull_function,
provide_context=True,
dag=dag
)
#############
bash_task >> pull_task
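
In case it helps, this is roughly how I imagined carrying the value forward: pulling it in pull_function and handing it to spark-submit as a command-line argument. This is just a sketch of the idea, and the script path is a placeholder, not my real setup:

import subprocess

def pull_function(**kwargs):
    ti = kwargs['ti']
    rt = ti.xcom_pull(task_ids='test')
    # Hand the pulled value to the PySpark job as a CLI argument.
    # '/path/to/my_pyspark_script.py' is a placeholder path.
    subprocess.run(
        ['spark-submit', '/path/to/my_pyspark_script.py', str(rt)],
        check=True)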
Any idea how I should carry on, or whether I am doing something wrong?
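
For reference, the receiving end I have in mind inside the PySpark script would look something like this (the app name and argument handling are hypothetical):

import sys
from pyspark.sql import SparkSession

# The value forwarded by spark-submit on the command line.
un_id = sys.argv[1]

spark = SparkSession.builder.appName('my_job').getOrCreate()
print('Received value: {}'.format(un_id))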