I am new to Airflow, so apologies if this has been asked somewhere.
I have a query I run in Hive against a table that is partitioned by year-month, e.g. 202001.
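A year-month key like 202001 is a date formatted as YYYYMM. The sketch below converts in both directions. It assumes cpd_ym is numeric, as the unquoted comparison in the SQL suggests. `to_ym` and `from_ym` are hypothetical helper names, not part of Airflow or Hive.

```python
from datetime import date, datetime


def to_ym(d: date) -> int:
    """Return the YYYYMM partition key for a date, e.g. 2020-01-15 -> 202001."""
    return int(d.strftime("%Y%m"))


def from_ym(ym: int) -> date:
    """Parse a YYYYMM key back into the first day of that month."""
    return datetime.strptime(str(ym), "%Y%m").date()


print(to_ym(date(2020, 1, 15)))  # 202001
print(from_ym(202110))           # 2021-10-01
```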
How can I pass a variable into a query in Airflow and run the query for different values of that variable? For example, take this DAG:
```python
from airflow import DAG
from airflow.operators.mysql_operator import MySqlOperator

default_arg = {'owner': 'airflow', 'start_date': '2020-02-28'}

dag = DAG('simple-mysql-dag',
          default_args=default_arg,
          schedule_interval='00 11 2 * *')

mysql_task = MySqlOperator(dag=dag,
                           mysql_conn_id='mysql_default',
                           task_id='mysql_task',
                           sql='<path>/sample_sql.sql',
                           params={'test_user_id': -99})
```
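In Airflow, the `params` dict given to an operator is available when its `sql` file is rendered. A value passed as `params={'ym': 202001}` would therefore be referenced in the file as `{{ params.ym }}`. One common pattern is to create one task per partition value in a loop.

The sketch below builds only the per-task keyword arguments, in plain Python, so it runs without Airflow installed. `task_specs` and the `load_sample_df_` task-id prefix are hypothetical names. Unpacking each dict into `MySqlOperator` is an assumption about how you would wire it into the DAG above.

```python
from typing import Any, Dict, List


def task_specs(yms: List[int]) -> List[Dict[str, Any]]:
    """Return one task definition (task_id + params) per YYYYMM value.

    Hypothetical helper: each dict is meant to be unpacked into an operator,
    e.g. MySqlOperator(dag=dag, mysql_conn_id=..., sql=..., **spec).
    """
    return [
        {"task_id": f"load_sample_df_{ym}", "params": {"ym": ym}}
        for ym in yms
    ]


specs = task_specs([202001, 202002])
for spec in specs:
    print(spec["task_id"], spec["params"])
```

Task ids must be unique within a DAG, which is why the partition value is part of each id.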
My sample_sql.hql looks like this:
```sql
ALTER TABLE sample_df DROP IF EXISTS
PARTITION (
    cpd_ym = ${ym}
) PURGE;

INSERT INTO sample_df
PARTITION (
    cpd_ym = ${ym}
)
SELECT
    *
FROM sourcedf
;

ANALYZE TABLE sample_df
PARTITION (
    cpd_ym = ${ym}
)
COMPUTE STATISTICS;

ANALYZE TABLE sample_df
PARTITION (
    cpd_ym = ${ym}
)
COMPUTE STATISTICS FOR COLUMNS;
```
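Airflow renders the `sql` file with Jinja, and Jinja does not treat `${ym}` as a placeholder. That text is therefore passed through unchanged unless something else substitutes it.

To show what each per-month script should look like after substitution, the sketch below uses Python's standard-library `string.Template`, which happens to use the same `${name}` syntax. It is a stand-in for illustration, not Airflow's templating mechanism. `render_for` is a hypothetical helper, and the embedded script is only the first statement of the file above.

```python
from string import Template

# Stand-in for the first statement of sample_sql.hql; ${ym} marks the partition value.
HQL = """ALTER TABLE sample_df DROP IF EXISTS
PARTITION (
    cpd_ym = ${ym}
) PURGE;"""


def render_for(ym: int) -> str:
    """Return the script with every ${ym} replaced by one YYYYMM value."""
    return Template(HQL).substitute(ym=ym)


print(render_for(202001))
```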
I want to use Airflow to run the above for every value of ym between 202001 and 202110. How can I do this?
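The range in question can be enumerated month by month, rolling the year over after December. This is a minimal sketch that assumes both endpoints are inclusive. `month_range` is a hypothetical helper name. The values it yields could drive a per-partition loop or be passed one at a time as the ym parameter.

```python
from typing import Iterator


def month_range(start_ym: int, end_ym: int) -> Iterator[int]:
    """Yield every YYYYMM value from start_ym to end_ym, inclusive."""
    year, month = divmod(start_ym, 100)
    end_year, end_month = divmod(end_ym, 100)
    while (year, month) <= (end_year, end_month):
        yield year * 100 + month
        month += 1
        if month > 12:  # roll over from December to January of the next year
            year, month = year + 1, 1


yms = list(month_range(202001, 202110))
print(len(yms), yms[0], yms[-1])  # 22 202001 202110
```

With these endpoints the range contains 22 values: 12 for 2020 and 10 for January to October 2021.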