I'm using Airflow, Databricks, and PySpark. I would like to know whether it is possible to pass additional parameters when executing a Databricks notebook through Airflow.
I have the following Python code in a notebook named MyETL:
def main(**kwargs):
    # table and columns should ultimately come from Airflow as parameters
    spark.sql("CREATE TABLE {0} {1}".format(kwargs["table"], kwargs["columns"]))
    print("Running my ETL!")

if __name__ == "__main__":
    main(table="A", columns=["a", "b"])
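For context, my understanding (which may be wrong) is that any parameters that reach a Databricks notebook arrive as string-valued widgets, so on the notebook side I would expect something like the sketch below, where the widget names, the JSON encoding, and the dispatch to main are all my own assumptions:

import json

# Assumption: the parameters show up in the notebook as widgets, whose
# values are always strings, so structured params are JSON-decoded.
method_name = dbutils.widgets.get("method_name")
params = json.loads(dbutils.widgets.get("params"))  # e.g. '{"table": "A", "columns": ["a", "b"]}'

if method_name == "main":
    main(**params)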
I want to define additional task params so that the Databricks notebook runs with more parameters: specifically, I want to pass the name of the method to call and the parameters for that method. For example, when I register the task in an Airflow DAG:
notebook_task_params = {
    'new_cluster': new_cluster,
    'notebook_task': {
        'notebook_path': '/Users/airflow@example.com/MyETL',
        'method_name': 'main',
        'params': [{'table': 'A'}, {'columns': ['a', 'b']}]
    },
}
I don't know whether this is possible, because I haven't found any similar examples.
# Example of using the JSON parameter to initialize the operator.
notebook_task = DatabricksSubmitRunOperator(
    task_id='notebook_task',
    dag=dag,
    json=notebook_task_params)
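The closest thing I've found in the Databricks Runs Submit API is the base_parameters field of notebook_task, which as far as I can tell only accepts a flat map of string key/value pairs. So maybe the JSON would have to look like the sketch below, with my method name and params squeezed into strings, but I'm not sure this is the intended way:

# Hypothetical variant: everything goes through base_parameters as strings.
notebook_task_params = {
    'new_cluster': new_cluster,
    'notebook_task': {
        'notebook_path': '/Users/airflow@example.com/MyETL',
        'base_parameters': {
            'method_name': 'main',
            'params': '{"table": "A", "columns": ["a", "b"]}'
        },
    },
}

notebook_task = DatabricksSubmitRunOperator(
    task_id='notebook_task',
    dag=dag,
    json=notebook_task_params)

If that is right, the notebook would then read each value back with dbutils.widgets.get as in the sketch above, but I haven't been able to verify it.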
In other words, I want to execute a notebook with parameters using Airflow. How can I do that?