The total data size is in the terabytes.
I have multiple Databricks notebooks, one per dimension table, that incrementally load data into Google BigQuery.
These notebooks need to run every two hours to perform the incremental load.
Which of the following is the better approach:
1. Create a master Databricks notebook and use dbutils to chain/parallelize the execution of those notebooks (see the first sketch below).
2. Use Google Cloud Composer (Apache Airflow's Databricks operator) to create a master DAG that orchestrates these notebooks remotely (see the second sketch below).
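
For context, here is roughly what I have in mind for option 1: a master notebook that calls the child notebooks with dbutils.notebook.run, chaining some sequentially and fanning others out in parallel with a thread pool. This is only a sketch; the notebook paths, the "load_type" argument, and the timeout value are placeholders for my actual setup.

```python
# Hypothetical master notebook: paths and arguments below are placeholders.
from concurrent.futures import ThreadPoolExecutor

SEQUENTIAL_NOTEBOOKS = ["/Repos/etl/load_dim_customer", "/Repos/etl/load_dim_product"]
PARALLEL_NOTEBOOKS = ["/Repos/etl/load_dim_store", "/Repos/etl/load_dim_date"]

def run_notebook(path: str, timeout_seconds: int = 7200) -> str:
    # dbutils.notebook.run blocks until the child notebook finishes and
    # returns whatever the child passes to dbutils.notebook.exit().
    return dbutils.notebook.run(path, timeout_seconds, {"load_type": "incremental"})

# Sequential chain: each notebook starts only after the previous one succeeds.
for path in SEQUENTIAL_NOTEBOOKS:
    print(f"{path} -> {run_notebook(path)}")

# Parallel fan-out: each child run occupies a thread on the driver.
with ThreadPoolExecutor(max_workers=4) as pool:
    for path, result in zip(PARALLEL_NOTEBOOKS, pool.map(run_notebook, PARALLEL_NOTEBOOKS)):
        print(f"{path} -> {result}")
```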
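
And this is roughly what option 2 would look like: a Composer DAG using DatabricksSubmitRunOperator, scheduled every two hours, with sequential and parallel dependencies expressed in the DAG itself. Again only a sketch; the connection id, cluster config, notebook paths, and task names are assumptions, not my actual configuration.

```python
# Hypothetical Composer DAG: connection id, cluster spec, and paths are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

NEW_CLUSTER = {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "n2-standard-8",
    "num_workers": 4,
}

def notebook_task(task_id: str, notebook_path: str) -> DatabricksSubmitRunOperator:
    # Submits a one-time run of the given notebook on a new job cluster.
    return DatabricksSubmitRunOperator(
        task_id=task_id,
        databricks_conn_id="databricks_default",
        new_cluster=NEW_CLUSTER,
        notebook_task={"notebook_path": notebook_path},
    )

with DAG(
    dag_id="bq_dimension_incremental_load",
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 */2 * * *",  # every two hours
    catchup=False,
) as dag:
    customer = notebook_task("load_dim_customer", "/Repos/etl/load_dim_customer")
    product = notebook_task("load_dim_product", "/Repos/etl/load_dim_product")
    store = notebook_task("load_dim_store", "/Repos/etl/load_dim_store")
    date_dim = notebook_task("load_dim_date", "/Repos/etl/load_dim_date")

    # Sequential dependency: product waits for customer;
    # store and date then run in parallel.
    customer >> product >> [store, date_dim]
```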
I want to know which approach is better, given that I have use cases for both parallel and sequential execution of these notebooks.
I'd be extremely grateful for any suggestion or opinion on this topic. Thank you.