I am developing a Databricks Pipeline, writing my DLTs in Python. I want to understand how to control the order in which the Pipeline creates the DLTs.
Currently, the Pipeline attempts to create every single DLT in the order they are written, which fails with an error if some source data is not yet available. Let me clarify:
import dlt

@dlt.table
def table1():
    return spark.sql("SELECT a, b, c FROM table_A")

@dlt.table
def table2():
    return spark.sql("SELECT x, y, z FROM table_B")

@dlt.table
def table3():
    res1 = dlt.read("table1")
    res2 = dlt.read("table2")
    # Branch on which columns actually arrived from the sources
    if "a" in res1.schema.names and "x" in res2.schema.names:
        return ...
    elif "a" in res1.schema.names:
        return ...
    elif "x" in res2.schema.names:
        return ...
    else:
        return ...
I want the Pipeline to just create table3, first checking the if/else conditions on whether the data from the other sources is available, and only then creating DLTs table1 and/or table2.
Is this possible, or am I misunderstanding something about how Pipelines are supposed to work? You can assume the data will be present at some point, but it may not have been loaded into the database yet.
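To make the intent concrete, here is a minimal sketch of the behaviour I am after, not working code. I am assuming that ordinary Python control flow at definition time can decide which tables get declared, and that spark.catalog.tableExists (a standard PySpark Catalog method) can be used to probe whether the sources have been loaded yet; source_available is a helper name I made up for the example.

import dlt

def source_available(name: str) -> bool:
    # Assumption: a source counts as "arrived" once its table exists in the catalog
    return spark.catalog.tableExists(name)

have_a = source_available("table_A")
have_b = source_available("table_B")

# Declare table1/table2 only when their source data is already present
if have_a:
    @dlt.table
    def table1():
        return spark.sql("SELECT a, b, c FROM table_A")

if have_b:
    @dlt.table
    def table2():
        return spark.sql("SELECT x, y, z FROM table_B")

@dlt.table
def table3():
    # Branch on the same flags so we never dlt.read() a table that was not declared
    if have_a and have_b:
        return ...
    elif have_a:
        return ...
    elif have_b:
        return ...
    else:
        return ...

Whether gating the declarations like this is how Pipelines are meant to be used, or whether the dependency graph is supposed to handle this for me, is exactly what I am unsure about.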