I'm new to DLT and struggling with the Python syntax for returning a streaming DataFrame, via dlt.read_stream, that is the union (unionByName) of two other live tables.
My pipeline is as follows.
WORKS: Table1:
@dlt.table()
def table_1():
    return spark.sql(''' select mergeKey, date_seq, colN, case/when.., cast.. from live.raw_table_1 ''')
WORKS: Table2:
@dlt.table()
def table_2():
    return spark.sql(''' select mergeKey, date_seq, colN, case/when.., cast.. from live.raw_table_2 ''')
PROBLEM: View_Union:
@dlt.view()
def pre_merge_union_v():
    --> Question: what is the syntax for something like return dlt.read_stream("df_unioned").as.table1.unionByName(table2)?
Create Silver Table:
dlt.create_streaming_live_table("silver_table")
Finally, Apply Changes Into Silver:
dlt.apply_changes(
    target = "silver_table",
    source = "pre_merge_union_v",
    keys = ["mergeKey"],
    sequence_by = "date_seq"
)
TRIED:
I tried to create my View_Union view as:
@dlt.view()
def pre_merge_union_v():
    df_table1 = spark.table("live.raw_table1")
    df_table2 = spark.table("live.raw_table2")
    df_unioned = df_table1.unionByName(df_table2)
    return df_unioned
However, when the pipeline runs, it complains that pre_merge_union_v must be a READ_STREAM VIEW (presumably because it is being merged into a streaming live table).
Using return dlt.read_stream(df_unioned) instead also generates an error.
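
For reference, here is a minimal sketch of one way the view might be written. It assumes the goal is to union table_1 and table_2 as defined above, and it relies on dlt.read_stream taking a table name (a string) rather than a DataFrame, which would be consistent with the error from dlt.read_stream(df_unioned). The helper union_streams is a hypothetical name introduced only for illustration. The guard around import dlt is there only so the snippet can also be loaded outside a pipeline.

```python
from functools import reduce

try:
    import dlt  # available inside a Databricks DLT pipeline
except ImportError:
    dlt = None  # lets the helper below be imported and tested elsewhere


def union_streams(read_stream, table_names):
    """Read each named table as a stream and union the results by column name.

    `read_stream` is any callable mapping a table name to a DataFrame,
    for example dlt.read_stream. Hypothetical helper, not part of the dlt API.
    """
    frames = [read_stream(name) for name in table_names]
    return reduce(lambda left, right: left.unionByName(right), frames)


# The hasattr check skips unrelated packages that are also named "dlt".
if dlt is not None and hasattr(dlt, "read_stream"):

    @dlt.view()
    def pre_merge_union_v():
        # Pass table names (strings), not DataFrames, to dlt.read_stream,
        # so each input is read as a stream before the union.
        return union_streams(dlt.read_stream, ["table_1", "table_2"])
```

With a view shaped like this, the source of apply_changes would be a streaming view built from streaming reads of both tables, rather than from the batch reads done by spark.table.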