
I'm new to Delta Live Tables (DLT) and am struggling with the Python syntax for returning, from a dlt.read_stream call, a DataFrame that is the union (unionByName) of two other live tables.

My pipeline is as follows:

WORKS: Table1:

import dlt

@dlt.table()
def table_1():
    return spark.sql('''select mergeKey, date_seq, colN, case/when.., cast.. from live.raw_table_1''')

WORKS: Table2:

@dlt.table()
def table_2():
    return spark.sql('''select mergeKey, date_seq, colN, case/when.., cast.. from live.raw_table_2''')

PROBLEM: View_Union:

@dlt.view()
def pre_merge_union_v():
    # Question: what's the syntax for something like this?
    # return dlt.read_stream("df_unioned").as.table1.unionByName(table2)
    ...

Create Silver Table:

dlt.create_streaming_live_table("silver_table")

Finally, Apply Changes Into Silver:

dlt.apply_changes(
    target="silver_table",
    source="pre_merge_union_v",
    keys=["mergeKey"],
    sequence_by="date_seq",
)
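To picture what apply_changes does with the unioned rows, here is a toy, pure-Python model of its default SCD type 1 behavior: one row per key, and a row only replaces the stored one if its sequence value is newer, so late-arriving older rows from either source are ignored. This is not the DLT API. apply_changes_scd1 is a hypothetical helper working on plain dicts, and deletes (apply_as_deletes), column selection and SCD type 2 are not modelled.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def apply_changes_scd1(
    target: Dict[Any, Row], changes: Iterable[Row], key: str, sequence_by: str
) -> Dict[Any, Row]:
    """Toy SCD type 1 upsert keyed on `key`, ordered by `sequence_by`."""
    for row in changes:
        k = row[key]
        current = target.get(k)
        # Insert new keys; overwrite only when the incoming row is strictly newer,
        # so a late-arriving older record cannot clobber newer data.
        # (Keeping the existing row on ties is this model's choice, not documented DLT behavior.)
        if current is None or row[sequence_by] > current[sequence_by]:
            target[k] = dict(row)
    return target


# Hypothetical rows as they might arrive from the unioned view, deliberately out of order.
changes = [
    {"mergeKey": "a", "date_seq": 2, "colN": "from raw_table_2"},
    {"mergeKey": "a", "date_seq": 1, "colN": "from raw_table_1"},
    {"mergeKey": "b", "date_seq": 1, "colN": "from raw_table_1"},
]
silver = apply_changes_scd1({}, changes, key="mergeKey", sequence_by="date_seq")
print(silver["a"]["colN"])  # from raw_table_2
```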

TRIED:

I tried to create my View_Union view as:

@dlt.view()
def pre_merge_union_v():
    df_table1 = spark.table("live.raw_table1")
    df_table2 = spark.table("live.raw_table2")
    df_unioned = df_table1.unionByName(df_table2)
    return df_unioned

However, when the pipeline runs, it complains that pre_merge_union_v must be a READ_STREAM VIEW (I assume because it feeds apply_changes into a streaming live table).

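Using return dlt.read_stream(df_unioned) also generates an error (presumably because dlt.read_stream expects the name of a dataset defined in the pipeline, not a DataFrame).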

ExoV1

ANSWER:


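I believe I've answered my own question; hope it helps. Read each source with dlt.read_stream (passing the dataset name, not a DataFrame) and union the two resulting streaming DataFrames by name, so pre_merge_union_v becomes a streaming view. Its body becomes:

return dlt.read_stream("raw_table1").unionByName(dlt.read_stream("raw_table2"))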
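For anyone wondering why unionByName rather than union: union matches columns by position, while unionByName matches them by name, which matters if the two sources don't list their columns in the same order. Below is a toy, pure-Python model of that name-based alignment on (column names, rows) pairs. union_by_name is a hypothetical helper, not the PySpark API, and it ignores Spark's type coercion and its default case-insensitive name resolution. The allow_missing_columns flag mirrors PySpark's allowMissingColumns parameter (Spark 3.1+), filling columns that one side lacks with None.

```python
from typing import Any, List, Tuple

# A "table" here is just (column names, rows); each row is a tuple in column order.
Table = Tuple[List[str], List[Tuple[Any, ...]]]


def union_by_name(left: Table, right: Table, allow_missing_columns: bool = False) -> Table:
    """Toy model of name-based union: align right's columns to left's by name."""
    left_cols, left_rows = left
    right_cols, right_rows = right

    differing = set(left_cols) ^ set(right_cols)
    if differing and not allow_missing_columns:
        # PySpark raises an AnalysisException here; the toy raises ValueError.
        raise ValueError(f"column sets differ: {sorted(differing)}")

    # Result schema: left's columns in order, then right-only columns in right's order.
    out_cols = left_cols + [c for c in right_cols if c not in left_cols]

    def realign(cols: List[str], rows: List[Tuple[Any, ...]]) -> List[Tuple[Any, ...]]:
        position = {name: i for i, name in enumerate(cols)}
        # Look each output column up by name; columns this side lacks become None.
        return [
            tuple(row[position[c]] if c in position else None for c in out_cols)
            for row in rows
        ]

    return out_cols, realign(left_cols, left_rows) + realign(right_cols, right_rows)


# Same columns, different order: values still land under the right names.
t1 = (["mergeKey", "date_seq", "colN"], [("a", 1, "x")])
t2 = (["colN", "mergeKey", "date_seq"], [("y", "b", 2)])
cols, rows = union_by_name(t1, t2)
print(cols)  # ['mergeKey', 'date_seq', 'colN']
print(rows)  # [('a', 1, 'x'), ('b', 2, 'y')]
```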