I'm trying to parse incoming variable-length stream records in Databricks using Delta Live Tables. I'm getting the error:
Queries with streaming sources must be executed with writeStream.start();
Notebook code:

@dlt.table(
    comment="xAudit Parsed"
)
def b_table_parsed():
    df = dlt.readStream("dlt_table_raw_view")
    for i in range(df.select(F.max(F.size("split_col"))).collect()[0][0]):
        df = df.withColumn("col" + str(i), df["split_col"][i])
    df = df.drop("value", "split_col")
    return df
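For what it's worth, I think the collect() inside range(...) forces an action on the streaming DataFrame, which is presumably what triggers the error. The only streaming-safe variant I've come up with is hard-coding an upper bound on the number of columns (MAX_COLS below is a made-up constant, not something from my real data):

```python
# Sketch only: avoids running collect() on the streaming source by
# assuming a known upper bound on the number of split columns.
MAX_COLS = 20  # made-up upper bound, not derived from the data

@dlt.table(
    comment="xAudit Parsed (fixed-width sketch)"
)
def b_table_parsed_fixed():
    df = dlt.readStream("dlt_table_raw_view")
    for i in range(MAX_COLS):
        # Indexing past the end of the array yields null,
        # so shorter rows just get null columns.
        df = df.withColumn("col" + str(i), df["split_col"][i])
    return df.drop("value", "split_col")
```

But I'd rather not hard-code the width, so I'm hoping there's a better way.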
This all works fine against the actual source text files or a Delta table on the interactive cluster, but when I put it in DLT with the source streaming files from Auto Loader, it fails. I assume it's stream related.
I saw a different post suggesting .foreach, but that example used writeStream directly, and I'm not sure whether or how that can be used to return a table in DLT, or if there is another solution.
I'm very new to Python, streaming, and DLT, so I'd appreciate it if anyone could walk me through a detailed solution.
Summary: trying to parse variable-length rows from a streaming source in a Delta Live Tables notebook in Databricks. It works on the interactive cluster but not when streaming in DLT.