
I'm interested in the difference between Structured Streaming and Delta Live Tables (DLT). Databricks says: "For most streaming or incremental data processing or ETL tasks, Databricks recommends Delta Live Tables."

Does this mean I should always stick to DLT, and that Structured Streaming is an old feature?

Zac



TL;DR: DLT = SaaS Structured Streaming. It makes streaming simple to implement, at a cost ($$).


DLT

  • provides a DSL that lets you write your streaming code in fewer lines. Here is a simple example (DLT offers a lot more): using Structured Streaming to stream JSON files from /path/to/json/file/streams/taxi_raw into a Delta table at /path/to/delta/tables/filtered_data:
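```python
# Stage 1: raw JSON files -> Delta table (file streams need a schema, e.g. .schema(...); checkpoint paths are placeholders)
df_taxi_raw = spark.readStream.json('/path/to/json/file/streams/taxi_raw')
df_taxi_raw.writeStream.format('delta').option('checkpointLocation', '/path/to/checkpoints/taxi_raw').start('/path/to/delta/tables/taxi_raw')

# Stage 2: stream from that Delta table, filter, and write a second Delta table
df_filtered_data = spark.readStream.format("delta").load("/path/to/delta/tables/taxi_raw").where(...)
df_filtered_data.writeStream.format('delta').option('checkpointLocation', '/path/to/checkpoints/filtered_data').start('/path/to/delta/tables/filtered_data')
```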

Same thing using DLT:

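```python
import dlt

# Streaming source: a view over the raw JSON files
@dlt.view
def taxi_raw():
  return spark.readStream.format("json").load("/path/to/json/file/streams/taxi_raw")

# DLT infers the taxi_raw -> filtered_data dependency and manages the checkpoints
@dlt.table(name="filtered_data")
def create_filtered_data():
  return dlt.read_stream("taxi_raw").where(...)
```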
  • It's an additional cost.
  • [Opinion] It's pretty new and we didn't go for it as we have been bled by "bleeding edge features" before. YMMV.
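To make the "fewer lines" point concrete without a Databricks workspace, here is a toy, pure-Python model of the declarative pattern the DLT example uses. It is not DLT's implementation or API: `table`, `read`, and `run_pipeline` are hypothetical stand-ins for `@dlt.table` and `dlt.read`, the rows are made-up sample data, and the `fare > 10` filter is an invented placeholder for the elided `where(...)`. Each function only declares what a dataset is; the runner works out that `taxi_raw` has to be built before `filtered_data`.

```python
from typing import Callable, Dict, List

# Hypothetical toy registry (not DLT): dataset name -> function that builds it.
_DEFINITIONS: Dict[str, Callable[[], List[dict]]] = {}
_MATERIALIZED: Dict[str, List[dict]] = {}


def table(name: str):
    """Hypothetical decorator: register a dataset definition under `name`."""
    def register(fn: Callable[[], List[dict]]):
        _DEFINITIONS[name] = fn
        return fn
    return register


def read(name: str) -> List[dict]:
    """Build a dependency on demand (at most once) and return its rows."""
    if name not in _MATERIALIZED:
        _MATERIALIZED[name] = _DEFINITIONS[name]()
    return _MATERIALIZED[name]


def run_pipeline() -> Dict[str, List[dict]]:
    """Build every registered dataset; build order follows the read() calls."""
    for name in _DEFINITIONS:
        read(name)
    return dict(_MATERIALIZED)


# Declared "downstream first" on purpose; the runner still builds taxi_raw first.
@table("filtered_data")
def filtered_data():
    # Hypothetical filter standing in for the elided where(...)
    return [row for row in read("taxi_raw") if row["fare"] > 10]


@table("taxi_raw")
def taxi_raw():
    # Made-up sample rows standing in for the JSON file stream
    return [{"trip": 1, "fare": 8.5}, {"trip": 2, "fare": 23.0}, {"trip": 3, "fare": 12.0}]


result = run_pipeline()
print(result["filtered_data"])  # [{'trip': 2, 'fare': 23.0}, {'trip': 3, 'fare': 12.0}]
```

The toy has no streaming, checkpointing, or cycle detection. It only shows why the declarative style is shorter: in the Structured Streaming version you wire each stage by hand with explicit `writeStream`/`readStream` calls, while here the dependency falls out of the `read("taxi_raw")` call.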

... Databricks recommends Delta Live Tables.

Does this mean I should always stick to DLT, and that Structured Streaming is an old feature?

"Databricks recommends" it because they're in the business of making money, not because DLT is a new feature replacing an older one. It's more like Walmart recommending "Walmart+": nice to have, but you don't need it to shop at Walmart.

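Compare that with RDD and DataFrame: the DataFrame API has largely superseded RDDs, and new features are added to DataFrame, not RDD. That is not the relationship between DLT and Structured Streaming. Structured Streaming is part of Apache Spark and continues to be developed there; DLT is built on top of it rather than replacing it.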

Understand the cost and benefits and then decide. You can do streaming using either DLT or stock Spark Structured Streaming.

Kashyap