
New to Databricks Delta Live Tables. I set up my first pipeline to ingest a single 26 MB CSV file from an Azure blob container using the following code:

import dlt

@dlt.table(
  comment="this is a test"
)
def accounts():
  # Auto Loader (the cloudFiles source) reads the mounted container as a
  # stream, picking up new files incrementally and inferring the CSV schema
  return (
    spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .load("/mnt/mntname/")
  )

It's been running for 24 minutes on the Advanced product edition, on a Spark cluster configured with Databricks Runtime 10.4, 42 GB of active memory, 12 cores, and 2.25 active DBU/hour.

Is this normal? It seems very slow for such a small workload.

  • Any specific reason you are reading it as a stream when it's a single file? Can you please try spark.read instead of readStream? (see the batch-read sketch below) – Ganesh Chandrasekaran Jun 20 '22 at 17:44
  • Have you set the `continuous` parameter to `true`: https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-concepts.html#continuous-and-triggered-pipelines. If yes, then it will wait for new files indefinitely (see the pipeline settings sketch below). – Alex Ott Jun 21 '22 at 13:07
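
A minimal sketch of the batch read suggested in the first comment, assuming the same mount path; the header and inferSchema options are illustrative assumptions, since the original post does not say whether the file has a header row:

import dlt

@dlt.table(
  comment="this is a test"
)
def accounts():
  # spark.read performs a one-shot batch read: the file is processed once
  # and the query finishes, rather than a stream waiting for new files
  return (
    spark.read.format("csv")
      .option("header", "true")       # assumption: the CSV has a header row
      .option("inferSchema", "true")  # assumption: infer column types from the data
      .load("/mnt/mntname/")
  )

For a single 26 MB file, a batch read like this would typically finish in well under a minute.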
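
The second comment refers to the pipeline's execution mode. A sketch of the relevant pipeline settings, assuming the JSON configuration format described in the linked docs: with "continuous": false the pipeline runs in triggered mode, processing the available data once and then stopping, instead of waiting for new files indefinitely (the name and edition values here are illustrative placeholders):

{
  "name": "accounts-pipeline",
  "edition": "ADVANCED",
  "continuous": false
}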

0 Answers