
New to Databricks Delta Live Tables. I set up my first pipeline to ingest a single 26 MB CSV file from an Azure blob container using the following code:

import dlt

@dlt.table(
  comment="this is a test"
)
def accounts():
  # Auto Loader (the cloudFiles source) reads the mounted container as a
  # stream, picking up new files incrementally and inferring the CSV schema
  return (
    spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .load("/mnt/mntname/")
  )

It's been running for 24 minutes on the Advanced product edition, on a Spark cluster configured with Databricks Runtime 10.4, 42 GB of active memory, 12 cores, and 2.25 active DBU/hour.

Is this normal? It seems very slow for such a small workload.

  • Any specific reason you are reading it as a stream when it's a single file? Can you please try spark.read instead of readStream? (see the batch-read sketch below) – Ganesh Chandrasekaran Jun 20 '22 at 17:44
  • Have you set the `continuous` parameter to `true`: https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-concepts.html#continuous-and-triggered-pipelines. If yes, then it will wait for new files indefinitely (see the pipeline settings sketch below). – Alex Ott Jun 21 '22 at 13:07
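
A minimal sketch of the batch read suggested in the first comment, assuming the same mount path; the header and inferSchema options are illustrative assumptions, since the original post does not say whether the file has a header row:

import dlt

@dlt.table(
  comment="this is a test"
)
def accounts():
  # spark.read performs a one-shot batch read: the file is processed once
  # and the query finishes, rather than a stream waiting for new files
  return (
    spark.read.format("csv")
      .option("header", "true")       # assumption: the CSV has a header row
      .option("inferSchema", "true")  # assumption: infer column types from the data
      .load("/mnt/mntname/")
  )

For a single 26 MB file, a batch read like this would typically finish in well under a minute.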
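
The second comment refers to the pipeline's execution mode. A sketch of the relevant pipeline settings, assuming the JSON configuration format described in the linked docs: with "continuous": false the pipeline runs in triggered mode, processing the available data once and then stopping, instead of waiting for new files indefinitely (the name and edition values here are illustrative placeholders):

{
  "name": "accounts-pipeline",
  "edition": "ADVANCED",
  "continuous": false
}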

0 Answers