I'm new to Databricks Delta Live Tables. I set up my first pipeline to ingest a single 26 MB CSV file from Azure Blob Storage (mounted at /mnt/mntname/) using the following code:
import dlt

@dlt.table(
    comment="this is a test"
)
def accounts():
    # Incrementally ingest CSV files from the mounted container with Auto Loader
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .load("/mnt/mntname/")
    )
It's been running for 24 minutes on the Advanced product edition, with a cluster on Databricks Runtime 10.4, 42 GB of active memory, 12 cores, and 2.25 active DBU/hour.
Is this normal? It seems very slow for such a small workload.
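In case it's relevant: one thing I considered is passing an explicit schema instead of letting Auto Loader infer it, in case schema inference is part of the delay. A sketch of what I mean (the column names here are placeholders, not my actual data):

import dlt
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Placeholder schema -- my real file has different columns
accounts_schema = StructType([
    StructField("account_id", StringType(), True),
    StructField("balance", DoubleType(), True),
])

@dlt.table(
    comment="this is a test"
)
def accounts():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .schema(accounts_schema)  # provide the schema up front, skipping inference
        .load("/mnt/mntname/")
    )

But I'm not sure schema inference on a 26 MB file could account for 24 minutes, which is why I'm asking here.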