
Using Python and the relevant DLT properties within Databricks, does anyone know how to simply append to a DLT table from a batch source?

In PySpark you can just use df.write.format("delta").mode("append"), but since DLT requires you to return a Spark DataFrame from its decorated function, we can't use the DataFrame Writer API.
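For reference, this is the kind of batch append I mean outside of DLT (the source path and table name here are just placeholders):

df = spark.read.format("parquet").load("/mnt/raw/events")   # any batch source
df.write.format("delta").mode("append").saveAsTable("my_append_table")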

Thanks in advance.

Luke88

1 Answer


Delta Live Tables has a notion of a streaming live table that is append-only by default. You can define your pipeline as triggered, which is the equivalent of Trigger.Once. Something like this:

@dlt.table
def append_only():
  # "xyz" stands in for your actual source format
  return spark.readStream.format("xyz").load()

Here we use readStream just to make sure that when we run the pipeline again, we won't append the same content again and again.
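For example, with a file-based batch source you could use Auto Loader, which tracks which files have already been ingested. A minimal sketch, where the cloudFiles.format option and the landing path are just assumptions for illustration:

@dlt.table
def append_only():
  return (
    spark.readStream
      .format("cloudFiles")                  # Auto Loader: only picks up new files on each run
      .option("cloudFiles.format", "json")   # source file format (assumption)
      .load("/mnt/landing/events")           # landing path (placeholder)
  )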

Alex Ott
  • Thanks a lot Alex! Appreciate the clarification, cheers. – Luke88 Aug 01 '22 at 11:14
  • Hi Alex, hope you don't mind this follow-up: for this DLT streaming table using spark.readStream, is it possible to pass in an array of sources that have the same schema but are found in different timestamp folders? I tested the following pseudo-code but it didn't work, so I was wondering if you have any tips? Thanks in advance. `@dlt.table( name=silver_table, path=f"abfss://{TargetContainer}@{StorageAccountName}.dfs.core.windows.net/{TargetDirectory}" ) def silver_table(): for each_row in array: df = spark.readStream………(each_row) return df` – Luke88 Aug 05 '22 at 06:06
  • you can use a glob pattern to specify multiple folders under the same location... Otherwise it's either a readStream for each folder unioned with the others (it may break if you add or remove sources), or use this recipe to programmatically create individual nodes in the DLT graph for each of the locations (see the sketch after these comments): https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-cookbook.html#programmatically-manage-and-create-multiple-live-tables – Alex Ott Aug 05 '22 at 06:13
  • Thanks once again Alex; I will stick to my current glob pattern then. Cheers. – Luke88 Aug 05 '22 at 10:12
  • Within this solution, what happens if something we appended to the target table in the first run is no longer available in the source at the second run? Will this create issues? – Moein May 05 '23 at 08:58
  • it depends on the source... – Alex Ott May 05 '23 at 09:14
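
Following up on the cookbook recipe linked in the comments, a minimal sketch of programmatically creating one streaming table per source location. The factory function is needed so each table captures its own path; the sources list, its entries, and the Auto Loader options are hypothetical placeholders:

def create_table(table_name, source_path):
  # factory function: each generated table binds its own name and path
  @dlt.table(name=table_name)
  def t():
    return (
      spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "parquet")  # source format (assumption)
        .load(source_path)
    )

sources = [("silver_a", "/mnt/landing/a"), ("silver_b", "/mnt/landing/b")]  # placeholders
for table_name, source_path in sources:
  create_table(table_name, source_path)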