
I have a Delta table (consisting of metadata and a set of Parquet data files) that I save with Databricks to Azure Blob Storage. Later, I try to read that table with an Azure Data Factory pipeline, but when using a Copy activity it reads all the data in the Delta folder instead of only the latest version (as specified by the metadata).

How do I read just one version of a Delta table stored on Blob Storage?
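For context, the behaviour can be reproduced from Databricks itself: a Delta folder keeps the data files of every version on disk, so a reader that ignores the `_delta_log` metadata (as the Copy activity does) sees all of them. A minimal PySpark sketch, assuming a Databricks notebook where `spark` is available, with a placeholder storage path and version number:

```python
# Placeholder path -- substitute your own container and folder.
path = "abfss://container@account.dfs.core.windows.net/my_delta_table"

# Reading the folder as plain Parquet picks up every data file ever written,
# including files that belong to older versions of the table.
all_files_df = spark.read.format("parquet").load(path)

# Reading it as Delta consults the _delta_log and returns only the files that
# make up the current (latest) snapshot.
latest_df = spark.read.format("delta").load(path)

# A specific historical version can be requested with time travel.
version_df = spark.read.format("delta").option("versionAsOf", 3).load(path)
```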


1 Answer


You can use a Data Flow to get the data for a specific version.

Create a new Data Flow activity.


Select Inline as the source type and Delta as the inline dataset type.


Next, go to the Source options tab.

Here, select your Delta folder path, set Time travel to Query by version, and then give your version number.

This gives you the result for that version. Then use this Data Flow in your pipeline.
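If you are unsure which number to enter for Query by version, the table history can be inspected from Databricks. A minimal sketch, assuming the delta-spark library is available in the notebook and using a placeholder path:

```python
from delta.tables import DeltaTable

# Placeholder path -- point this at the same Delta folder used as the Data Flow source.
path = "abfss://container@account.dfs.core.windows.net/my_delta_table"

# Each row of the history is one table version, newest first.
history_df = DeltaTable.forPath(spark, path).history()
history_df.select("version", "timestamp", "operation").show()

# The most recent version number, e.g. to pass an explicit value to the
# "Query by version" setting instead of relying on the default.
latest_version = history_df.agg({"version": "max"}).collect()[0][0]
print(latest_version)
```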

JayashankarGS
  • I am trying this out and cannot figure out the "Time travel" option. How do I make this value dynamic so it always selects the latest version number? EDIT: without having to manually write the number. – euh May 26 '23 at 08:19
  • Disable `Time travel`; by default it will take the latest version. – JayashankarGS May 26 '23 at 09:06
  • Thanks, it did work, but I was surprised it was actually faster to migrate my huge Delta table through Databricks PySpark than through an ADF Data Flow! – euh May 26 '23 at 11:38
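For reference, the Databricks-side migration mentioned in the last comment might look roughly like the sketch below: read only the latest snapshot of the Delta table and write it out as plain Parquet that a non-Delta-aware tool (such as an ADF Copy activity) can consume. The paths and output location are placeholders, not anything specified in the thread.

```python
# Placeholder paths -- substitute your own containers and folders.
src = "abfss://container@account.dfs.core.windows.net/my_delta_table"
dst = "abfss://container@account.dfs.core.windows.net/export/my_table_parquet"

# Read only the current (latest) snapshot of the Delta table ...
df = spark.read.format("delta").load(src)

# ... and write it out as plain Parquet for downstream consumers that are not
# Delta-aware.
df.write.mode("overwrite").format("parquet").save(dst)
```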