
I would like to set up an ADF pipeline to load all the Parquet files hosted on ADLS Gen2 for the last 2+ years, stored in a Year -> Month -> Day -> Hour -> Min folder hierarchy. Over that period the file structure changed slightly, with a variance of 2-3 columns. I would like to pull only the common columns and load the entire dataset into a SQL table. Can someone please point me to resources that could help with this requirement?

Thank you!

jarlh

1 Answer


In the Azure Data Factory pipeline:

  1. Use the Get Metadata activity to get the list of Parquet files (set its Field list to Child items).
  2. Pass the child items to a ForEach activity to loop over each item.
  3. Add an If Condition activity inside the ForEach activity to check whether the date parsed from the current file name is greater than the current date minus 2 years (see the sketch after this list).
  4. Add a Copy data activity under the True activities to copy the data from the source to the sink.
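
Below is a minimal sketch of how these activities fit together. It assumes the Get Metadata activity is named Get Metadata1, the dataset ParquetFolderDataset points at the folder to list, and the file names start with a zero-padded yyyyMMddHHmm timestamp so a lexical comparison matches chronological order; all of these names and the naming convention are assumptions, so adjust them to your environment. The Copy activity is abbreviated here:

    "activities": [
        {
            "name": "Get Metadata1",
            "type": "GetMetadata",
            "typeProperties": {
                "dataset": { "referenceName": "ParquetFolderDataset", "type": "DatasetReference" },
                "fieldList": [ "childItems" ]
            }
        },
        {
            "name": "ForEachFile",
            "type": "ForEach",
            "dependsOn": [ { "activity": "Get Metadata1", "dependencyConditions": [ "Succeeded" ] } ],
            "typeProperties": {
                "items": {
                    "value": "@activity('Get Metadata1').output.childItems",
                    "type": "Expression"
                },
                "activities": [
                    {
                        "name": "CheckFileDate",
                        "type": "IfCondition",
                        "typeProperties": {
                            "expression": {
                                "value": "@greaterOrEquals(item().name, formatDateTime(getPastTime(2, 'Year'), 'yyyyMMddHHmm'))",
                                "type": "Expression"
                            },
                            "ifTrueActivities": [
                                { "name": "CopyParquetToSql", "type": "Copy" }
                            ]
                        }
                    }
                ]
            }
        }
    ]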

You can refer to this document to copy data to the SQL table.
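
Since the Parquet schema drifted by 2-3 columns over time, one option (a sketch, not the only approach) is to define an explicit column mapping on the Copy activity so that only the columns present in every file are written to the SQL table. The column names below are placeholders for your actual common columns:

    "translator": {
        "type": "TabularTranslator",
        "mappings": [
            { "source": { "name": "EventId" },   "sink": { "name": "EventId" } },
            { "source": { "name": "EventTime" }, "sink": { "name": "EventTime" } },
            { "source": { "name": "Value" },     "sink": { "name": "Value" } }
        ]
    }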

NiharikaMoola-MT