
I would like to set up an ADF pipeline to load all the Parquet files hosted on ADLS Gen2 for the last 2+ years, stored in a Year -> Month -> Day -> Hour -> Min folder hierarchy. Over that period the file structure changed slightly, with a variance of 2-3 columns. I would like to pull only the common columns and load the entire dataset into a SQL table. Can someone please point me to resources that could help with this requirement?

Thank you!

jarlh

1 Answer


In the Azure Data Factory pipeline:

  1. Use the Get Metadata activity to get the list of Parquet files (set its Field list to Child items).
  2. Pass the child items to a ForEach activity to loop over each item.
  3. Add an If Condition activity inside the ForEach activity to check whether the date parsed from the current file name is greater than the current date minus 2 years (see the sketch after this list).
  4. Add a Copy data activity under the True activities to copy the data from the source to the sink.
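
Below is a minimal sketch of how these activities fit together. It assumes the Get Metadata activity is named Get Metadata1, the dataset ParquetFolderDataset points at the folder to list, and the file names start with a zero-padded yyyyMMddHHmm timestamp so a lexical comparison matches chronological order; all of these names and the naming convention are assumptions, so adjust them to your environment. The Copy activity is abbreviated here:

    "activities": [
        {
            "name": "Get Metadata1",
            "type": "GetMetadata",
            "typeProperties": {
                "dataset": { "referenceName": "ParquetFolderDataset", "type": "DatasetReference" },
                "fieldList": [ "childItems" ]
            }
        },
        {
            "name": "ForEachFile",
            "type": "ForEach",
            "dependsOn": [ { "activity": "Get Metadata1", "dependencyConditions": [ "Succeeded" ] } ],
            "typeProperties": {
                "items": {
                    "value": "@activity('Get Metadata1').output.childItems",
                    "type": "Expression"
                },
                "activities": [
                    {
                        "name": "CheckFileDate",
                        "type": "IfCondition",
                        "typeProperties": {
                            "expression": {
                                "value": "@greaterOrEquals(item().name, formatDateTime(getPastTime(2, 'Year'), 'yyyyMMddHHmm'))",
                                "type": "Expression"
                            },
                            "ifTrueActivities": [
                                { "name": "CopyParquetToSql", "type": "Copy" }
                            ]
                        }
                    }
                ]
            }
        }
    ]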

You can refer to this document to copy data to the SQL table.
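
Since the Parquet schema drifted by 2-3 columns over time, one option (a sketch, not the only approach) is to define an explicit column mapping on the Copy activity so that only the columns present in every file are written to the SQL table. The column names below are placeholders for your actual common columns:

    "translator": {
        "type": "TabularTranslator",
        "mappings": [
            { "source": { "name": "EventId" },   "sink": { "name": "EventId" } },
            { "source": { "name": "EventTime" }, "sink": { "name": "EventTime" } },
            { "source": { "name": "Value" },     "sink": { "name": "Value" } }
        ]
    }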

NiharikaMoola-MT