Inside the Data Lake, we have a folder that contains files pushed by an external source every day. However, we want to process only the most recently added file in that folder. Is there any way to achieve that with Azure Data Factory?
2 Answers
You could set modifiedDatetimeStart and modifiedDatetimeEnd to filter the files in the folder when you use the ADLS connector in a copy activity.
There are two possible situations:
1. The data is pushed by the external source on a schedule, in which case you need to know the schedule time in order to configure the filter.
2. The frequency is random, in which case you may have to log the push time somewhere else and pass that time as a parameter into the copy activity pipeline before you execute it.
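The second situation amounts to keeping a "watermark": persist a timestamp on each run and read it back on the next one to use as the start of the filter window. A minimal Python sketch of that idea follows. It is not ADF code; the file name `watermark.json`, the `last_run` key, and both helper functions are hypothetical names chosen for illustration.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def read_watermark(path: Path, default: datetime) -> datetime:
    """Return the timestamp saved by the previous run, or `default` on the first run."""
    if not path.exists():
        return default
    saved = json.loads(path.read_text())
    return datetime.fromisoformat(saved["last_run"])


def write_watermark(path: Path, value: datetime) -> None:
    """Persist the timestamp so the next run can use it as its window start."""
    path.write_text(json.dumps({"last_run": value.isoformat()}))


# Hypothetical usage: a temporary directory stands in for wherever the time is logged.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "watermark.json"
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

    first_start = read_watermark(store, epoch)   # nothing saved yet, falls back to epoch
    this_run = datetime(2020, 3, 6, 7, 0, tzinfo=timezone.utc)
    write_watermark(store, this_run)
    second_start = read_watermark(store, epoch)  # the value saved by the "previous" run
```

In a pipeline, the value read back would be passed in as the parameter that feeds modifiedDatetimeStart.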
Here is a flow you could build in an ADF pipeline:
My sample files in the same folder:
Step 1: Create two variables, maxtime and filename:
maxtime is initialized to a baseline datetime for the date in question, and filename is an empty string.
Step 2: Use a GetMetadata activity and a ForEach activity to get the files in the folder.
GetMetadata 1 configuration:
ForEach Activity configuration:
Step 3: Inside the ForEach activity, use a GetMetadata activity and an If-Condition activity, structured as below:
GetMetadata 2 configuration:
If-Condition Activity configuration:
Step 4: Inside the True branch of the If-Condition, use two Set Variable activities:
Set variable1 configuration:
Set variable2 configuration:
All of the above steps aim to find the latest file name; once the loop finishes, the variable filename holds exactly that.
Addition: the new dataset used in GetMetadata 2:
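The logic of Steps 1 to 4 can be sketched in Python as below. This only illustrates the comparison the pipeline performs and is not ADF code: `FileInfo`, `find_latest_file`, and the sample file names are hypothetical, and it assumes each file's last-modified time is available (presumably what GetMetadata 2 retrieves for each item).

```python
from datetime import datetime, timezone
from typing import Iterable, NamedTuple, Tuple


class FileInfo(NamedTuple):
    name: str
    last_modified: datetime


def find_latest_file(files: Iterable[FileInfo], baseline: datetime) -> Tuple[str, datetime]:
    """Return (filename, maxtime) for the newest file modified after `baseline`."""
    maxtime = baseline   # Step 1: variable maxtime
    filename = ""        # Step 1: variable filename
    for f in files:                        # Step 2: ForEach over the folder's files
        if f.last_modified > maxtime:      # Step 3: If-Condition
            maxtime = f.last_modified      # Step 4: Set variable (maxtime)
            filename = f.name              # Step 4: Set variable (filename)
    return filename, maxtime


# Hypothetical sample data for one daily folder.
utc = timezone.utc
sample = [
    FileInfo("a.csv", datetime(2020, 3, 6, 1, 0, tzinfo=utc)),
    FileInfo("c.csv", datetime(2020, 3, 6, 9, 30, tzinfo=utc)),
    FileInfo("b.csv", datetime(2020, 3, 6, 4, 15, tzinfo=utc)),
]
start_of_day = datetime(2020, 3, 6, 0, 0, tzinfo=utc)
latest_name, latest_time = find_latest_file(sample, start_of_day)
```

Two things follow from the sketch. The running maximum is only reliable if the items are visited one at a time, so the ForEach activity presumably needs its Sequential option enabled, since pipeline variables are shared across iterations. And the result is only final after the loop ends, so any copy step belongs after the ForEach rather than inside the If-Condition.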

-
It's actually random. Could you elaborate more on "pushing data time in another residence"? I am currently looking at get metadata activity but it seems that it cannot read the latest modified file in a folder – OreoFanatics Mar 06 '20 at 06:47
-
@OreoFanatics No, I didn't mean using the `GetMetadata Activity`. I mean that if you control the process that pushes data into ADLS, then you know the push time even though it's random, so you could persist that time yourself and use it as a parameter in ADF. – Jay Gong Mar 06 '20 at 06:53
-
@OreoFanatics If you can't control the process and the data arrives all the time, then I think you can at least control the ADF pipeline trigger time. Log that time somewhere so that you can use it as a parameter next time. Hope I'm clear on this! – Jay Gong Mar 06 '20 at 06:55
-
Yes data comes all the time. And no matter the quantity of the data received that day, we only want to pick the latest one whenever the pipeline is triggered. – OreoFanatics Mar 06 '20 at 07:00
-
@OreoFanatics OK, so based on my understanding, you want to get the single latest file within one day, correct? – Jay Gong Mar 06 '20 at 07:02
-
Yes, the folder structure is based on each day. So it is indeed to get the latest file from that day (folder). – OreoFanatics Mar 06 '20 at 07:08
-
@OreoFanatics Please see my updates; I tested it successfully. – Jay Gong Mar 06 '20 at 08:20
-
what should be the dataset for 'Get Metadata1' and 'Get Metadata2'? I currently put the first one with the path to the folder which contains multiple files. However I am not sure what I should put for 'Get Metadata2' – OreoFanatics Mar 06 '20 at 09:35
-
@OreoFanatics Sorry, I missed that; I've now added it to my answer. – Jay Gong Mar 06 '20 at 09:37
-
it seems that your solution is leading to the correct solution. I need to test it. Let's say I want it for copy data with the 'filtered' file, should I create a new copy data activity and connect it to ForEach1? – OreoFanatics Mar 06 '20 at 10:25
-
@OreoFanatics Yes, the above steps only find the latest file for a specific day. They don't include the copy step. – Jay Gong Mar 06 '20 at 15:40
-
I tried with multiple files in a folder and it seems that it pulls a random file instead of the latest modified file. I think something is still not right in the configuration – OreoFanatics Mar 11 '20 at 09:51
-
@OreoFanatics Based on my test, it works well. Have you checked the modified date of your files? – Jay Gong Mar 12 '20 at 05:50
-
ok I think it is because I put the copy data inside the IF activities. That's why it's replacing one after another. I think I should put copy data activity outside IF activity, correct? – OreoFanatics Mar 12 '20 at 07:42
-
@OreoFanatics Yeah, my process ends with the latest file name as its result. I will test the next step, the copy step, for you. – Jay Gong Mar 12 '20 at 07:49
-
nevermind. I figured it out. Thank you for your nice example, though – OreoFanatics Mar 12 '20 at 10:17
-
@OreoFanatics Glad to assist you. – Jay Gong Mar 12 '20 at 15:34
You can make use of the Modified datetime start and Modified datetime end fields, as shown in the screenshot below.
The example here gets the files modified within the 24 hours before the current datetime.
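As a rough illustration of what such a 24-hour window selects, here is a small Python sketch. The helper names are hypothetical, and it assumes the filter keeps files whose last-modified time is at or after the start and strictly before the end of the window.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def modified_window(now: datetime, hours: int = 24) -> Tuple[datetime, datetime]:
    """Return (start, end) covering the `hours` leading up to `now`."""
    return now - timedelta(hours=hours), now


def in_window(last_modified: datetime, start: datetime, end: datetime) -> bool:
    """Assumed semantics: start is inclusive, end is exclusive."""
    return start <= last_modified < end


# Hypothetical trigger time.
now = datetime(2020, 3, 6, 7, 0, tzinfo=timezone.utc)
start, end = modified_window(now)

recent = in_window(datetime(2020, 3, 6, 6, 59, tzinfo=timezone.utc), start, end)
stale = in_window(datetime(2020, 3, 5, 6, 59, tzinfo=timezone.utc), start, end)
```

Every file that falls inside the window is selected, so this narrows the candidates but does not by itself pick a single latest file.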

-
Well the problem is: we only want to have the last file that was added into the folder, not all files within a specific timeframe. If within 1 hour there are 100 or 1000 files added it does not matter because we always want to pick only the latest one – OreoFanatics Mar 06 '20 at 06:51