5

I need to process files in the container using Azure Datafactory and keep a track of processed files in the next execution.

so I am keeping a table in DB which stores the processed file information, In ADF I am getting the FileNames of the processed files and I want to check whether the current file has been processed or not.

I am Using Lookup activity: Get All Files Processed to get the processed files from DB by using below query:

select FileName from meta.Processed_Files;

Then I am traversing over the directory, and getting File Details for current File in the directory by using Get Metadata Activity: "Get Detail of Current File in Iteration"

and in the If Condition activity, I am using following Expression:

@not(contains(activity('Get All Files Processed').output.value,activity('Get Detail of current file in iteration').output.itemName))

This is always returning True even if the file has been processed

How do we compare the FileName from the returned value

Output of activity('Get All Files Processed').output.value

{
"count": 37,
"value": [
    {
        "FileName": "20210804074153AlteryxRunStats.xlsx"
    },
    {
        "FileName": "20210805074129AlteryxRunStats.xlsx"
    },
    {
        "FileName": "20210806074152AlteryxRunStats.xlsx"
    },
    {
        "FileName": "20210809074143AlteryxRunStats.xlsx"
    },
    {
        "FileName": "20210809074316AlteryxRunStats.xlsx"
    },
    {
        "FileName": "20210810074135AlteryxRunStats.xlsx"
    },
    {
        "FileName": "20210811074306AlteryxRunStats.xlsx"
    },

Output of activity('Get Detail of current file in iteration').output.itemName

"20210804074153AlteryxRunStats.xlsx"
Saurabh Mehta
  • 91
  • 1
  • 2
  • 9
  • Also when I am using Expression: `@not(contains(activity('Get All Files Processed').output.value.FileName,activity('Get Detail of current file in iteration').output.itemName))` The If Condition activity is failing with error: ` The expression 'not(contains(activity('Get All Files Processed').output.value.FileName,activity('Get Detail of current file in iteration').output.itemName))' cannot be evaluated because property 'FileName' cannot be selected. Array elements can only be selected using an integer index.` – Saurabh Mehta Sep 19 '21 at 16:06
  • I had the exact same question as you and am doing pretty much the same thing with the table containing files to be processed and doing a getMetaData on the files. I used the solution from @wBob `@contains(activity('Lookup FileProcessingTable').output.value, json(concat('{"FileName":"',activity('GetFileMetadata').output.itemName,'"}')))` – rigoo44 Oct 28 '21 at 22:44

2 Answers2

4

I often pass this type of thing off to SQL in Azure Data Factory (ADF) too, especially if I've got one in the architecture. However bearing in mind that any hand-offs in ADF take time, it is possible to check if an item exists in an array using contains, eg a set of files returned from a Lookup.

Background

Ordinary arrays normally look like this: [1,2,3] or ["a","b","c"], but if you think about values that get returned in ADF, eg from Lookups, they they look more like this:

{
    "count": 3,
    "value": [
        {
            "Filename": "file1.txt"
        },
        {
            "Filename": "file2.txt"
        },
        {
            "Filename": "file3.txt"
        }
    ],
    "effectiveIntegrationRuntime": "AutoResolveIntegrationRuntime (North Europe)",
    "billingReference": {
        "activityType": "PipelineActivity",
        "billableDuration": [
            {
...

So what you've got is a complex piece of JSON representing an object (the return value of the Lookup activity plus some additional useful info about the execution), and the array we are interested in is within the value object. However it has additional curly brackets, ie it is itself an object.

Solution

So the thing to do is to pass to contains something that will look like your object which has the single attribute Filename. Use concat to create the string and json to make it authentic:

@contains(activity('Lookup').output.value, json(concat('{"Filename":"',pipeline().parameters.pFileToCheck,'"}')))

Here I'm using a parameter which holds the filename to check but this could also be a variable or output from another Lookup activity.

Sample output from Lookup:

Lookup output

The Set Variable expression using contains:

contains expression

The result assigned to a variable of boolean type:

Result

wBob
  • 13,710
  • 3
  • 20
  • 37
0

I tried something like this.

  1. from SQL table, brought all the processed files as comma-separated values using select STRING_AGG(processedfile, ',') as files in lookup activity

  2. Assign the comma separated value to an array variable (test) using split function @split(activity('Lookup1').output.value[0]['files'],',')

  3. meta data activity to get current files in directory

  4. filter activity to filter the files in current directory against the processed files

items:

@activity('Get Metadata1').output.childitems

condition:

@not(contains(variables('test'),item().name))
m4n0
  • 29,823
  • 27
  • 76
  • 89
All About BI
  • 493
  • 3
  • 6
  • Thanks, It worked.. I was earlier looping through each entry and it was taking a huge time. – Saurabh Mehta Sep 21 '21 at 09:58
  • HTH, pls mark it as answer if it helped, Saurabh – All About BI Sep 22 '21 at 10:03
  • @wBob, can we make your logic work for multiple files? For example, if i have 10 files in blob and out of which 4 are processed and i need to skip them using filter. (without a foreach loop) – All About BI Sep 25 '21 at 07:50
  • Hey @AllAboutBI, I just posted an example checking for multiple files. I am using a For Each activity with the Append to do the comparison, see what you think: https://stackoverflow.com/a/69723896/1527504 – wBob Oct 26 '21 at 13:29