2

I created a Pipeline in Azure Data Factory V2. It will copy the data from Rest API and save this data in the form of JSON file in Azure Data Lake. Then I transform that JSON file using U-SQL and Copy that data into another folder in .csv format. My Pipeline. See the following Image of Pipeline.

enter image description here

The Rest API Return Data into JSON Format lie.

{
    "data": [
        {
            "id": "100024937598765",
            "name": "Sebastian Martinelli",
            "email": "sebastian.martinelli@abc.com",
            "administrator": false
        },
        {
            "id": "100024909012916",
            "name": "Diego Juarez",
            "email": "diego.juarez@abc.com",
            "administrator": false
        },
        {
            "id": "100025002270557",
            "name": "Jose Lopez",
            "email": "jose.lopez@abc.com",
            "administrator": false
        },
        {
            "id": "100024553664067",
            "name": "Valentin Montemarani",
            "email": "valentin.montemarani@abc.com",
            "administrator": false
        }
    ],
    "paging": {
        "cursors": {
            "before": "QVFIUmU1QnBOYThYTnJiQlNqVzItMFdoTVprSHh3cWZA4LXF2ZAE5nSjIxVWZAOWUc1ZAjdLZAjN2em1SazRYVno4TGE4aFBaOFdMaS1NMDdkeEduVkRsOTVhN3Jn",
            "after": "QVFIUjhWdm5EOTk3amJaWHVYR3p1OEZAZAQ0ZAoeTR5TDBwblE0Mmx3dC1zRXhPM2VLZAWdqR0RWQndUVnhpTGc3RkIzVkNIY21EcXFTQU93NHVxRFcxVW12dTNB"
        },
        "next": "https://graph.facebook.com/v2.12/1528385107457405/members?access_token=%2Cemail&limit=25&after=QVFIUjhWdm5EOTk3amJaWHVYR3p1OEZAZAQ0ZAoeTR5TDBwblE0Mmx3dC1zRXhPM2VLZAWdqR0RWQndUVnhpTGc3RkIzVkNIY21EcXFTQU93NHVxRFcxVW12dTNB"
    }
}

This API is not returning full data it returns data using next link by hitting that next link we can fetch next set of records. How can I fetch this type of data using Iteration in Data Factory V2? I tried other activities like For each, Until and If but unable to repeat that copy activity to fetch complete data.

The Base URL is like

https://graph.facebook.com/Community/groups?fields=privacy,name,purpose&limit=5&access_token=value

Now the next URL will be the same and have 1 more parameter after in it like

https://graph.facebook.com/v2.12/1528385107457405/groups?access_token=value&pretty=1&fields=privacy,name,purpose&limit=5&after=QVF

Is there any way to do this?

Waqas Idrees
  • 1,443
  • 2
  • 17
  • 36
  • Hover over the _next_ link. Does it look like a URL that you could reproduce? Usually a next URL is just has some parameters that tell it what page to return next. – Nick.Mc Apr 03 '18 at 11:17
  • Yes it is a URL and will return next set of records. But how can we iterate this copy activity till there is next link available in response? – Waqas Idrees Apr 03 '18 at 11:27
  • Yes I know the URL returns next set of records. But what does it look like? Does it have parameters that indicate the page etc.? You might be able to use the _For Each_ activity, though I've never used it before: https://learn.microsoft.com/en-us/azure/data-factory/control-flow-for-each-activity Generally anything non trivial in data factory has to be compiled into a DLL and called. Please post a sample of the next URL (and the URL on the first page) Normally these are just URL's with parameters. If that's the case you have some options – Nick.Mc Apr 03 '18 at 13:40
  • You say you are using a _REST API_. An API normally has documentation on how to use it. If it doesn't, it's not a very good API – Nick.Mc Apr 03 '18 at 13:41
  • Looking at your resulting JSON, are you saying that the link against the _next_ property is the one you need to follow? Do you ever know how many of those there will be? – Nick.Mc Apr 03 '18 at 14:14
  • @Nick.McDermaid We don't know that how much it will be. Basically, I am fetching data from Graph API of facebook. I have to get next variable from json and then iterate the activity till there will be next have value in the response. +Question is updated. – Waqas Idrees Apr 03 '18 at 14:17
  • 2
    Since you don't know how many iterations this will take, then you'd need to use `Until`. You need to extract data out of JSON which you just returned, and I don't see any information on how to do that - I don't expect that it's possible. If this needs to be done in the cloud then I would approach this by writing some powershell to do the basics (call the API, write the JSON to a data lake, inspect the JSON to get the _next_ URI, call the _next_ URI, repeat) and run that in Azure Automation. Then you could call an ADF activity from there using a trigger or just do it all in Powershell – Nick.Mc Apr 03 '18 at 14:30
  • Hey @Nick.McDermaid any idea on how to terminate the Until activity in ADFv2? I have a flag set as a Pipeline parameter and expression in Until which checks the flag. But I dont know where to change that flag parameter to false and exits the Until loop. – Gagan Jul 10 '18 at 03:42
  • Like I say I've never used this before but I'm happy to try and help. What does the final page look like? Is it missing the _next_ link altogether? That what you could use to trigger it - you need to manually inspect the last page and see how it is different from the other pages. Now you need to work out a ADFv2 expression that returns true when the document is a 'last page' document. However.. looking here: https://learn.microsoft.com/en-us/azure/data-factory/control-flow-system-variables I can't see any way to inspect the document. – Nick.Mc Jul 10 '18 at 04:02
  • Googling around, it looks like there are a lot of properties you can access that might let you inspect the JSON being loaded, but I can't find a good reference. If you are at the stage where you are successfully processing files but you just don't know how to stop the loop then it's probably worth continuing in ADFv2. If not... seriously consider running C# in batch. – Nick.Mc Jul 10 '18 at 04:38
  • You might be able to use the _Get Metadata_ Task to inspect the file and see if it has a _Next_ link. – Nick.Mc Jul 11 '18 at 02:41

1 Answers1

3
  1. Copy activity has a built-in support for Pagination. In a case of @Waqas Idrees the pagination rule should be set as:

    • Name: AbsoluteUrl
    • Value: $.data.paging.next

    Such setting can be found on Source tab in a Copy activity:

enter image description here

  1. There is no need to use U-SQL to transform hierarchical to a tabular format. Copy activity Mapping can do it on a fly

Therefore answer to the question of OP:

How can I fetch this type of data using Iteration in Data Factory V2?

No need to use Iterations. Consider to use a Copy activity functionality.

P.S. Because exploration of such topic took quite considerable amount of my time and I was not able to find a similar guidance during my research, I've created a blog post about it and described the process step-by-step: Azure Data Factory and REST APIs - Mapping and Pagination

P.P.S. Because of pace of development of ADF, I accept that in 2018 April built in pagination could not be part of the product yet. In this case it explains why the question was not answered that time. However, I hope that my findings will help someone who faced similar challenge these days.

Alexander Volok
  • 5,630
  • 3
  • 17
  • 33