
How can you fetch data from an HTTP REST endpoint as an input for a data factory?

My use case is to fetch new data hourly from a REST HTTP GET endpoint and update/insert it into a DocumentDB database in Azure.

Can you just create a linked service like this and plug in the REST endpoint?

{
    "name": "OnPremisesFileServerLinkedService",
    "properties": {
        "type": "OnPremisesFileServer",
        "description": "",
        "typeProperties": {
            "host": "<host name which can be either UNC name e.g. \\\\server or localhost for the same machine hosting the gateway>",
            "gatewayName": "<name of the gateway that will be used to connect to the shared folder or localhost>",
            "userId": "<domain user name e.g. domain\\user>",
            "password": "<domain password>"
        }
    }
}

And what kind of component do I add to create the data transformation job? I see that there are a bunch of options like HDInsight, Data Lake and Batch, but I'm not sure what the differences are or which service would be appropriate to simply upsert the new data set into the Azure DocumentDB.

MonkeyBonkey

3 Answers


I think the simplest way will be to use Azure Logic Apps. You can make a call to any RESTful service using the HTTP connector in the Azure Logic Apps connectors.

So you can do a GET or POST/PUT etc. in a flow that runs on a schedule or is triggered by some other listener.


Here is the documentation for it:

https://azure.microsoft.com/en-us/documentation/articles/app-service-logic-connector-http/
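
For illustration, here is a minimal sketch of what such a Logic App workflow definition could look like, with an hourly recurrence trigger and an HTTP GET action. This is an assumption about the shape of the flow, not code from the linked article; the trigger and action names and the URI are placeholders:

{
    "definition": {
        "$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
        "triggers": {
            "Hourly": {
                "type": "Recurrence",
                "recurrence": { "frequency": "Hour", "interval": 1 }
            }
        },
        "actions": {
            "FetchData": {
                "type": "Http",
                "inputs": {
                    "method": "GET",
                    "uri": "https://example.com/api/new-items"
                }
            }
        },
        "outputs": {}
    }
}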

Aram

To do this with Azure Data Factory, you will need to utilize Custom Activities.
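
As a rough illustration of what that looks like in a pipeline definition (the assembly, entry point, package path, linked service and dataset names below are placeholders, not taken from the linked question), a custom .NET activity is declared along these lines:

{
    "name": "FetchRestDataActivity",
    "type": "DotNetActivity",
    "linkedServiceName": "AzureBatchLinkedService",
    "typeProperties": {
        "assemblyName": "MyRestLoader.dll",
        "entryPoint": "MyRestLoader.FetchActivity",
        "packageLinkedService": "AzureStorageLinkedService",
        "packageFile": "customactivitycontainer/MyRestLoader.zip"
    },
    "outputs": [
        { "name": "RawBlobDataset" }
    ]
}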

Similar question here: Using Azure Data Factory to get data from a REST API

If Azure Data Factory is not an absolute requirement, Aram's suggestion of Logic Apps might serve you better.

Hope that helps.

JustLogic

This can be achieved with Data Factory. This is especially good if you want to run batches on a schedule and have a single place for monitoring and management. There is sample code in our GitHub repo for an HTTP loader to blob here: https://github.com/Azure/Azure-DataFactory. Then, the act of moving data from the blob to DocumentDB will do the insert for you using our DocumentDB connector. There is a sample on how to use this connector here: https://azure.microsoft.com/en-us/documentation/articles/data-factory-azure-documentdb-connector/. Here are the brief steps you will take to fulfill your scenario:

  1. Create a custom .NET activity to get your data to blob.

  2. Create a linked service of type DocumentDb.

  3. Create a linked service of type AzureStorage.

  4. Use input dataset of type AzureBlob.

  5. Use output dataset of type DocumentDbCollection.

  6. Create and schedule a pipeline that includes your custom activity and a Copy Activity that uses BlobSource and DocumentDbCollectionSink; schedule the activities to the required frequency and availability of the datasets (see the JSON sketch after this list).
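
As a hedged sketch of steps 2, 5 and 6 (the names, connection string and collection below are placeholders; the AzureStorage linked service and blob input dataset from steps 3 and 4 follow the same pattern as the linked service shown in the question):

{
    "name": "DocumentDbLinkedService",
    "properties": {
        "type": "DocumentDb",
        "typeProperties": {
            "connectionString": "AccountEndpoint=<endpoint>;AccountKey=<key>;Database=<database>"
        }
    }
}

{
    "name": "ItemsCollectionDataset",
    "properties": {
        "type": "DocumentDbCollection",
        "linkedServiceName": "DocumentDbLinkedService",
        "typeProperties": {
            "collectionName": "<collection name>"
        },
        "availability": { "frequency": "Hour", "interval": 1 }
    }
}

{
    "name": "BlobToDocDbPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyBlobToDocDb",
                "type": "Copy",
                "inputs": [ { "name": "RawBlobDataset" } ],
                "outputs": [ { "name": "ItemsCollectionDataset" } ],
                "typeProperties": {
                    "source": { "type": "BlobSource" },
                    "sink": { "type": "DocumentDbCollectionSink" }
                },
                "scheduler": { "frequency": "Hour", "interval": 1 }
            }
        ],
        "start": "2016-01-01T00:00:00Z",
        "end": "2017-01-01T00:00:00Z"
    }
}

The custom activity from step 1 would sit in the same pipeline (or an upstream one) and produce RawBlobDataset as its output.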

Aside from that, choosing where to run your transforms (HDInsight, Data Lake, Batch) will depend on your I/O and performance requirements. You can choose to run your custom activity on Azure Batch or HDInsight in this case.
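
For example, if you pick Azure Batch, the custom activity's linkedServiceName would point at an AzureBatch linked service roughly like the following (account, key, pool, region and storage linked service are placeholders):

{
    "name": "AzureBatchLinkedService",
    "properties": {
        "type": "AzureBatch",
        "typeProperties": {
            "accountName": "<batch account name>",
            "accessKey": "<batch account key>",
            "poolName": "<pool name>",
            "batchUri": "https://<region>.batch.azure.com",
            "linkedServiceName": "AzureStorageLinkedService"
        }
    }
}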

Mogsdad
  • What about pagination? Anyone dealing with big data needs to be able to paginate the input (the incoming REST API). – arcom Oct 01 '17 at 21:37