
A bit of context: my Azure Synapse pipeline makes a GET request to a REST API in order to import data into the Data Lake (ADLS Gen2) in Parquet file format.

I plan to request data from the API on an hourly basis in order to get the previous hour's information. I have also considered setting the trigger to run every half hour to get the data of the previous 30 minutes.
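
For concreteness, the hourly trigger I have in mind would be a tumbling window trigger along these lines (the pipeline name and parameter names are just placeholders for my setup):

```json
{
  "name": "HourlyIngestTrigger",
  "properties": {
    "type": "TumblingWindowTrigger",
    "typeProperties": {
      "frequency": "Hour",
      "interval": 1,
      "startTime": "2023-01-01T00:00:00Z",
      "maxConcurrency": 1
    },
    "pipeline": {
      "pipelineReference": {
        "referenceName": "CopyApiToLake",
        "type": "PipelineReference"
      },
      "parameters": {
        "windowStart": "@trigger().outputs.windowStartTime",
        "windowEnd": "@trigger().outputs.windowEndTime"
      }
    }
  }
}
```

The windowStartTime/windowEndTime outputs map directly onto the "previous hour" boundaries, and the half-hour variant would only change frequency to "Minute" and interval to 30.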

The thing is: this last GET request and Copy Data debug run took a bit less than 20 minutes. The DIU (Data Integration Unit) setting was "Auto", and it resolved to 4 even when I set it manually to 8 in the activity settings.
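
For reference, the relevant fragment of my Copy activity JSON looks roughly like this (source and sink details trimmed, names are placeholders):

```json
{
  "name": "CopyApiToLake",
  "type": "Copy",
  "typeProperties": {
    "source": { "type": "RestSource", "requestMethod": "GET" },
    "sink": { "type": "ParquetSink" },
    "dataIntegrationUnits": 8
  }
}
```

Even with dataIntegrationUnits set to 8 here, the run output still reports 4 used DIUs.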

I was wondering if there are any useful suggestions to make a Copy Data activity run faster, whatever the cost may be (I would also really appreciate information about the cost, if you consider it pertinent).

Thanks in advance!

Mateo

1 Answer


You need to check which part is running slow. You can click on the glasses icon on the Copy activity run to see the copy data details. If the latency is in "Time to first byte" or "Reading from source", the issue is on the REST API side. If the latency is in "Writing to sink", the problem may be in writing to the data lake. If the issue is on the API side, try contacting the provider. Another option, if applicable, is to use several Copy Data activities, each copying a part of the data. If the issue is on the data lake side, you should check the settings on the sink side.
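
Those durations also appear in the Copy activity's run output, so you can inspect them there as well as in the details pane. An abridged sketch, with illustrative values only:

```json
{
  "dataRead": 1073741824,
  "dataWritten": 1073741824,
  "copyDuration": 1140,
  "usedDataIntegrationUnits": 4,
  "usedParallelCopies": 1,
  "executionDetails": [
    {
      "source": { "type": "RestService" },
      "sink": { "type": "AzureBlobFS" },
      "status": "Succeeded",
      "duration": 1140,
      "detailedDurations": {
        "queuingDuration": 30,
        "timeToFirstByte": 15,
        "transferDuration": 1095
      }
    }
  ]
}
```

A large timeToFirstByte or transfer time dominated by reading points at the source; a transfer dominated by writing points at the sink.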

Chen Hirsh
  • Hello, Chen! Thank you for your answer. The latency is in "Reading from source", which turns out to be reasonable, as we are bringing a huge volume of data into our Data Lake (we plan on writing 35 million records an hour in total). I took your suggestion and things seem to have worked better. However, our issue now is that we could be getting our data more efficiently (we are making a GET request to a REST API that shows no page number in the body of the response). I will post a question addressing that matter, but if you happen to have a say, please let me know! Thanks again! – Mateo Estrada Jan 25 '23 at 19:25
  • Maybe you can split the API calls, so that each Copy activity gets, for example, a different year of data, or keys from 100 to 200 while the next call gets keys 201 to 300. You can use a ForEach loop in ADF with a Copy activity inside, as sketched below. It all depends on whether the API provider supports that kind of query. – Chen Hirsh Jan 26 '23 at 06:17
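
For illustration, that ForEach pattern could look roughly like this inside the pipeline. The dataset names, the keyRanges parameter, and the way the range is passed to the API are placeholders; they depend entirely on what the provider supports:

```json
{
  "name": "ForEachKeyRange",
  "type": "ForEach",
  "typeProperties": {
    "isSequential": false,
    "batchCount": 4,
    "items": {
      "value": "@pipeline().parameters.keyRanges",
      "type": "Expression"
    },
    "activities": [
      {
        "name": "CopyKeyRange",
        "type": "Copy",
        "inputs": [
          {
            "referenceName": "RestApiDataset",
            "type": "DatasetReference",
            "parameters": {
              "fromKey": "@item().from",
              "toKey": "@item().to"
            }
          }
        ],
        "outputs": [
          {
            "referenceName": "LakeParquetDataset",
            "type": "DatasetReference"
          }
        ],
        "typeProperties": {
          "source": { "type": "RestSource", "requestMethod": "GET" },
          "sink": { "type": "ParquetSink" }
        }
      }
    ]
  }
}
```

Here batchCount controls how many ranges are copied in parallel, and the REST dataset's relative URL would use the fromKey/toKey parameters to build the query string, with keyRanges passed as something like [{"from": 100, "to": 200}, {"from": 201, "to": 300}].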