8

I have an Azure Logic App which gets triggered when a new file is added or modified in an SFTP server. When that happens the file is copied to Azure Blob Storage and then gets deleted from the SFTP server. This operation takes approximately 2 seconds per file.

The only problem I have is that these files (on average 500kb) are processed one by one. Given that I'm looking to transfer around 30,000 files daily this approach becomes very slow (something around 18 hours).

Is there a way to scale out/parallelize these executions?

Florin D. Preda
  • 1,358
  • 1
  • 11
  • 25
  • You mentioned: "The only problem I have is that these files (on average 500kb) are processed one by one." By default, a split-on is set on the SFTP trigger, so each file (if multiple ones are detected) will trigger a run instead of one run for all files. Are you not seeing this? – Derek Li Oct 26 '17 at 18:36
  • @Derek Yes, each file triggers a separate execution but the executions are sequential – Florin D. Preda Oct 26 '17 at 18:41
  • That doesn't sound right. Split triggers should execute in parallel - can you check the "Diagnostics" tab and see if you're getting any "Run Throttled Events"? It could be that they are running in parallel, but because the actions are being throttled, it looks like they are running in sequence. – Derek Li Oct 26 '17 at 18:49
  • @FlorinD.Preda have you had any issues with your Logic App being able to consistently connect to the SFTP server, where you would be getting 'skipped' triggers? – aaronR Feb 08 '18 at 19:58
  • @aaronR Yes, I had but I believe it was the SFTP server rejecting the requests in my case. In any case, I ended up writing the transfer logic in C# – Florin D. Preda Feb 08 '18 at 21:06
  • @FlorinD.Preda Quick question to you, I have similar scenario where my logic app is using FTP connector to pick up the file from FTP server folder "input". Although in logic App I have set the FTP connector frequency to check for new file in every 3 sec but still FTP connector is taking almost 1m 30s to recognize new file in the folder and to run logic app instance. Did you face the same problem? – Varun05 Jan 09 '20 at 06:58

2 Answers

0

I am not sure that Azure Logic Apps offer a built-in scale-out/parallelize option for trigger executions. But based on my experience, if the timeliness requirements are not very strict, we could use a ForEach action instead: ForEach supports a degree of parallelism of up to 50, with a default of 20.
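In the workflow definition (code view), the ForEach degree of parallelism is set via `runtimeConfiguration`. A minimal sketch of what that looks like - the action names here are illustrative, not from the asker's app:

```json
"For_each_file": {
  "type": "Foreach",
  "foreach": "@triggerBody()",
  "runtimeConfiguration": {
    "concurrency": {
      "repetitions": 50
    }
  },
  "actions": {
    "Create_blob": { }
  }
}
```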

In your case, my suggestion is: when a new file is added or modified on the SFTP server, insert a queue message containing the file path into an Azure Storage queue. Then, based on elapsed time or queue length, stop collecting and retrieve the batch of queue messages. Finally, inside a parallel ForEach action, process each message: fetch the file from the SFTP server and create the corresponding blob.
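The producer/consumer shape of this approach can be sketched outside of Logic Apps too. Below is a minimal Python sketch with `queue.Queue` and worker threads; `transfer_file` is a hypothetical stand-in for the real SFTP download / blob upload / SFTP delete sequence, and the in-memory queue stands in for the Azure Storage queue:

```python
import queue
import threading

def transfer_file(path: str) -> str:
    # Placeholder for: download from SFTP, upload to Blob Storage, delete from SFTP.
    return f"uploaded:{path}"

work = queue.Queue()
results = []
results_lock = threading.Lock()

def worker():
    while True:
        path = work.get()
        if path is None:          # sentinel: no more work for this worker
            work.task_done()
            break
        result = transfer_file(path)
        with results_lock:        # results list is shared across workers
            results.append(result)
        work.task_done()

NUM_WORKERS = 20  # mirrors the Logic App ForEach default degree of parallelism
threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

# Enqueue file paths (stand-in for the Storage queue messages).
for i in range(100):
    work.put(f"/sftp/file_{i}.csv")
for _ in threads:
    work.put(None)  # one sentinel per worker

work.join()
for t in threads:
    t.join()
```

With 20 workers, the 2-second per-file transfers overlap instead of running back to back, which is what collapses the 18-hour sequential runtime.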

Tom Sun - MSFT
  • 24,161
  • 3
  • 30
  • 47
  • I like the idea to que up the 'upload to blob folder' commands. After that you could try to use the build in batch handling and handle the upload in batches of multiple files in an parallel foreach loop – Rodrigo Groener Dec 04 '19 at 22:20
0

If you're using C#, use Parallel.ForEach like Tom Sun said. If you go that route, I also recommend using the async/await pattern for the IO operation (saving to blob). It frees up the executing thread while the file is being saved, so the thread can serve other requests.

michal.jakubeczy
  • 8,221
  • 1
  • 59
  • 63