
I am using Azure Function Apps (Python) with a blob trigger to process a CSV and move the records to an Event Hub. I have working code (tested with up to 50 rows) after following the standard documentation. However, I want to know what approach should be followed if the file is in the size range of a few GBs. Will the entire file be sent to the Azure Function in one go? If it needs to be read in chunks of a fixed size or line by line, does Azure's trigger concept support that?

I am looking for any approach/code for the above problem in Python that avoids loading the complete file into the Azure Function container's memory.

1 Answer


If you have a file which is unwieldy for a normal web request, you will probably be better served by uploading it to an object storage implementation (presumably Azure Blob Storage will be most convenient for you) and sending only the new destination address to the function.
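As a minimal sketch of that "send the address, not the data" idea, the snippet below publishes just a pointer to the blob as an Event Hub message. The environment variable `EVENTHUB_CONN_STR` and the hub name `csv-pointers` are placeholder names for illustration, not anything from your setup:

```python
# Sketch: send only a pointer to the blob, not its contents.
# EVENTHUB_CONN_STR and "csv-pointers" are hypothetical names.
import json
import os

from azure.eventhub import EventHubProducerClient, EventData

def send_blob_pointer(blob_url: str) -> None:
    producer = EventHubProducerClient.from_connection_string(
        os.environ["EVENTHUB_CONN_STR"], eventhub_name="csv-pointers"
    )
    with producer:
        batch = producer.create_batch()
        # The message carries only the blob's location, so its size stays
        # tiny no matter how large the CSV itself is.
        batch.add(EventData(json.dumps({"blob_url": blob_url})))
        producer.send_batch(batch)
```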

AMQP messages (this is what Event Hubs uses under the hood) are really better suited to small amounts of data. You could perhaps also make each line, or each block of lines, in your CSV a separate message, but whether that makes sense depends hugely on your use case.
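If you go the "one message per line / block of lines" route, a sketch along these lines would work, assuming you already have an iterable of CSV lines and the same placeholder connection settings as above. `EventDataBatch.add` raises `ValueError` when the batch is full, which is used here to flush and start a new batch:

```python
# Sketch: batch CSV lines into Event Hub messages without buffering the file.
import os
from typing import Iterable

from azure.eventhub import EventHubProducerClient, EventData

def send_lines(lines: Iterable[str]) -> None:
    producer = EventHubProducerClient.from_connection_string(
        os.environ["EVENTHUB_CONN_STR"], eventhub_name="csv-records"
    )
    with producer:
        batch = producer.create_batch()
        pending = 0  # events added to the current batch
        for line in lines:
            try:
                batch.add(EventData(line))
            except ValueError:
                # Batch is full: send it and start a new one with this line.
                producer.send_batch(batch)
                batch = producer.create_batch()
                batch.add(EventData(line))
                pending = 0
            pending += 1
        if pending:
            producer.send_batch(batch)
```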

You will then likely want to use a client that supports streaming the blob, instead of reading the whole file at once, such as BlockBlobService; here's a reasonable example showing how to do this.
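For illustration, here is a sketch of that streaming idea using the current azure-storage-blob (v12) `BlobClient` rather than the legacy `BlockBlobService` mentioned above; the connection string variable and container/blob arguments are placeholders. `download_blob().chunks()` yields the blob a chunk at a time, so only one chunk plus any partial trailing line is held in memory:

```python
# Sketch: stream a large CSV blob and yield it line by line.
import os

from azure.storage.blob import BlobClient

def iter_csv_lines(container: str, blob_name: str):
    blob = BlobClient.from_connection_string(
        os.environ["STORAGE_CONN_STR"],
        container_name=container,
        blob_name=blob_name,
    )
    leftover = b""
    for chunk in blob.download_blob().chunks():
        data = leftover + chunk
        lines = data.split(b"\n")
        leftover = lines.pop()  # last element may be an incomplete line
        for line in lines:
            yield line.decode("utf-8").rstrip("\r")
    if leftover:
        yield leftover.decode("utf-8").rstrip("\r")
```

You could feed the generator above into something like the `send_lines` sketch earlier to move records to the Event Hub without ever holding the whole file in memory.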

ti7