
I am attempting to download a large file via the Azure Python SDK's get_blob_to_stream, but my program keeps exiting with return code 137 (128 + SIGKILL), which points to running out of memory: I can see in top that Python consumes more and more memory until it is killed.

Code:

with io.open(file_path, 'w') as file:
    self.blob_service.get_blob_to_stream(container_name='container', blob_name=blob_name, stream=file)

I am using azure-sdk-for-python with get_blob_to_stream for this, and the file is about 6.5 GB.

The file is being created as 0 bytes and nothing is written to it - am I doing something obviously wrong here?


1 Answer


After downloading the SDK source and stepping through the code, I found out how to get this large blob downloading:

  1. You must provide a max_connections value greater than 1 - this enables downloading the blob in chunks and writing each chunk to the stream as it arrives.
  2. You need to open the destination stream in binary mode ('wb').

Working code from question example:

with io.open(file_path, 'wb') as file:
    self.blob_service.get_blob_to_stream(container_name='container', blob_name=blob_name, stream=file, max_connections=2)
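
For reference, in the newer azure-storage-blob (v12) SDK get_blob_to_stream no longer exists; the equivalent knob is max_concurrency on download_blob. A minimal sketch, assuming a connection string in the AZURE_STORAGE_CONNECTION_STRING environment variable and illustrative container/blob names:

import io
import os

from azure.storage.blob import BlobClient

# Client for the single blob to download; the connection string is assumed
# to be set in the environment.
blob_client = BlobClient.from_connection_string(
    conn_str=os.environ["AZURE_STORAGE_CONNECTION_STRING"],
    container_name="container",
    blob_name="big-file.bin",
)

with io.open("big-file.bin", "wb") as file:
    # download_blob returns a StorageStreamDownloader; readinto streams the
    # blob into the file chunk by chunk instead of buffering it all in memory.
    blob_client.download_blob(max_concurrency=2).readinto(file)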
Comment: I'm getting this error in a Python Azure Function. How do I set the `max_connections` value in this context? `def main(myblob: func.InputStream): logging.info(f"Python blob trigger function processed blob \n" f"Name: {myblob.name}\n" f"Blob Size: {myblob.length} bytes") myblobBytes = myblob.read() fileName = pathlib.Path(myblob.name).name` – ericOnline Sep 18 '20 at 14:59