
My Azure web app needs to download 1000+ very small files from a blob storage directory and process them.

If I list them and then download them one by one, it takes ages... Is there a faster way to do it, e.g. downloading them all together?

PS: I use the following code:

import pickle

from azure.storage.blob import ContainerClient, BlobClient

blob_list = #... list all files in a blob storage directory

for blob_name in blob_list:
    # one client and one HTTP round trip per blob, strictly one after another
    blob_client = BlobClient.from_connection_string(connection_string, container_name, blob_name)
    downloader = blob_client.download_blob()
    obj = pickle.loads(downloader.readall())
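
For illustration, one rough sketch of a "download them all together" approach would be to fan the same loop out over a thread pool, reusing a single ContainerClient (same connection_string and container_name as above; the "mydir/" prefix and max_workers=32 are just placeholders):

import pickle
from concurrent.futures import ThreadPoolExecutor

from azure.storage.blob import ContainerClient

container_client = ContainerClient.from_connection_string(connection_string, container_name)

def fetch(blob_name):
    # still one HTTP download per blob, but many of them run concurrently
    return pickle.loads(container_client.download_blob(blob_name).readall())

# "mydir/" stands in for whatever directory prefix is being listed
blob_names = [b.name for b in container_client.list_blobs(name_starts_with="mydir/")]

with ThreadPoolExecutor(max_workers=32) as pool:
    objects = list(pool.map(fetch, blob_names))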

2 Answers


I used Azure Databricks for a similar problem. You can easily mount an Azure storage account (e.g. ADLS Gen2) in Databricks and then work with the storage files as if they were local files. You can either copy the files or run your processing/transformation on them directly, without downloading anything.
You can find the Databricks mount steps in this LINK.
After mounting your ADLS, you can also use the dbutils functions in Databricks for OS-like access to your files.
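
As a rough sketch of what the mount and access could look like inside a Databricks notebook (dbutils is predefined there; the storage account, container, service-principal values, /mnt/data mount point and mydir folder are all placeholders):

# Runs inside a Databricks notebook, where dbutils is available.
# Every <...> value is a placeholder for your own storage account / service principal.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": "<client-secret>",
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/data",
    extra_configs=configs,
)

# After mounting, the blobs can be read like local files.
for info in dbutils.fs.ls("/mnt/data/mydir"):
    local_path = "/dbfs" + info.path.replace("dbfs:", "")
    with open(local_path, "rb") as f:
        data = f.read()  # process the file contents here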
I hope this approach helps.


I would also point out that, since you are using azure-batch, you could use the blob mount configuration on your Linux VMs. The idea is to mount the storage container onto the VM, which removes the download step entirely because the data is attached to the VM as a drive.
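
A rough sketch of what that might look like with the azure-batch Python SDK, assuming a recent SDK version that exposes MountConfiguration / AzureBlobFileSystemConfiguration (pool id, VM size, image and the account values below are placeholders):

from azure.batch import models as batchmodels

# Mount the blob container on every node of the pool via blobfuse (Linux nodes).
blob_mount = batchmodels.MountConfiguration(
    azure_blob_file_system_configuration=batchmodels.AzureBlobFileSystemConfiguration(
        account_name="<storage-account>",
        container_name="<container>",
        account_key="<account-key>",
        relative_mount_path="blobdata",  # appears under AZ_BATCH_NODE_MOUNTS_DIR/blobdata
    )
)

pool = batchmodels.PoolAddParameter(
    id="my-pool",
    vm_size="STANDARD_D2_V3",
    virtual_machine_configuration=batchmodels.VirtualMachineConfiguration(
        image_reference=batchmodels.ImageReference(
            publisher="canonical", offer="ubuntuserver", sku="18.04-lts", version="latest"
        ),
        node_agent_sku_id="batch.node.ubuntu 18.04",
    ),
    mount_configuration=[blob_mount],
)
# batch_client.pool.add(pool)  # tasks on the pool then read the files as local paths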

Thanks, and I hope this helps.
