0

I'm concurrently uploading 1500 blobs (1 MB max per blob) to a container in an Azure Storage Account (StorageV2, general purpose v2).

So far I'm uploading them with the Python package azure-storage-blob, using the pseudo-code below.

async def upload_blobs_async(blobs_args: list):
    tasks = [asyncio.create_task(upload_blob_async(arg)) for arg in blobs_args]

    # Concurrent call, return_when all completed. Safe.
    finished, pending = await asyncio.wait(
        tasks, return_when=asyncio.ALL_COMPLETED
    )

    return None

....

async def upload_blob_async(args: dict):
    # Instantiate a new BlobServiceClient using a connection string
    # (asyncbsc is the async BlobServiceClient from azure.storage.blob.aio)
    blob_service_client = asyncbsc.from_connection_string(CONNECTION_STRING_STORAGE)

    async with blob_service_client:
        # Instantiate a new ContainerClient (takes the container name)
        container_client = blob_service_client.get_container_client(args["container_name"])
        # Upload a blob to the container
        await container_client.upload_blob(...)

With no restriction on the number of parallel requests, sending 1500 documents has a huge impact on my end-to-end (E2E) response time.

What would you recommend in order to lower the E2E time? Using a semaphore so that requests are sent maybe 100 at a time? Also, I need to keep the general-purpose storage account (instead of a premium account) because I use blob index tags, which are not available on premium.
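The semaphore idea can be sketched independently of the SDK. In the snippet below, `upload_one` is a dummy stand-in for the real `container_client.upload_blob(...)` call (names and the `MAX_CONCURRENCY` value are illustrative, not from the SDK), and an in-flight counter demonstrates that the semaphore actually bounds concurrency:

```python
import asyncio

MAX_CONCURRENCY = 100  # tunable; 100 is just the figure floated in the question


async def upload_one(arg, sem, stats):
    # Dummy stand-in for the real SDK call; in the real code this would be
    # `await container_client.upload_blob(...)` on a single, shared
    # BlobServiceClient rather than one client per blob.
    async with sem:  # at most MAX_CONCURRENCY tasks get past this point
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0)  # placeholder for the network I/O
        stats["in_flight"] -= 1


async def upload_all(blobs_args):
    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    stats = {"in_flight": 0, "peak": 0}
    await asyncio.gather(*(upload_one(a, sem, stats) for a in blobs_args))
    return stats["peak"]


peak = asyncio.run(upload_all(range(1500)))
print(peak)  # never exceeds MAX_CONCURRENCY
```

With the real SDK the same structure applies: create the `BlobServiceClient` once, pass it (or a container client) into the upload coroutine, and wrap only the upload call in `async with sem:`.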

orville
  • Have you tried creating just one BlobServiceClient? At least in .NET that is the common pattern. Here you would create 1500 clients right? – juunas Nov 25 '22 at 18:10
  • Yes, indeed, very good point. Way too many client instances here. Ty – orville Nov 25 '22 at 22:12

1 Answer

0

Most Azure services have throughput limits, not only Azure Storage. There is a 500 requests-per-second limit, and there are also ingress/egress limits. Check this URL: https://learn.microsoft.com/en-us/azure/storage/blobs/scalability-targets. And please contact Azure support.

woody2k
  • Indeed, I think I'll use a semaphore to limit concurrency. From my understanding, the 500 req/s limit applies to a single blob, not the container (or storage account). Apart from trial and error, would you have any suggestion on how to find the optimal number of concurrent coroutines? – orville Nov 26 '22 at 08:17
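One way to approach that last question empirically is a small concurrency sweep: time the same batch under different semaphore limits and pick the knee of the curve. The sketch below uses a fake upload (a fixed `asyncio.sleep`) so it runs as-is; with real uploads the timings would reflect network and service throttling, which is exactly what you want to measure. All names and the sleep duration are illustrative:

```python
import asyncio
import time


async def fake_upload(sem):
    # Stand-in for one real blob upload; swap in the actual SDK call here.
    async with sem:
        await asyncio.sleep(0.005)  # placeholder for uploading one ~1 MB blob


async def run_batch(n_blobs, limit):
    # Upload n_blobs with at most `limit` concurrent coroutines; return elapsed time.
    sem = asyncio.Semaphore(limit)
    start = time.perf_counter()
    await asyncio.gather(*(fake_upload(sem) for _ in range(n_blobs)))
    return time.perf_counter() - start


async def sweep(n_blobs=200, limits=(10, 50, 100)):
    # Time the same batch at several concurrency limits.
    return {limit: await run_batch(n_blobs, limit) for limit in limits}


timings = asyncio.run(sweep())
for limit, elapsed in timings.items():
    print(f"limit={limit:>3}  elapsed={elapsed:.3f}s")
```

Against the real service, raising the limit stops helping once you hit the account's throughput or throttling (HTTP 503) responses, so the sweep also tells you where to cap the semaphore.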