I have millions of files in one container and I need to copy ~100k to another container in the same storage account. What is the most efficient way to do this?
I have tried:

- Python API -- Using `BlobServiceClient` and related classes, I make a `BlobClient` for the source and the destination and start a copy with `new_blob.start_copy_from_url(source_blob.url)`. This runs at roughly 7 files per second. (A trimmed-down sketch of this loop is shown after the list.)
- azcopy (one file per line) -- Basically a batch script with a line like `azcopy copy <source w/ SAS> <destination w/ SAS>` for every file. This runs at roughly 0.5 files per second due to azcopy's per-invocation overhead.
- azcopy (1000 files per line) -- Another batch script like the above, except I use the `--include-path` argument to specify a bunch of semicolon-separated files at once (see the second sketch after the list). The number is arbitrary, but I chose 1000 because I was concerned about overloading the command; even 1000 files makes a command with 84k characters. Extra caveat here: I cannot rename the files with this method, which is required for about 25% of them due to character constraints on the system that will download from the destination container. This runs at roughly 3.5 files per second.
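For reference, the Python approach is essentially the loop below. This is a trimmed-down sketch only: the account URL, credential, container names, and the `files_to_copy` mapping are placeholders, not my real values.

```python
# Trimmed-down sketch of the copy loop; the account URL, credential, container
# names, and the files_to_copy mapping are placeholders, not my real values.
from azure.storage.blob import BlobServiceClient

ACCOUNT_URL = "https://<account>.blob.core.windows.net"
CREDENTIAL = "<account key or SAS token>"

service = BlobServiceClient(account_url=ACCOUNT_URL, credential=CREDENTIAL)
src_container = service.get_container_client("source-container")
dst_container = service.get_container_client("destination-container")

# Maps each source blob name to its (possibly renamed) destination name.
files_to_copy = {"folder/original name.dat": "renamed.dat"}

for src_name, dst_name in files_to_copy.items():
    source_blob = src_container.get_blob_client(src_name)
    new_blob = dst_container.get_blob_client(dst_name)
    # Kicks off a server-side copy; this is the ~7 files/second path.
    new_blob.start_copy_from_url(source_blob.url)
```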
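And this is roughly how the 1000-files-per-line batch script gets generated; the container URLs (with SAS), the chunk size, and the blob name list are again placeholders.

```python
# Sketch of how the --include-path batch file is built; the container URLs
# (with SAS), the chunk size, and the blob name list are placeholders.
SOURCE_URL = "https://<account>.blob.core.windows.net/source-container?<sas>"
DEST_URL = "https://<account>.blob.core.windows.net/destination-container?<sas>"
CHUNK_SIZE = 1000

blob_names = ["file0001.dat", "file0002.dat"]  # ~100k names in practice

with open("copy_batch.cmd", "w") as batch:
    for i in range(0, len(blob_names), CHUNK_SIZE):
        chunk = blob_names[i:i + CHUNK_SIZE]
        # --include-path takes a semicolon-separated list of paths relative to the source container.
        paths = ";".join(chunk)
        batch.write(f'azcopy copy "{SOURCE_URL}" "{DEST_URL}" --include-path "{paths}"\n')
```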
Surely there must be a better way to do this, probably with another Azure tool that I haven't tried. Or maybe by tagging the files I want to copy and then copying everything with that tag, but I couldn't find the arguments to do that.
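To be concrete about the tag idea, something along the lines of the sketch below is what I was picturing, written against the Python SDK since I couldn't find an azcopy flag for it. The tag name/value, containers, and credential are made up, and I haven't tried this or verified it would actually be faster.

```python
# Rough illustration of the "tag, then copy by tag" idea with the Python SDK;
# the tag name/value, container names, and credential are made up for the example.
from azure.storage.blob import BlobServiceClient

ACCOUNT_URL = "https://<account>.blob.core.windows.net"
CREDENTIAL = "<account key or SAS token with tag permissions>"

service = BlobServiceClient(account_url=ACCOUNT_URL, credential=CREDENTIAL)
dst_container = service.get_container_client("destination-container")

# 1) Tag the blobs I want copied (still one request per blob, unfortunately).
for name in ["file0001.dat", "file0002.dat"]:
    service.get_blob_client("source-container", name).set_blob_tags({"copybatch": "batch1"})

# 2) Enumerate by tag and start server-side copies.
filter_expr = "@container = 'source-container' AND \"copybatch\" = 'batch1'"
for hit in service.find_blobs_by_tags(filter_expr):
    source_blob = service.get_blob_client(hit.container_name, hit.name)
    new_blob = dst_container.get_blob_client(hit.name)
    new_blob.start_copy_from_url(source_blob.url)
```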