
I have a project that regularly needs to upload millions of tiny (1-3 KB) image files to Google Cloud Storage. What's the recommended method/library for doing this? I'm currently using gsutil but wonder if there's a better library. I recently came across the google-cloud Python library, but it seems to be even slower (using blob.upload_from_filename()).

I'd like to be able to do this via Python (on Windows) but am open to other options if they provide significant performance advantages.

Any suggestions?

1 Answer


A lot of optimization has gone into gsutil; I doubt using the raw libraries will be faster without considerable effort (though gsutil may be more tuned for transferring large files than for large numbers of small files).

Try adding the -m flag to your cp command to multi-thread your upload: https://cloud.google.com/storage/docs/gsutil/commands/cp
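If you want to drive this from Python, one option is to shell out to gsutil with the subprocess module. A minimal sketch, assuming gsutil is on your PATH; the source directory and bucket name below are placeholders to replace with your own:

```python
import subprocess

# Placeholder paths -- substitute your own local directory and bucket.
SRC_DIR = "images"
DEST = "gs://my-bucket/images"

# -m is a top-level flag that enables parallel (multi-threaded/multi-process)
# operation; -r recurses into the source directory.
cmd = ["gsutil", "-m", "cp", "-r", SRC_DIR, DEST]

# Uncomment to actually run the upload (requires gsutil installed):
# subprocess.run(cmd, check=True)
```

Note that -m goes before the cp subcommand, since it is a global gsutil option rather than an option of cp.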

After that, the only thing you can probably do is parallelize across multiple machines, with each machine copying a subset of the files.
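One simple way to split the work deterministically is to assign each file to a machine by a stable hash of its path. A sketch of the idea (the function name and machine count here are illustrative, not from any library):

```python
import hashlib

def machine_for(path: str, n_machines: int) -> int:
    # Stable hash of the path, so every run assigns the same file
    # to the same machine regardless of listing order.
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_machines

# Machine k then uploads only the files where machine_for(f, N) == k.
files = ["img_0001.jpg", "img_0002.jpg", "img_0003.jpg"]
mine = [f for f in files if machine_for(f, 4) == 0]
```

Because the assignment depends only on the path, each machine can compute its own subset independently, with no coordination needed between them.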

Joshua Royalty
  • Hi @ScottDavis. My understanding is that gsutil is a command-line utility made by Google that you can install on your local computer to interface with the cloud. I agree with your statement that using the API in Python seems a bit cumbersome; in all my Python code I run gsutil command lines using the subprocess module. It might be worth checking out my response to another question [here](https://stackoverflow.com/questions/44975080/is-there-any-advantage-to-using-gsutil-or-the-google-cloud-storage-api-in-produc/45012186#45012186) and seeing if it helps you :) – Paul Jul 24 '17 at 16:48