
I have a lot of image files (about a million in a single directory), and I want to upload all of them to Rackspace Cloud Files in the fastest and most efficient way.

I'm using the python-cloudfiles library to upload them, but it is very slow, and I'd like to hear about different approaches or example Python code.

It is probably slow because it uses one connection per upload. I thought it would be better to send all the files as a single tar archive and extract it on the server, but Cloud Files does not support that.

Does anyone know another way?

gioele

2 Answers


Partition your upload set, e.g. into 26 sets by the first letter of the filename (if the naming is statistically balanced enough), and run one uploader per set in parallel.
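In case it helps, here is a minimal, untested sketch of that idea: it partitions the filenames by their first character and runs one uploader process per partition, each with its own python-cloudfiles connection. The credentials, container name, and paths are placeholders you would fill in.

    import os
    import multiprocessing

    import cloudfiles  # the python-cloudfiles library from the question

    USERNAME = 'your-username'   # placeholder
    API_KEY = 'your-api-key'     # placeholder
    CONTAINER = 'images'         # placeholder
    SRC_DIR = '/path/to/images'  # placeholder

    def upload_partition(filenames):
        # One connection per worker process, reused for every file in the set.
        conn = cloudfiles.get_connection(USERNAME, API_KEY)
        container = conn.get_container(CONTAINER)
        for name in filenames:
            obj = container.create_object(name)
            obj.load_from_filename(os.path.join(SRC_DIR, name))

    if __name__ == '__main__':
        # Partition by first character; only balanced if the names are.
        partitions = {}
        for name in os.listdir(SRC_DIR):
            partitions.setdefault(name[0].lower(), []).append(name)

        # Each pool worker handles one or more partitions in turn.
        pool = multiprocessing.Pool(processes=8)
        pool.map(upload_partition, list(partitions.values()))
        pool.close()
        pool.join()

Tune the process count to whatever your connection and the remote end will tolerate.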

Btw, some of the cloud uploader tools have a problem with memory usage when uploading huge filesets, so keep an eye on that too...

rackandboneman
  • Yes, I split the files and upload from another server, but I suspect there is a connection limit on the Rackspace Cloud Files upload server. – andy kim Jul 09 '12 at 08:16
  • Might be dependent on the cloud provider – with a lot of them, using tens of connections is MUCH faster, especially with smaller files, because of protocol latencies (confirmation of store requests, etc.). Are you assuming there is a connection limit (I will not discuss how to get around it in that case), or do you know there is one? – rackandboneman Jul 09 '12 at 09:51
  • Is zipping them and uploading them a good option? – pahnin Nov 21 '14 at 06:09

If this is a one-time upload, I like turbolift. Just make sure to reduce the concurrency to prevent a high server load (e.g. --cc 4), and use --internal to upload over ServiceNet.
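A hypothetical invocation might look roughly like this. Only --cc and --internal come from the advice above; the authentication flags and upload subcommand syntax are from memory, so check turbolift --help for your version:

    turbolift -u your-username -a your-api-key --cc 4 --internal \
        upload -s /path/to/images -c images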

turbolift will use a lot of CPU and RAM unless you reduce the concurrency (seriously: you might crash your server if you don't!). This is fine if you have a powerful server, but not if you have a small one.

Joe A