
I'm using s3cmd to send compressed backups of accounts (on a shared hosting server) to S3. Aside from sending compressed backups, I'm thinking I could also back up my entire server to S3 and synchronize it periodically to keep the backup up to date.

However, I have more than 10,000,000 files on the server, and I don't want to be charged excessively for LIST requests, since AWS charges $0.005 per 1,000 requests ( https://aws.amazon.com/s3/pricing/ ).

My question: does s3cmd sync list a directory and check each file's checksum or properties to determine whether the file has to be updated, and if so, does each file count as a LIST or PUT request? In other words, if I have 10,000,000 files to sync, will I be charged $50 (10,000,000 / 1,000 × $0.005) each time I sync the server with S3, daily or weekly, even if only, say, 50,000 files actually need to be synchronized?

  • Side note: These days it is recommended to use the [AWS Command-Line Interface (CLI)](http://aws.amazon.com/cli/). See the `aws s3 sync` command. Your question would still apply for that app, too. – John Rotenstein Mar 17 '16 at 20:02
  • Thank you, John. I was about to ask that. I'm assuming now that Matt's answer applies to the AWS CLI as well. – Marlon Owen Cruz Mar 18 '16 at 01:04

1 Answer


s3cmd does issue LIST calls (each of which returns up to 1000 objects), and for objects whose MD5 checksum is not included in the LIST results (e.g. objects uploaded via multipart upload, generally >15MB), yes, it also issues a HEAD call for each object. Therefore, even a "null" sync on 10M objects will wind up issuing many LIST calls and, depending on your object sizes, many HEAD calls as well.
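To put rough numbers on this, here is a minimal sketch that estimates the request count and cost of a "null" sync. It assumes one LIST call per 1000 objects and one HEAD call per object whose MD5 is missing from the LIST results, as described above. The function name and parameters are hypothetical, the LIST price is the figure quoted in the question, and the HEAD price is deliberately left as an input because it is not given in this thread (check the pricing page).

```python
import math

LIST_PAGE_SIZE = 1000        # objects returned per LIST call
LIST_PRICE_PER_1000 = 0.005  # USD per 1,000 LIST requests (figure from the question)


def estimate_null_sync(total_objects, objects_without_md5, head_price_per_1000):
    """Estimate requests and cost for a sync that uploads nothing.

    head_price_per_1000 is an assumption supplied by the caller;
    it is not taken from this thread.
    """
    # One LIST call covers up to LIST_PAGE_SIZE objects.
    list_calls = math.ceil(total_objects / LIST_PAGE_SIZE)
    # One HEAD call per object whose MD5 is not in the LIST results.
    head_calls = objects_without_md5
    cost = (list_calls * LIST_PRICE_PER_1000
            + head_calls * head_price_per_1000) / 1000
    return list_calls, head_calls, cost


# 10M objects, none of them multipart, so no HEAD calls are needed.
list_calls, head_calls, cost = estimate_null_sync(10_000_000, 0, 0.0)
print(list_calls, head_calls, round(cost, 4))
```

Under these assumptions, 10M objects need 10,000 LIST calls, which comes to about $0.05 per null sync rather than $50. The HEAD term only starts to matter if a large share of the objects were uploaded via multipart upload.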

You should consider how to sync just the (changing) subset of your tree instead of all 10M (mostly unchanging) objects, assuming your data set allows it.

Matt Domsch
  • Thank you for your answer, @matt-domsch. In his side note, John mentioned that the AWS CLI is now preferred over s3cmd. Nevertheless, I'm assuming that your answer applies to the AWS CLI as well. If you're saying that S3 counts only one LIST request per thousand objects, then with 1M objects I would theoretically get only about 1,000 LIST calls, assuming there are no multipart objects. What I would still like to know is whether a "null" sync incurs no PUT requests at all, only the initial LIST requests. – Marlon Owen Cruz Mar 18 '16 at 01:09