
Because of the simple setup and low cost, I am considering using an AWS S3 bucket instead of a NoSQL database to store simple user settings as JSON (around 30 documents).

I researched the following disadvantages of not using a database, none of which are relevant to my use case:

  • Listing buckets/files will cost you money.
  • No updates: you cannot update a file, only replace it.
  • No indexes.
  • Versioning will cost you $$.
  • No search.
  • No transactions.
  • No query API (SQL or NoSQL).

Are there any other disadvantages of using an S3 bucket instead of a database?

Simon Thiel

2 Answers


You are "considering using AWS S3 bucket instead of a NoSQL database", but the fact is that Amazon S3 effectively is a NoSQL database.

It is a very large key-value store: the key is the filename, and the value is the contents of the file.

If your needs are simply "Store a value with this key" and "Retrieve a value with this key", then it would work just fine!
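The "store a value with this key" pattern can be sketched as follows. The bucket name, key layout, and settings shape here are hypothetical; the helpers are plain Python, and the actual boto3 calls (which need real AWS credentials) are shown in comments only.

```python
import json

def settings_key(user_id: str) -> str:
    # One object per user: the object key plays the role of a primary key.
    return f"settings/{user_id}.json"

def dump_settings(settings: dict) -> bytes:
    # Serialize the settings document for upload as the object body.
    return json.dumps(settings).encode("utf-8")

def load_settings(body: bytes) -> dict:
    # Parse the object body back into a settings dict.
    return json.loads(body.decode("utf-8"))

# With boto3, storing and fetching would look roughly like:
#   s3 = boto3.client("s3")
#   s3.put_object(Bucket="my-settings-bucket",
#                 Key=settings_key("alice"),
#                 Body=dump_settings({"theme": "dark"}))
#   obj = s3.get_object(Bucket="my-settings-bucket",
#                       Key=settings_key("alice"))
#   settings = load_settings(obj["Body"].read())
```

For ~30 small documents, one object per user keeps every read and write a single request against a predictable key.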

In fact, old orders on Amazon.com (more than a year old) are apparently archived to Amazon S3 since they are read-only (no returns, no changes).

While slower than DynamoDB, Amazon S3 certainly costs significantly less for storage!

John Rotenstein
  • For late readers like myself, I just want to point out that the cost advantage very much depends on the size of your payloads, because S3 costs scale with requests whereas DynamoDB costs scale with throughput. In my own scenarios (incl. on-demand), DynamoDB can actually be cheaper for small payloads of 4 KB or less. You can easily check this using https://calculator.aws/#/ – Henrik Koberg Dec 12 '21 at 11:32
  • If that's true, why wouldn't you use an AWS S3 bucket instead of a NoSQL db (e.g. MongoDB)? – Sebastian Nielsen Jun 04 '23 at 09:30

Context: we use S3 as a "database" of sorts (i.e. key/value structured storage).

It should be noted that S3 does actually have search and, depending on how you structure your data, queries in the form of S3 Select (and, if you have the time: Athena).
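As a rough sketch of what an S3 Select query looks like: the bucket and key names below are hypothetical, and boto3's `select_object_content` call is shown but not executed, since it needs real AWS credentials.

```python
def build_select_params(bucket: str, key: str, sql: str) -> dict:
    # Parameter shape expected by boto3's select_object_content:
    # a SQL expression plus input/output serialization settings.
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        "InputSerialization": {"JSON": {"Type": "DOCUMENT"}},
        "OutputSerialization": {"JSON": {}},
    }

params = build_select_params(
    "my-settings-bucket",
    "settings/alice.json",
    "SELECT s.theme FROM S3Object s",
)
# s3 = boto3.client("s3")
# response = s3.select_object_content(**params)
# The response streams "Records" events containing the matching JSON.
```

Note that S3 Select queries a single object at a time; querying across many objects is where Athena comes in.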

Edit: prior to December 2020, S3 was eventually consistent; it is now strongly consistent. The following disadvantages no longer apply, but are kept for historical reasons.

Before December 2020, the biggest disadvantage/architectural challenge was that S3 was eventually consistent (which was actually the reason why you could not "update" a file). This manifested itself in some behaviours which your architecture needed to tolerate:

  • Operations were cached by key, so if you attempted to get an object that did not exist and then created it, for a period of time* any gets on that object would still report that it did not exist.
  • There was no global cache, so you could get two different versions of the same object for a period of time* after it had been overwritten.
  • List operations provided a semi-unstable iterator. If you listed a large number of objects in a bucket that was being updated, chances were you would not visit all of them by the end of the iteration.

*period of time is purposely undefined by AWS, however, from observation, it is rarely more than a minute.
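One common way to tolerate that window was to retry reads with backoff until the object became visible. A minimal sketch of the pattern, where `fetch` is a stand-in for a real read (e.g. a lambda wrapping `s3.get_object`):

```python
import time

def get_with_retry(fetch, attempts=5, base_delay=0.5):
    """Retry fetch() until it succeeds or attempts run out.

    fetch should raise KeyError while the key is not yet visible,
    which was possible right after a PUT under the pre-December-2020
    eventual-consistency model.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except KeyError:
            if attempt == attempts - 1:
                raise
            # Exponential backoff between attempts.
            time.sleep(base_delay * (2 ** attempt))
```

With strong consistency this dance is no longer necessary for read-after-write, though retries remain useful for transient network errors.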

rodorgas
thomasmichaelwallace
  • What do you mean by "There is no global cache, so you can get two different versions of the same object for a period of time* after it has been overwritten"? If it's eventually consistent, shouldn't this be possible? – Khan Sep 27 '20 at 15:45
  • This has been addressed: https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-read-after-write-consistency/ – vozille Dec 16 '20 at 14:43