
Because of the simple setup and low cost, I am considering using an AWS S3 bucket instead of a NoSQL database to store simple user settings as JSON (around 30 documents).

I researched the following disadvantages of not using a database, none of which are relevant to my use case:

  • Listing buckets/files will cost you money.
  • No updates: you cannot update a file, only replace it.
  • No indexes.
  • Versioning will cost you $$.
  • No search.
  • No transactions.
  • No query API (SQL or NoSQL).

Are there any other disadvantages of using an S3 bucket instead of a database?

Simon Thiel

2 Answers


You are "considering using AWS S3 bucket instead of a NoSQL database", but the fact is that Amazon S3 effectively is a NoSQL database.

It is a very large key-value store: the key is the filename, and the value is the contents of the file.

If your needs are simply "Store a value with this key" and "Retrieve a value with this key", then it would work just fine!
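The "store a value with this key" pattern can be sketched as follows. The bucket name, key layout, and settings shape here are hypothetical; the helpers are plain Python, and the actual boto3 calls (which need real AWS credentials) are shown in comments only.

```python
import json

def settings_key(user_id: str) -> str:
    # One object per user: the object key plays the role of a primary key.
    return f"settings/{user_id}.json"

def dump_settings(settings: dict) -> bytes:
    # Serialize the settings document for upload as the object body.
    return json.dumps(settings).encode("utf-8")

def load_settings(body: bytes) -> dict:
    # Parse the object body back into a settings dict.
    return json.loads(body.decode("utf-8"))

# With boto3, storing and fetching would look roughly like:
#   s3 = boto3.client("s3")
#   s3.put_object(Bucket="my-settings-bucket",
#                 Key=settings_key("alice"),
#                 Body=dump_settings({"theme": "dark"}))
#   obj = s3.get_object(Bucket="my-settings-bucket",
#                       Key=settings_key("alice"))
#   settings = load_settings(obj["Body"].read())
```

For ~30 small documents, one object per user keeps every read and write a single request against a predictable key.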

In fact, old orders on Amazon.com (more than a year old) are apparently archived to Amazon S3 since they are read-only (no returns, no changes).

While slower than DynamoDB, Amazon S3 certainly costs significantly less for storage!

John Rotenstein
  • For late readers like myself, I just want to point out that the cost advantage very much depends on the size of your payloads, because S3 costs scale with requests whereas DynamoDB costs scale with throughput. In my own scenarios (incl. on-demand), DynamoDB can actually be cheaper for small payloads of 4 KB or less. You can easily check this using https://calculator.aws/#/ – Henrik Koberg Dec 12 '21 at 11:32
  • If that's true, why wouldn't you use an AWS S3 bucket instead of a NoSQL db (e.g. MongoDB)? – Sebastian Nielsen Jun 04 '23 at 09:30

Context: we use S3 as a "database" of sorts (i.e. key/value structured storage).

It should be noted that S3 does actually have search and, depending on how you structure your data, queries in the form of S3 Select (and, if you have the time: Athena).
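As a rough sketch of what an S3 Select query looks like: the bucket and key names below are hypothetical, and boto3's `select_object_content` call is shown but not executed, since it needs real AWS credentials.

```python
def build_select_params(bucket: str, key: str, sql: str) -> dict:
    # Parameter shape expected by boto3's select_object_content:
    # a SQL expression plus input/output serialization settings.
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        "InputSerialization": {"JSON": {"Type": "DOCUMENT"}},
        "OutputSerialization": {"JSON": {}},
    }

params = build_select_params(
    "my-settings-bucket",
    "settings/alice.json",
    "SELECT s.theme FROM S3Object s",
)
# s3 = boto3.client("s3")
# response = s3.select_object_content(**params)
# The response streams "Records" events containing the matching JSON.
```

Note that S3 Select queries a single object at a time; querying across many objects is where Athena comes in.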

Edit: prior to December 2020, S3 was eventually consistent; it is now strongly consistent. The following disadvantages no longer apply, but are kept for historical reasons.

Before December 2020, the biggest disadvantage/architectural challenge was that S3 was eventually consistent (which was actually the reason why you could not "update" a file). This manifested itself in some behaviours which your architecture needed to tolerate:

  • Operations were cached by key, so if you attempted to get an object that did not exist and then created it, for a period of time* any gets on that object would still report that it did not exist.
  • There was no global cache, so you could get two different versions of the same object for a period of time* after it had been overwritten.
  • List operations provided a semi-unstable iterator. If you listed a large number of objects in a bucket that was being updated, chances were you would not visit all of them by the end of the iteration.

*period of time is purposely undefined by AWS, however, from observation, it is rarely more than a minute.
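One common way to tolerate that window was to retry reads with backoff until the object became visible. A minimal sketch of the pattern, where `fetch` is a stand-in for a real read (e.g. a lambda wrapping `s3.get_object`):

```python
import time

def get_with_retry(fetch, attempts=5, base_delay=0.5):
    """Retry fetch() until it succeeds or attempts run out.

    fetch should raise KeyError while the key is not yet visible,
    which was possible right after a PUT under the pre-December-2020
    eventual-consistency model.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except KeyError:
            if attempt == attempts - 1:
                raise
            # Exponential backoff between attempts.
            time.sleep(base_delay * (2 ** attempt))
```

With strong consistency this dance is no longer necessary for read-after-write, though retries remain useful for transient network errors.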

rodorgas
thomasmichaelwallace
  • What do you mean by "There is no global cache, so you can get two different versions of the same object for a period of time* after it has been overwritten"? If it's eventually consistent, shouldn't this be possible? – Khan Sep 27 '20 at 15:45
  • This has been addressed: https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-read-after-write-consistency/ – vozille Dec 16 '20 at 14:43