DynamoDb + S3 + CloudSearch + Redis

Question

I'm currently designing the architecture for my application, and I'm wondering whether my thinking is right.

Example: e-commerce site

In DynamoDB, I would store the products (product_id, meta-data link to S3).

I would use S3 to store documents in Search Data Format (SDF/JSON): product name, product description, price, etc.

Amazon CloudSearch would be used to index the documents in S3 and make them searchable. Redis would be used to cache results.

Is my scheme right? Can S3 be a good "database"?

Is DynamoDB even needed here?

score 6 · Accepted Answer · answered Sep 04 '12 at 18:08

If you are thinking that S3 would just be the source of record for your products and you are not expecting heavy reads/writes, then it COULD work, but you have to recognize that it will be far, far slower than using a real database. Not just 1-2x slower but MANY orders of magnitude slower. We use S3 for storing audit data for realtime data stored in Postgres. It works like a charm, but this is data that is written once and read rarely. Retrieval time, when we do have to retrieve audit records, is > 50ms. This kind of latency is usually not acceptable when you need to manipulate multiple records at one time.
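To make the "multiple records" point concrete, here is a rough back-of-the-envelope sketch. It assumes the reads are issued one after another, takes the 50 ms figure above as a lower bound for S3, and uses an assumed 5 ms per read for a database purely for comparison. The function name and the record count of 200 are hypothetical.

```python
def sequential_fetch_seconds(n_records: int, per_read_ms: float) -> float:
    """Total wall-clock time for n reads issued one after another."""
    return n_records * per_read_ms / 1000.0


# 50 ms per read is the figure quoted in the answer (a lower bound).
# 5 ms per read is an ASSUMED database latency, for comparison only.
for label, per_read_ms in [("S3, 50 ms/read", 50.0), ("database, assumed 5 ms/read", 5.0)]:
    total = sequential_fetch_seconds(200, per_read_ms)
    print(f"{label}: {total:.1f} s to fetch 200 records")
```

Under these assumptions a page that touches 200 records spends 10 seconds waiting on S3 versus 1 second on the database. Parallel requests would shrink both numbers, but the per-read gap stays.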

If you are going to be using DynamoDB anyway, why not just use that to store what you'd be storing on S3? To keep it simple, I would use the following stack:

DynamoDB as the system of record and for some searches
CloudSearch for more flexible searching than DynamoDB can provide
S3 for static files (product images, etc.)

And again, to keep things simple, skip Redis for caching if you are already using DynamoDB and don't plan on using any of Redis' specialized datatypes, i.e. your caching will be nothing more than keys to strings, etc. Use Redis if you plan on taking advantage of its other datatypes or if you want to have a cache closer to your app, i.e. you plan on running Redis on the webserver.
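For reference, the "keys to strings" caching mentioned above boils down to a cache-aside lookup. The sketch below is only an illustration: the class name `CacheAside` is made up, a plain dict stands in for Redis GET/SET, and the loader passed in stands in for whatever reads the system of record (for example a DynamoDB get-item wrapper).

```python
from typing import Callable, Dict, Optional


class CacheAside:
    """Key-to-string cache in front of a slower system of record."""

    def __init__(self, load_from_store: Callable[[str], Optional[str]]):
        self._load = load_from_store       # hypothetical loader for the real store
        self._cache: Dict[str, str] = {}   # in-memory stand-in for Redis
        self.misses = 0                    # how often we had to hit the store

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:
            return self._cache[key]
        self.misses += 1
        value = self._load(key)
        if value is not None:              # missing keys are not cached
            self._cache[key] = value
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after the underlying record changes."""
        self._cache.pop(key, None)


store = {"p1": '{"name": "Mug"}'}          # stand-in for the system of record
cache = CacheAside(store.get)
cache.get("p1")
cache.get("p1")
print(cache.misses)
```

The second `get` is served from the cache, so the store is hit only once. If this is all the cache does, it adds a moving part without using anything Redis-specific, which is the trade-off described above.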

Nice answer. I'll run some tests and will probably just give up on having a database on top of S3. — Goranek, Sep 04 '12 at 19:53
Tests are key for any architecture - who knows, maybe given your record access patterns, S3 might work. For example, if you have a set group of items that don't change very often, then you "could" keep them as JSON/XML/etc. in S3 files and then just read them into your caching layer on server startup. If your testing shows that it makes sense, then go for it. However, I do think that you'll ultimately run into headaches if you try to force a round peg (S3) into a square hole (database functionality). — AlexGad, Sep 05 '12 at 03:37
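The "read them into your caching layer on server startup" idea from the comment above could look roughly like this. It is a sketch under assumptions: `preload_catalog` and `read_object` are hypothetical names, the documents are assumed to be JSON with a `product_id` field, and a dict stands in for the S3 bucket so the example runs without AWS access.

```python
import json
from typing import Callable, Dict, Iterable


def preload_catalog(keys: Iterable[str], read_object: Callable[[str], str]) -> Dict[str, dict]:
    """Read each JSON document once and index it by its product_id."""
    catalog: Dict[str, dict] = {}
    for key in keys:
        doc = json.loads(read_object(key))
        catalog[doc["product_id"]] = doc
    return catalog


# A dict standing in for the bucket: object key -> JSON body.
fake_bucket = {
    "products/p1.json": '{"product_id": "p1", "name": "Mug", "price": 9.5}',
    "products/p2.json": '{"product_id": "p2", "name": "Plate", "price": 12.0}',
}

# Done once at startup; later lookups never touch the slow store.
catalog = preload_catalog(fake_bucket, fake_bucket.__getitem__)
print(catalog["p2"]["name"])
```

This only pays off when the item set is small and changes rarely, as the comment says, because every change means reloading or invalidating the in-memory copy.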
Good answer. Actually, I have a related question - do you access CloudSearch directly from a UI or do you have some sort of backend (like a webapp running on EC2 or a lambda) to make calls to CloudSearch? I'm looking for the best practice. — 0x4ndy, Feb 10 '21 at 09:55

score 0 · Answer 2 · answered Sep 04 '12 at 06:50

DynamoDB is meant for storing write-intensive data. If your application does not require heavy writes to product_id and meta-data, I think RDS/MySQL is a better fit.

ciphor


What about S3? I mean, is it good to use it as a "database"? – Goranek Sep 04 '12 at 07:58
S3 is a good "database" to store documents. But, if you want to use CloudSearch, you don't need to use S3 directly. I believe CloudSearch is built on top of S3. – ciphor Sep 04 '12 at 08:52
There isn't really much info about using S3 as a database, and I'm really curious why not. I mean, S3 isn't a typical file system; it's actually a key/value store dressed up to look like a file system. I wanted to use DynamoDB with it so I could easily filter for one item, find where it's stored on S3, and then read it. I'm not sure if using CloudSearch is good in this situation; I wanted to use it only when I need complex queries. – Goranek Sep 04 '12 at 11:56
@Goran: it's a matter of speed/latency – yadutaf Sep 04 '12 at 12:44
I understand that, but what if I cache results in Redis? Would that matter? – Goranek Sep 04 '12 at 12:59

score 0 · Answer 3 · answered Sep 04 '12 at 12:51

When designing an application, you really should keep things as simple as possible from the beginning. It will always get worse with time :)

S3 is not a good DB. It has not been designed for this and is too slow. It's for file storage only. If you want to stick with DynamoDB, you should put all your products info in it, including the metadata.

CloudSearch may be a good option. You can also build your own "indexes" on top of DynamoDB. It requires more design and programming but might be worth considering. Here is a link to an excellent blog post on this matter: http://blog.coredumped.org/2012/01/amazon-dynamodb.html.
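As a rough idea of what a hand-built "index" on top of a key-value table involves, here is a minimal sketch. It is not taken from the linked post. The class `ProductStore`, the `category` attribute, and the two dicts (standing in for a main table and a separate index table) are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


class ProductStore:
    """Main table keyed by product_id plus a hand-maintained category index."""

    def __init__(self) -> None:
        self.products: Dict[str, dict] = {}                        # main table
        self.by_category: Dict[str, Set[str]] = defaultdict(set)   # index table

    def put(self, product: dict) -> None:
        pid = product["product_id"]
        old = self.products.get(pid)
        if old is not None:
            # Remove the stale index entry before writing the new one.
            self.by_category[old["category"]].discard(pid)
        self.products[pid] = product
        self.by_category[product["category"]].add(pid)

    def find_by_category(self, category: str) -> List[dict]:
        ids = sorted(self.by_category.get(category, ()))
        return [self.products[pid] for pid in ids]


db = ProductStore()
db.put({"product_id": "p1", "name": "Mug", "category": "kitchen"})
db.put({"product_id": "p2", "name": "Lamp", "category": "lighting"})
db.put({"product_id": "p1", "name": "Mug", "category": "gifts"})  # p1 moves category
print([p["product_id"] for p in db.find_by_category("gifts")])
```

The extra design work mentioned above is visible in `put`: every write has to keep the index in step with the main table. Against a real database the two writes are separate requests, so the application also has to decide what happens when one succeeds and the other fails.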

So,

Is DynamoDB even needed: Yes, or RDS, Mongo, ... any real DB, depending on your needs.
Is S3 a good DB: I don't think so.

S3 wouldn't be used as a regular database. I know that S3 is slower than, for example, MongoDB, but that's why I use Redis. I will cache queries, and there is CloudSearch too. Oh, and putting everything in Dynamo is really not an option. — Goranek, Sep 04 '12 at 12:56
Well, the core of my suggestion is: KISS. Keep It Stupid Simple. Use whatever DB engine/cache layer fits your needs, but keep it simple or it will end up being a real mess to maintain. — yadutaf, Sep 04 '12 at 14:15
