AWS DynamoDB or SimpleDB: "SELECT * FROM posts ORDER BY date LIMIT 10"

Question

I'm hoping to design a serverless backend with AWS. I'm using RDS as my primary datastore, but I'd like to move heavily-queried data (let's call these records "posts") to either DynamoDB or SimpleDB because of limitations on the number of concurrent connections to my RDS instance.

At first, DynamoDB looked like a good option (lots of hype), but then I started looking into how I would select the 20 most recent posts from my DynamoDB table, 10 per page, with an optional offset for pagination.

It doesn't look like there's an easy way to do this. I've seen some suggest using a GSI with the same partition key and a sort-key for "post_date", but it seems this is considered a hack.

SimpleDB appears to be more flexible and designed better for my use case, but it's concerning how little AWS appears to support it.

Which AWS service is best for my use case?

score 0 · Accepted Answer · answered Jul 24 '16 at 14:06

I've seen some suggest using a GSI with the same partition key and a sort-key for "post_date", but it seems this is considered a hack.

How is that considered a hack? It sounds like you would be using Global Secondary Indexes exactly how they were designed to be used.
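A minimal sketch of what that GSI query could look like, assuming a hypothetical table `posts` with a hypothetical GSI `posts_by_date` whose sort key is `post_date` stored as an ISO 8601 string. The request is a Query (not a Scan) on the index: `ScanIndexForward=False` returns the highest sort-key values, i.e. the newest dates, first, and `Limit` caps the page size. The function only builds the keyword arguments for boto3's low-level `client.query`, so it runs without AWS credentials; which partition key attribute and value to query by is a separate design question.

```python
from typing import Any, Dict, Optional


def build_latest_posts_query(
    table_name: str,
    index_name: str,
    pk_name: str,
    pk_value: str,
    limit: int = 10,
    exclusive_start_key: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build kwargs for boto3's low-level DynamoDB client.query().

    Roughly the DynamoDB counterpart of:
      SELECT * FROM posts WHERE <pk_name> = <pk_value>
      ORDER BY post_date DESC LIMIT <limit>
    The ordering comes from the index's sort key (assumed to be post_date).
    """
    params: Dict[str, Any] = {
        "TableName": table_name,
        "IndexName": index_name,
        # Equality condition on the GSI's partition key; the #pk placeholder
        # avoids clashes with DynamoDB reserved words.
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": pk_name},
        "ExpressionAttributeValues": {":pk": {"S": pk_value}},
        # False = descending by sort key, i.e. newest post_date first.
        "ScanIndexForward": False,
        "Limit": limit,
    }
    if exclusive_start_key is not None:
        # Resume after the last item of a previous page (its LastEvaluatedKey).
        params["ExclusiveStartKey"] = exclusive_start_key
    return params
```

With real AWS access this would be used as `boto3.client("dynamodb").query(**build_latest_posts_query("posts", "posts_by_date", "post_type", "post"))`, where `post_type` and `post` are placeholder names.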

SimpleDB appears to be more flexible and designed better for my use case, but it's concerning how little AWS appears to support it.

I wouldn't use SimpleDB for any new development at this point. Amazon has basically replaced it with DynamoDB and just keeps SimpleDB around for people that were already using it.

Mark B

Regarding the "hack" solution, see the comments on this answer: http://stackoverflow.com/questions/21794945/dynamodb-scan-in-sorted-order/28463257#28463257. Though I don't think I'd need to reach the scale where this becomes a drawback. – Rob Jul 24 '16 at 14:23
That comment is regarding the recommendation of always storing a single value in the indexed field. Basically setting an index on a "flag". You would be storing different values (dates) in your indexed field, so that wouldn't be an issue for you. In other words you would be using GSI correctly. – Mark B Jul 24 '16 at 15:01
Ok, so you're saying that I'd be using GSI correctly, if I had 500,000 items all with the same partition key "1" and with varying date values (but not necessarily distinct) for the sort-key? – Rob Jul 24 '16 at 15:14
Not all records have a partition key of '1', do they? There would be multiple partition key values, and you would be querying for records where the partition key is '1', and ordering by the sort key, correct? Without more exact information I can't tell you if your partition key is the best possible one, but I think in general you would be using GSI correctly here. I recommend reading this entire page before making any further design decisions: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html – Mark B Jul 24 '16 at 15:18
For simplicity, I was hoping all records could have the same value of "1" as the partition key, but... hmm, perhaps I should instead use a YMD date as the partition key, e.g. "2016-07-24", and a more exact ISO 8601 timestamp as the sort key? The client would need more complex code to page through a reverse-chronological listing of items, but reads/writes would be more evenly distributed in DynamoDB. – Rob Jul 24 '16 at 15:59
All records having the same value as the partition key doesn't make any sense, and definitely will hurt your table performance. Also your suggestion of using the date as the partition key is explicitly mentioned as a bad practice on the page I linked. As I said earlier, I suggest you read that page before continuing. – Mark B Jul 24 '16 at 16:25
Thank you for the link, I did read it. But unless I'm missing something, it doesn't give guidance on how to paginate through a large collection. I want to do the equivalent of "select * from posts order by date limit 10 offset 0; select * from posts order by date limit 10 offset 10". The dilemma seems to be that the partition key isn't knowable unless it's a predictable value, such as a record's date. – Rob Jul 24 '16 at 19:31
A NoSQL database, and in particular DynamoDB, might not be a good fit for your needs if you are having trouble figuring out what your key should be. In general you need to read the documentation and figure out how to properly distribute your data across the DynamoDB partitions. Once you have that accomplished, you'll need to figure out how to create a similar query in DynamoDB. The pagination and result limit functionality is going to be a bit different in DynamoDB from what you are used to with a typical RDBMS. – Mark B Jul 24 '16 at 21:07
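One concrete difference is that DynamoDB's Query has no OFFSET. Pagination is cursor-based: each response may carry a `LastEvaluatedKey`, and passing it back as `ExclusiveStartKey` returns the next page. The sketch below assumes a `query_fn` that behaves like boto3's `client.query`; `make_fake_query` and `key_of` are hypothetical in-memory stand-ins so the example runs without AWS, and they assume items are already sorted newest first, as a descending Query on a `post_date` sort key would return them.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Key = Dict[str, Any]
QueryFn = Callable[..., Dict[str, Any]]


def fetch_page(
    query_fn: QueryFn, base_params: Dict[str, Any], cursor: Optional[Key] = None
) -> Tuple[List[Dict[str, Any]], Optional[Key]]:
    """Fetch one page; return (items, cursor for the next page or None)."""
    params = dict(base_params)  # copy so the caller's params are not mutated
    if cursor is not None:
        params["ExclusiveStartKey"] = cursor
    response = query_fn(**params)
    # Real DynamoDB may return a LastEvaluatedKey even when the next page
    # turns out to be empty, so callers should tolerate an empty page.
    return response.get("Items", []), response.get("LastEvaluatedKey")


def key_of(item: Dict[str, Any]) -> Key:
    """Hypothetical key attributes of an item (GSI sort key plus table key)."""
    return {"id": item["id"], "post_date": item["post_date"]}


def make_fake_query(items_newest_first: List[Dict[str, Any]]) -> QueryFn:
    """In-memory stand-in for client.query, for illustration only."""
    keys = [key_of(item) for item in items_newest_first]

    def fake_query(Limit: int, ExclusiveStartKey: Optional[Key] = None, **_: Any) -> Dict[str, Any]:
        # Start right after the item whose key was handed back as the cursor.
        start = 0 if ExclusiveStartKey is None else keys.index(ExclusiveStartKey) + 1
        page = items_newest_first[start:start + Limit]
        response: Dict[str, Any] = {"Items": page}
        if start + Limit < len(items_newest_first):
            response["LastEvaluatedKey"] = key_of(page[-1])
        return response

    return fake_query
```

Because there is no OFFSET, "page 3" means either keeping the cursor returned with page 2 (for example, encoding it in the client's "next page" link) or walking forward from the first page. With real AWS, `query_fn` would be `boto3.client("dynamodb").query` and `base_params` would hold the index, key condition, `ScanIndexForward=False` and `Limit`.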