DynamoDB scalability: How to design the partition key vs. index

Question

As described in https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/, the partition key should be unique.

I am building an application that needs to store subscriptions to a topic (think of a chat app). Millions of those subscriptions would need to be stored in the database, and whenever a message is to be emitted to the subscribers, the application needs to get all subscribers of that topic from the table.

Naive approach

The naive approach would be to design a partition key like:

SUBSCRIPTIONS|<topic>

The sort key would then order all subscriptions for the <topic> by time of subscription, region and a few other criteria.

Unfortunately, this partition key is far from unique, although it would allow fetching all subscriptions for a topic in a blink.

Also, a maximum table size would set a hard limit on the number of subscriptions that can be held in a partition, and thus on the maximum number of subscriptions overall for this design. So this design is bound to fail at scale.
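To make the naive layout concrete, here is a minimal sketch in Python. The attribute names `pk` and `sk`, the exact sort key format, and the helper names are assumptions made for illustration. The parameter dict follows the shape that the low-level DynamoDB `Query` operation accepts, but nothing here talks to AWS.

```python
from datetime import datetime, timezone


def subscription_item(topic, client_id, region, subscribed_at):
    """Build one item for the naive layout: a single partition key per topic.

    `pk` and `sk` are hypothetical attribute names.
    """
    return {
        "pk": f"SUBSCRIPTIONS|{topic}",
        # Sort key orders subscriptions within a topic by time, then region.
        # The client id is appended so that each sort key stays unique.
        "sk": f"{subscribed_at.isoformat()}|{region}|{client_id}",
        "clientId": client_id,
    }


def topic_query_params(table_name, topic):
    """Parameters for one Query that reads all subscribers of a topic."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "pk = :pk",
        "ExpressionAttributeValues": {":pk": {"S": f"SUBSCRIPTIONS|{topic}"}},
    }


item = subscription_item(
    "news", "client-42", "eu-west-1",
    datetime(2019, 2, 23, 21, 0, tzinfo=timezone.utc),
)
params = topic_query_params("subscriptions", "news")
```

Every subscriber of a topic ends up under the same partition key value, which is exactly the concern raised above. A single `Query` call also returns at most 1 MB of data, so reading millions of subscribers would mean paging through results with `LastEvaluatedKey`.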

Alternative

The other way of designing it would be to use something like

SUBSCRIPTIONS|<clientId>

to hold each subscription separately per client, and move the <topic> into the sort key. This would allow the table to scale (partition) far better, but would require scans to find all subscribers for a certain <topic>.

An index might help here, but how does an index scale over multiple partitions, and how will it perform?
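As a rough illustration of the index idea, the sketch below builds the parameters for a table keyed per client plus a global secondary index (GSI) that inverts the key, so that a topic can be queried without a scan. The attribute names `pk` and `topic`, the index name `by-topic`, and the helper names are all assumptions. The dict shapes follow the DynamoDB `CreateTable` and `Query` parameters, and no AWS call is made.

```python
def table_definition(table_name):
    """CreateTable parameters: per-client base table plus an inverted GSI."""
    return {
        "TableName": table_name,
        "BillingMode": "PAY_PER_REQUEST",
        "AttributeDefinitions": [
            {"AttributeName": "pk", "AttributeType": "S"},
            {"AttributeName": "topic", "AttributeType": "S"},
        ],
        # Base table: SUBSCRIPTIONS|<clientId> as partition key, topic as sort key.
        "KeySchema": [
            {"AttributeName": "pk", "KeyType": "HASH"},
            {"AttributeName": "topic", "KeyType": "RANGE"},
        ],
        # Hypothetical index that swaps the two key attributes.
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "by-topic",
                "KeySchema": [
                    {"AttributeName": "topic", "KeyType": "HASH"},
                    {"AttributeName": "pk", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        ],
    }


def topic_index_query_params(table_name, topic):
    """Query parameters that read all subscribers of a topic via the index."""
    return {
        "TableName": table_name,
        "IndexName": "by-topic",
        "KeyConditionExpression": "#t = :t",
        "ExpressionAttributeNames": {"#t": "topic"},
        "ExpressionAttributeValues": {":t": {"S": topic}},
    }


definition = table_definition("subscriptions")
index_query = topic_index_query_params("subscriptions", "news")
```

A GSI is partitioned by its own partition key, so in this sketch all index entries for one topic share a single index key value. That suggests the index would concentrate a popular topic much as the naive design does, which is the open part of the question.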

Have you thought about using the topic + clientId as the partition key, as in `SUBSCRIPTIONS||`? It would be a unique partition key and you could then use the query operation and the `begins_with` operator to get all the subscriptions to a particular topic. — Milan Cermak, Feb 23 '19 at 21:58
Yes, I did, but as I understand it, begins_with only works on the sort key? Maybe I am wrong... let me research ;) — wzr1337, Feb 23 '19 at 22:00
Does not seem so : https://stackoverflow.com/q/39591078/1246802 — wzr1337, Feb 23 '19 at 22:03
You could append a shard index to your partition key and query all shards in parallel. So the key would be: `SUBSCRIPTIONS|#` where `shard-index` may be, for example, a random number between 0 and 128. The shard index can be some hash of the client id (modulo 128; some prime might be better for the shard upper bound) so that a client's subscription can be easily found and updated. — Marcin Sucharski, Feb 24 '19 at 12:27
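The sharding suggestion in the last comment could be sketched as follows. The key format `SUBSCRIPTIONS|<topic>#<shard>`, the use of SHA-256, and the function names are assumptions for illustration. Only the idea of a hash of the client id modulo 128 comes from the comment.

```python
import hashlib

SHARD_COUNT = 128  # upper bound taken from the comment; a prime may spread better


def shard_index(client_id, shard_count=SHARD_COUNT):
    """Derive a stable shard index from the client id."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % shard_count


def sharded_partition_key(topic, client_id, shard_count=SHARD_COUNT):
    """Partition key for one subscription (assumed key format)."""
    return f"SUBSCRIPTIONS|{topic}#{shard_index(client_id, shard_count)}"


def all_shard_keys(topic, shard_count=SHARD_COUNT):
    """Every partition key that has to be queried to fan out to a topic."""
    return [f"SUBSCRIPTIONS|{topic}#{i}" for i in range(shard_count)]


key = sharded_partition_key("news", "client-42")
keys = all_shard_keys("news")
```

Because the shard is derived from the client id rather than chosen at random, a single client's subscription can be located again with one key lookup. Emitting a message then means issuing one query per entry in `all_shard_keys(topic)`, possibly in parallel, and merging the results in the application.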
