As described in https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/, the partition key should be unique.
I am building an application that needs to store subscriptions to a topic (think of a chat app). Millions of these subscriptions need to be stored in the database, and whenever a message is emitted to the subscribers, the application needs to fetch all subscribers of that topic from the table.
Naive approach
The naive approach would be to design a partition key like:
SUBSCRIPTIONS|<topic>
The sort key would then order all subscriptions for the <topic> by time of subscription, region and a few other criteria.
Unfortunately, this partition key is far from unique, but it would allow fetching all subscriptions for a topic in a blink.
Also, a maximum size per partition would set a hard limit on the number of subscriptions that can be held in a partition, and thus on the maximum number of subscriptions in general for this design. So this design is bound to fail at scale.
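To make the naive layout concrete, here is a minimal Python sketch of how the keys and a single Query request could be assembled. The table name `subscriptions`, the attribute names `pk` and `sk`, and the exact sort key layout are assumptions made for illustration; only the request parameter names (`TableName`, `KeyConditionExpression`, `ExpressionAttributeValues`) come from the DynamoDB Query API.

```python
def naive_keys(topic: str, subscribed_at: str, region: str, client_id: str) -> dict:
    """Build the key attributes of one subscription in the naive design."""
    return {
        # Every subscription of a topic shares this partition key.
        "pk": f"SUBSCRIPTIONS|{topic}",
        # Hypothetical sort key: orders by subscription time, then region.
        "sk": f"{subscribed_at}|{region}|{client_id}",
    }


def naive_query_params(topic: str) -> dict:
    """Parameters for a Query that reads all subscriptions of one topic."""
    return {
        "TableName": "subscriptions",  # hypothetical table name
        "KeyConditionExpression": "pk = :pk",
        "ExpressionAttributeValues": {":pk": {"S": f"SUBSCRIPTIONS|{topic}"}},
    }


keys = naive_keys("news", "2024-01-01T00:00:00Z", "eu-west-1", "client-42")
params = naive_query_params("news")
```

With boto3, a dictionary like `params` could be passed to the low-level client as `query(**params)`. The result is paginated, so reading millions of subscribers would mean following `LastEvaluatedKey` across many pages.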
Alternative
The other way of designing it would be to use something like
SUBSCRIPTIONS|<clientId>
to hold each and every subscription separately per client and move the <topic> into the sort key. This would allow the table to scale (partition) far better, but would require scans to find all subscribers of a certain <topic>.
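The trade-off of the per-client layout can be sketched with a small in-memory model. This is not DynamoDB code: the dictionary stands in for the table, each dictionary key stands in for one partition key, and the function names are hypothetical. It only illustrates that finding the subscribers of a topic has to touch every partition key, which is what a Scan with a filter would do.

```python
from collections import defaultdict

PREFIX = "SUBSCRIPTIONS|"


def put_subscription(table: dict, client_id: str, topic: str) -> None:
    """Store one subscription under the client's partition key."""
    table[f"{PREFIX}{client_id}"].add(topic)  # the topic plays the sort key role


def subscribers_of(table: dict, topic: str) -> list:
    """Find all subscribers of a topic by visiting every partition key."""
    return sorted(
        pk[len(PREFIX):]
        for pk, topics in table.items()  # full pass over the table
        if topic in topics
    )


table = defaultdict(set)
put_subscription(table, "alice", "news")
put_subscription(table, "bob", "sports")
put_subscription(table, "carol", "news")

print(subscribers_of(table, "news"))  # ['alice', 'carol']
```

Listing the topics of a single client stays a cheap lookup by key in this model, while the lookup by topic grows with the total number of clients.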
An index might help here, but how does an index scale over multiple partitions, and how will it perform?
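For reference, this is roughly what such an index could look like if it were a global secondary index that swaps the two keys of the per-client design. It is a sketch under assumptions: the index name `by-topic`, the table name `subscriptions`, and the attribute names `pk` and `sk` are hypothetical, and the sort key is assumed to hold only the topic. The dictionary shapes follow the `GlobalSecondaryIndexes` entry of CreateTable and the parameters of Query.

```python
def topic_index_definition() -> dict:
    """One GlobalSecondaryIndexes entry that inverts the table's keys."""
    return {
        "IndexName": "by-topic",  # hypothetical index name
        "KeySchema": [
            {"AttributeName": "sk", "KeyType": "HASH"},   # topic becomes the partition key
            {"AttributeName": "pk", "KeyType": "RANGE"},  # client becomes the sort key
        ],
        "Projection": {"ProjectionType": "KEYS_ONLY"},
    }


def topic_index_query_params(topic: str) -> dict:
    """Parameters for a Query against the index instead of a table Scan."""
    return {
        "TableName": "subscriptions",  # hypothetical table name
        "IndexName": "by-topic",
        "KeyConditionExpression": "sk = :topic",
        "ExpressionAttributeValues": {":topic": {"S": topic}},
    }


index = topic_index_definition()
params = topic_index_query_params("news")
```

Note that a global secondary index is partitioned by its own partition key, which here is the topic again, so the cardinality concern from the naive approach would apply to the index as well.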