This is a great question. I'm also interested to hear what others are doing to solve this problem.
If you're storing your data with a Partition Key of <type>-<id>, you're supporting the access pattern "retrieve an item by ID". You've correctly noted that you cannot use begins_with on a Partition Key, leaving you without a clear-cut way to get a collection of items of that type.
I think you're on the right track with creating a Partition Key of <type> (e.g. Users, Devices, etc.) with a meaningful Sort Key. However, since every item of a given type would then share a single partition key, your items wouldn't be evenly distributed across the table, and you're faced with the possibility of a hot partition.
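For illustration, here is a sketch of the request that would read every item of one type under that design. It assumes a hypothetical table whose partition key attribute is named PK (holding the type) and whose sort key is named SK. It only builds the parameters for DynamoDB's low-level Query call, in the typed {"S": ...} attribute-value format, and doesn't talk to AWS.

```python
from typing import Any, Dict, Optional


def build_type_query(table_name: str, item_type: str,
                     sort_key_prefix: Optional[str] = None) -> Dict[str, Any]:
    """Build low-level DynamoDB Query parameters for all items of one type.

    Assumes a hypothetical key schema: partition key "PK" holds the type
    (e.g. "Devices") and sort key "SK" holds something meaningful to sort on.
    """
    params: Dict[str, Any] = {
        "TableName": table_name,
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": "PK"},
        "ExpressionAttributeValues": {":pk": {"S": item_type}},
    }
    if sort_key_prefix is not None:
        # begins_with is allowed in a key condition, but only on the sort key.
        params["KeyConditionExpression"] += " AND begins_with(#sk, :prefix)"
        params["ExpressionAttributeNames"]["#sk"] = "SK"
        params["ExpressionAttributeValues"][":prefix"] = {"S": sort_key_prefix}
    return params
```

With boto3, the result could be passed as boto3.client("dynamodb").query(**params). A single Query call returns at most 1 MB of data, so a real caller would keep querying with ExclusiveStartKey set to the previous LastEvaluatedKey until no LastEvaluatedKey comes back.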
One way to solve the problem of a hot partition is to put an external cache in front of the table, so that not every read hits your DB. This comes with added complexity that you may not want to introduce to your application, but it's an option.
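The core read-through logic is small. The sketch below uses an in-process dict as a stand-in for an external cache (ElastiCache or DAX would be typical real choices), and loader is a hypothetical function that would run the DynamoDB read on a miss. Entries expire after a fixed TTL, so cached data can be up to ttl_seconds stale.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ReadThroughCache:
    """Minimal read-through cache with a fixed TTL (in-process stand-in)."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader          # hypothetical: would query DynamoDB
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expiry, value)

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]            # fresh hit: the table is not touched
        value = self._loader(key)      # miss or expired: go to the table
        self._entries[key] = (now + self._ttl, value)
        return value
```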
You also have the option of distributing the data across partitions in DynamoDB, effectively implementing your own cache. For example, let's say you have a web application that has a list of "top 10 devices" directly on the homepage. You could create partitions DEVICES#1, DEVICES#2, DEVICES#3, ..., DEVICES#N that each store a copy of the top 10 devices. When your application needs to fetch the top 10 devices, it could randomly select one of these partitions to get the data. This may not work for a partition as large as Users, but it's a pretty neat pattern to consider.
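Here's a minimal in-memory sketch of that replication idea. A dict stands in for the table, and NUM_COPIES (the N above), write_top_devices and read_top_devices are hypothetical names. Every write goes to all N copies, while each read hits just one randomly chosen copy, so read traffic is spread across N partition keys.

```python
import random
from typing import Dict, List, Optional

NUM_COPIES = 5  # hypothetical N: how many copies of the hot item to keep

# In-memory stand-in for the table: partition key -> stored item.
table: Dict[str, dict] = {}


def write_top_devices(devices: List[str]) -> None:
    """Write the same top-10 list to DEVICES#1 .. DEVICES#N."""
    top_ten = devices[:10]
    for i in range(1, NUM_COPIES + 1):
        pk = f"DEVICES#{i}"
        table[pk] = {"PK": pk, "devices": list(top_ten)}


def read_top_devices(rng: Optional[random.Random] = None) -> List[str]:
    """Read one randomly chosen copy, so reads spread over N partition keys."""
    rng = rng or random.Random()
    pk = f"DEVICES#{rng.randint(1, NUM_COPIES)}"
    return table[pk]["devices"]
```

The trade-off is on the write side: each update costs N writes, and because the copies are written separately, a reader can briefly see an older list in one copy than in another.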
Extending this idea further, you could partition Devices by some other meaningful metric (e.g. <manufactured_date> or <created_at>). This would distribute your Device items more uniformly throughout the database. Your application would be responsible for querying all the relevant partitions and merging the results, but you'd reduce or eliminate the hot partition problem. The AWS DynamoDB docs discuss this pattern, under the name write sharding, in greater depth.
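A rough sketch of sharding by creation month, again with an in-memory dict standing in for the table. The DEVICES#<YYYY-MM> key scheme and the helper names are assumptions for illustration. To list devices in a date range, the application enumerates every monthly partition key in that range, reads each one (in DynamoDB, one Query per key, which could run concurrently), and merges the results client-side.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List

# In-memory stand-in for the table: partition key -> items in that partition.
partitions: Dict[str, List[dict]] = defaultdict(list)


def device_pk(created_at: date) -> str:
    """Hypothetical key scheme: one partition per creation month."""
    return f"DEVICES#{created_at:%Y-%m}"


def put_device(device_id: str, created_at: date) -> None:
    partitions[device_pk(created_at)].append(
        {"id": device_id, "created_at": created_at})


def month_keys(start: date, end: date) -> List[str]:
    """Every monthly partition key from start's month through end's month."""
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        keys.append(f"DEVICES#{year:04d}-{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return keys


def list_devices(start: date, end: date) -> List[dict]:
    """Scatter-gather: read every shard in the range, then merge client-side."""
    results: List[dict] = []
    for pk in month_keys(start, end):
        # In DynamoDB this would be one Query per partition key.
        for item in partitions.get(pk, []):
            if start <= item["created_at"] <= end:
                results.append(item)
    results.sort(key=lambda item: item["created_at"])
    return results
```

Whether this actually removes the hot spot depends on your write pattern: if most writes are for newly created devices, the current month's partition still takes the bulk of them, and a hashed or random suffix may spread the load better.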
There's hardly a one-size-fits-all approach to DynamoDB data modeling, which can make it super tricky! Your specific access patterns will dictate which solution fits your scenario best.