In terms of designing for DDB
Performance, the Get API
is key to milli-seconds data retrieval capability of DDB, therefore it is logical to design your data around this API
Table with Partition Key + Sort Key
Partition Key | Sort Key
--------------+-------------
Course1 | Student1
Course1 | Student2
Advantage:
- Able to use the
Get API
to get a Single record by the Partition Key
and the Sort Key
e.g. Get single record where Partition Key = "Course1" and Sort Key = "Student1"
- Able to use the
Get API
to get a List of records by only the Partition Key
e.g. Get all records where Partition Key = "Course1"
Disadvantage:
- If you know only the
Sort Key
(i.e. Student) and not the Partition Key
(i.e. Course), you won't be able to use the Get API
to retrieve the record only by the Sort Key
Note: In general, the efficiency (such that queries do not hit the ReadThroughput
exception "Roof" easily) of DDB Get API
queries is quite coupled with the uniqueness and distribution of the Partition Key
. The more the partition keys you have and spreaded out, the better the performance
Table with Partition Key only
Partition Key Only
--------------------
Course1#Student1
Course1#Student2
Advantage:
- Able to use the
Get API
to get a Single record by the Partition Key
e.g. Get single record where Partition Key = "Course1#Student1"
Disadvantage:
- Will not be able to use the
Get API
to get a List of records using only a subset of the Partition key
e.g. Get a List of records where Partition Key = "Course1"
About GSIs
Note: It is a common scenario to add Global Secondary Index on Tables to support Get API
calls using an alternate key, like Get a List of records where GSI Partition Key
= Course Name
Partition Key Only | Non Key Attribute (Course) For GSI
---------------------+---------------------------
Course1#Student1 | Course1
Course1#Student2 | Course1
You can have up to 20 GSI
Indexes (soft limit), with the option to remove this limit through a Support request
Partition Key Only | Non Key Attribute (Course) For GSI | Lecturer (For GSI 2)
---------------------+------------------------------------+---------------------
Course1#Student1 | Course1 | Lecturer1
Course1#Student2 | Course1 | Lecturer1
Conclusion
I would design a Table to have as many Unique values for the Partition Keys
as possible if the Performance is key i.e. Partition Key = Course1#Student1 VS Partition Key = Course1, Sort Key = Student1
Add GSIs
to Tables on-demand if you need to query by an alternate key
(Historically GSIs
were limited to 5 per Table, and had to be specified during Table creation, but these limitations are since lifted)