To better understand how to design a single DynamoDB table I tried to develop a small movie database application.
This is my current DynamoDB table design:
╔══════════╦════════════╦═══════════════╗
║ PK ║ SK ║ Title ║
╠══════════╬════════════╬═══════════════╣
║ pk_1 ║ movie ║ Die Hard ║
║ pk_1 ║ actor_pk_3 ║ Bruce Willis ║
║ pk_1 ║ tag_pk5 ║ Action ║
║ ║ ║ ║
║ pk_2 ║ movie ║ Looper ║
║ pk_2 ║ actor_pk_3 ║ Bruce Willis ║
║ pk_2 ║ actor_pk_4 ║ Emily Blunt ║
║ pk_2 ║ tag_pk5 ║ Action ║
║ ║ ║ ║
║ pk_3 ║ actor ║ Bruce Willis ║
║ pk_3 ║ movie_pk_1 ║ Die Hard ║
║ pk_3 ║ movie_pk_2 ║ Looper ║
║ ║ ║ ║
║ pk_4 ║ actor ║ Emily Blunt ║
║ pk_4 ║ movie_pk_2 ║ Looper ║
║ ║ ║ ║
║ pk_5 ║ tag ║ Action ║
║ pk_5 ║ movie_pk_1 ║ Die Hard ║
║ pk_5 ║ movie_pk_2 ║ Looper ║
╚══════════╩════════════╩═══════════════╝
* The table has one GSI, it is just the PK and SK reversed.
I tried to design the database so that I can always get all the data that I want with one query (a single roundtrip). The design works and it currently satisfies most of the acces patterns that I'm going for.
Some examples:
- If I want everything to the movie "Die Hard" I would just query the "pk_1" on the normal table.
- If I want all movies I query the GSI with "movie"
- If I want all actors to the movie "Die Hard" I query "pk_1" and the SK begins with "actor"
- ... and so on
Here is my big questions:
- Is this a good DynamoDB table design or is it wrong / bad?
If the table design is not bad than these are my follow up questions:
Is the duplicated data normal and should I do it like this?
Is it normal to have so much business logic getting, inserting and updating data?
What is the best way to "securely" insert the data? First adding the movie, then adding "actor_" and "tag_" feels wrong and is bound to fail at one point
How can I assure that the duplicated data always stays the same as the "master"?
How can I handle updates on relationships that could have millions of entries? For example if I rename the tag "Action" to "ACTION" I would have to update every movie with this tag. Currently I only see me updating it in batches so the data is not consistent for a certain amount of time.
Currently I doubt every decision I do on this table because it just feels wrong coming from relational databases...