1

To better understand how to design a single DynamoDB table I tried to develop a small movie database application.

This is my current DynamoDB table design:

╔══════════╦════════════╦═══════════════╗
║ PK       ║ SK         ║ Title         ║
╠══════════╬════════════╬═══════════════╣
║ pk_1     ║ movie      ║ Die Hard      ║
║ pk_1     ║ actor_pk_3 ║ Bruce Willis  ║
║ pk_1     ║ tag_pk5    ║ Action        ║
║          ║            ║               ║
║ pk_2     ║ movie      ║ Looper        ║
║ pk_2     ║ actor_pk_3 ║ Bruce Willis  ║
║ pk_2     ║ actor_pk_4 ║ Emily Blunt   ║
║ pk_2     ║ tag_pk5    ║ Action        ║
║          ║            ║               ║
║ pk_3     ║ actor      ║ Bruce Willis  ║
║ pk_3     ║ movie_pk_1 ║ Die Hard      ║
║ pk_3     ║ movie_pk_2 ║ Looper        ║
║          ║            ║               ║
║ pk_4     ║ actor      ║ Emily Blunt   ║
║ pk_4     ║ movie_pk_2 ║ Looper        ║
║          ║            ║               ║
║ pk_5     ║ tag        ║ Action        ║
║ pk_5     ║ movie_pk_1 ║ Die Hard      ║
║ pk_5     ║ movie_pk_2 ║ Looper        ║
╚══════════╩════════════╩═══════════════╝

* The table has one GSI, it is just the PK and SK reversed.

I tried to design the database so that I can always get all the data that I want with one query (a single roundtrip). The design works and it currently satisfies most of the acces patterns that I'm going for.

Some examples:

  • If I want everything to the movie "Die Hard" I would just query the "pk_1" on the normal table.
  • If I want all movies I query the GSI with "movie"
  • If I want all actors to the movie "Die Hard" I query "pk_1" and the SK begins with "actor"
  • ... and so on

Here is my big questions:

  • Is this a good DynamoDB table design or is it wrong / bad?

If the table design is not bad than these are my follow up questions:

  • Is the duplicated data normal and should I do it like this?

  • Is it normal to have so much business logic getting, inserting and updating data?

  • What is the best way to "securely" insert the data? First adding the movie, then adding "actor_" and "tag_" feels wrong and is bound to fail at one point

  • How can I assure that the duplicated data always stays the same as the "master"?

  • How can I handle updates on relationships that could have millions of entries? For example if I rename the tag "Action" to "ACTION" I would have to update every movie with this tag. Currently I only see me updating it in batches so the data is not consistent for a certain amount of time.

Currently I doubt every decision I do on this table because it just feels wrong coming from relational databases...

Shamshiel
  • 2,051
  • 3
  • 31
  • 50
  • Here’s a similar question for a different data model that’s got a pretty thorough answer with some additional resources. I doesn’t answer your specific questions, but you may find helpful for understanding the single table strategy. https://stackoverflow.com/q/51055657/5563569 – Matthew Pope Dec 30 '19 at 06:55

1 Answers1

0

I don't quite understand the design you've written so feel free to correct me if I misunderstood any parts. Mainly, I don't get what PK_1 even means. If I tell you to get me all the information about Die Hard, I don't understand how you will be able to fetch this for me simply with your data base structure. I get the pk_1 has all the information, but how do you know pk_1 == die hard unless you check "pk_1" + "movie" (sort key), which will mean you will need to check every such movie.

I'll try to answer your questions first and then come up with a design afterwards:

  1. Q. [Is this design good/bad]. For good dynamoDb design, as long as you satisfy all your use cases and don't have bad partitioning, you have somewhere to start from. This means optimizing frequent writes & reads.
  2. Q [Duplicate data] Data is cheap. Duplication is fine if it makes your querying faster. Ideally, all the information you need to fulfil a request should come from a single read operation.
  3. Q [Lot of logic for writes] NO! Based on 1 & 2, you should try to model it in such a way that your writes are ezpz.
  4. Q [What are your write operations looking like? If you give your use cases (expansive) for writing, then it becomes a lot easier to model a ddb to suit your needs. The problem however is if you get a new use case. This might need some database redesign so think of most usecases in the beginning itself.
  5. Q [Duplicate data stays the same as master] Use fully consistent reads and you will always read from the master which is in sync. You will not need to maintain this.

How I would design this Use case for writes

Movie information comes with list of metadata including genre and a list of actors.

Save primary key as movie with sort key as METADATA. Now store movies with sort key as actor_ as the sort key. This will be an extra overhead while saving the movie information. You can do this using the batch write pretty easily. GSI having inverted sort key and primary key. If you get an actor name as input, search database GSI for action_ to get their movies. Similar this can be extended if you want to save directors, producers etc. Create another GSI against the attribute you have stored genre in and have the sort key as movies.

You will have two indexes in this way, but I feel writes would be slightly easier.

Soccergods
  • 440
  • 5
  • 17