Choice PK and SK in DynamoDB

Question

I have the following two items which I need to store in dynamodb

Artists
Songs

The artists have an ID (unique for artists), name and gender. The Songs have an ID (unique for songs), title, genre, artist and rating.

How should I model this in DynamoDB?.

I was thinking about this: ID as primary key and having a sort key which contains artist or song so they are distinguished. Is this a good choice? In examples I find I see more variety in the sort key.

What about the field artist in song items? Should I just point to the ID of the artist?

Update: I have many common access patterns. I can probably solve it by creating some indexes but still I have to choose for a good PK/SK:

get songs based on title
get songs based on rating
get songs based on genre
get songs based on artist

get artist based on rating
get artist based on gender
get artist based on name

Thanks

score 9 · Accepted Answer · answered Apr 04 '20 at 15:21

9

One thing I found out (the hard way) about NoSQL (DynamoDB) modeling is that you need to know all your access patterns before you model your table. In and RDBMS, it's rather common to model first and optimize indexes later as access patterns change. This is not straightforward in NoSQL modeling (otherwise there would be massive migrations from RDBMS to NoSQL).

Having that said, I will now suggest a simplistic model and I will update my answer as the question is updated with access patterns (i.e. "I need to get all songs for an artist").

Artist:

PK: Artist-<Artist ID>, i.e. Artist-1234 SK: <Name> Attributes: Gender etc.

Song:

PK: Song-<Song ID>, i.e. Song-5678 SK: <Genre> Attributes: Genre, Artist ID, Rating

This approach will only allow you to get your entities by using their ID.

While it's common in NoSQL to de-normalize data (i.e. store artist data in song) for easiest/more efficient access, I'd go with storing the artist ID because it allows easier updates and better consistency.

answered Apr 04 '20 at 15:21

Tasos P.

3,994
2
21
41

Thank you very much. I've updated my question with the common access patterns. It's still difficult to guess which pattern will be accessed most. – mealesbia Apr 04 '20 at 15:32
Hmm I didn't think about the possiblity to use different attributes as one SK (e.g. genre for songs). Seems promising – mealesbia Apr 04 '20 at 15:36
You can find more examples [here](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-modeling-nosql-B.html). You can (and probably should) also consider [GSI overloading](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-gsi-overloading.html). – Tasos P. Apr 04 '20 at 15:43
3

Good advice that with NoSQL you should try to understand your access patterns before designing tables and indexes. Same with all DBs really, but especially so with NoSQL. – jarmod Apr 04 '20 at 15:47
You will need to use GSI to query for non key attributes eg get songs based on rating, for gsi1 pk=rating, sk=songId – user1912383 Dec 01 '22 at 07:29

Sandeep Dixit · Answer 2 · 2022-07-18T14:57:53.320

DynamoDb is a very different world from the SQL world. There are trade-offs for getting unlimited scalability.

Do not rush to model your Db if the following appears absurd or new to you. Take your time to read more to understand.

DynamoDb Tables have Partitions Keys which are used to automatically determine which physical partition data will be stored in. Do not mistake these with Primary Keys (also called PK). Partitions Keys are not generally unique.
Single table is supposed to have store multiple entities (unlike SQL world). For example, user profile and user orders can be stored like this.

PK: User#1 , SK: PROFILE# , username:dixitsandeep , addresses:[]

PK : User#1 , SK: ORDER#109, items:["ITEM#1122","ITEM#9977]

PK : User#1 ,SK: ORDER#2090, items:["ITEM#2288","ITEM#6655]
When data is accessed using, Combination of Partition Key and Sort Key it results in a unique item.
There is NO concept of JOIN ACROSS TABLES. Joins take place within multiple entities stored in a Single Table. You need to keep traditional normalization out of consideration.
There are many patterns to simulate joins. One of them is to fetch data using Partition Key only (without specifying a Sort key) or with some filter on the Sort key. For example, you can use PK User#1 to fetch both PROFILE and ORDERS in a single query.

In other words, When data is accessed using, Combination of Partition Key and Sort Key it results in a unique item. When you broaden the Sort Key, you get multiple items. You simulate join by broadening the Sort Key filter.

You almost always need to specify a partition key for your queries. Queries without partition keys could return partial data.
You can create up to 20 indexes on a table. When you create an Index you generally make an attribute a PK, SK of the index. Unlike SQL world, in DynamoDb you specify the index when you want to fetch data using that index. You do not need to specify an index while inserting data. Indexes may help you to filter data based on attributes other than PK, SK of the main table.

With the above points in mind, the Partition Key choice should be such that it has lots of possible values. One way to think may be as follows: If you are having millions of users firing 1 million queries per second, then these queries should be landing to different Partition Keys. During high load cases, Partition Key may become a performance bottleneck if too many queries want to access data with the same partition key. That means the choice of Partition Key largely depends on how the application accesses the data: access patterns.

UserRole for example is a bad choice of PartitionKey because it is likely to group lots of data in a single partition.

AWS resources on DynamoDb Data modeling.

https://youtu.be/KYy8X8t4MB8

https://youtu.be/0uLF1tjI_BI

Choice PK and SK in DynamoDB

2 Answers2