2

I have a simple single-table design that I want to keep flexible for the future, I currently have 2 entity types: users and videos. Users have a 1:n relationship to videos.

The table's partition key is pk and sort key is sk.

Users: pk=u#<id> and sk=u#<id>, entityType: user

Videos: pk=u#<id> and sk=v#<id>, entityType: video

If I want to fetch all users, does it make sense to create a GSI with PK=entityType and SK=sk?

PGT
  • 1,468
  • 20
  • 34

2 Answers2

2

That is one approach you could take and it would get the job done, but it has a few drawbacks/side effects:

  • You would also replicate all videos in that GSI, which increases the storage and throughput cost of it
  • You would create a potentially huge item collection that contains all users, which could lead to a hot partition and may not scale well.

Instead, consider splitting up the huge user partition in the GSI into multiple ones with predictable keys.

If you plan to list your users by username later, you could take the first letter of their username as the partition key and thereby create around 26 (depending on capitalization and character set) different partitions, which would spread out the load a lot better. To list all users, you'd have to issue queries on all the partitions, which is annoying at small sizes, but will be more scalable.

Another option would be to define that you want to spread the users out among n partitions and then use something like hash(user_id) mod n to get a partition key for the GSI. That way you'd have to do n queries to get the values of all partitions.

Maurice
  • 11,482
  • 2
  • 25
  • 45
  • In this case, I would have to create a new attribute to hold the first letter of the username as the GSI partition key? And so, I'd have to handle all this via business logic? – PGT Apr 28 '22 at 07:23
  • Yes, you'd have to create a new attribute. That's usually something the data access layer should handle. – Maurice Apr 28 '22 at 09:31
1

No, because then all user writes will go to the same PK which isn’t ideal. Instead, setup a GSI with a GSI1PK holding your user ID and you can do a scan against it. Project in the essential attributes. Only set the GSI1PK for user entity types so it’s a sparse GSI.

hunterhacker
  • 6,378
  • 1
  • 14
  • 11
  • I don't have a `userId` aside from the `PK` that's also used by the video entity, I'll have to solve this but if I were to go down this path, would this be correct? Create a new GSI with `GSI1PK = email` (because email doesn't exist in video attributes). I'm reading on the AWS GSI doc and it says `Only items in the base table that contain those attributes appear in the index.`, so this seems to be the way to go. – PGT Apr 28 '22 at 06:40
  • You can create a new attribute and put the same value in there. It’s a common design pattern with DDB to have to manually insert values into index fields. It’s not a “load data then add some indexes” model. – hunterhacker Apr 28 '22 at 16:59