0

Given an entity identified by id (i.e. a user, a device, etc.) which is used as the hash key of a table, and an additional date attribute (i.e. last modified, last login etc). I would like to be able to find the eldest entry in that table.

A typical query would be "Find me the device with the oldest modification date" or "Find me the user with the most recent login".

I cannot wrap my head around a good design for this use case - be it auxiliary tables or GSI.

reikje
  • 2,850
  • 2
  • 24
  • 44

1 Answers1

1

It sounds like you have unique Items (users, devices, etc) that have a sortable time component (last modified, last login, etc). Your access patterns include fetching the oldest entry across all the items.

It sounds like you have a table that may look something like this:

User Items

Users Items are defined with a Partition Key of USER#<user_id> and the Sort Key is a date (could also include time of day, but I've simplified for this example).

If you are trying to search across multiple partitions, you only have access to the scan operation, which can make access patterns like you describe tricky (and expensive). Let's try a different approach.

Let's define a secondary index (GSI1) with a Partition Key (GSI1PK) of USERS and Sort Key (GSI1SK) as the time based attribute (updated_at, created_at, etc). That index would look like this:

User Items with GSI1

Lets look at the same table and data from the perspective of GSI1:

User Item Collection

Now I've got an Item Collections with a Partition Key of USERS and a date Sort Key. Since all the User data you want to search is now in a single partition, you can use the query operation to search across the entire USERS partition. You could further use the Sort Key to order the results (ScanIndexForward=False for descending order, ScanIndexForward=True for ascending order). You can set Limit=1 on your query to fetch the oldest (or latest) Item in that collection.

There are often many ways to implement a given access pattern in DynamoDB. I've seen this specific pattern used to implement the "fetch oldest/latest" access pattern.

Seth Geoghegan
  • 5,372
  • 2
  • 8
  • 23
  • How big is penalty for storing everything in the same partition - let's say I have 10M users? In the GSI1, what would happen if users log in at the exact same time? The index would use "users" as GSI1PK and the same sort key (GSI1SK) multiple times .. – reikje Sep 08 '20 at 20:21
  • The ideal use of DynamoDB distributes your data uniformly across partitions. Partition sizes are limited to 10GB, so you might be able to store all 10M users in a single partition. However, a different strategy may be warranted if 10M items in a collection is not evenly distributed in your application. The pattern I illustrated is generic to fetching an item sorted by date. The access patterns you gave as examples would likely be implemented differently, but could apply a similar pattern. I'm not sure what you mean with users logging in a the same time. – Seth Geoghegan Sep 08 '20 at 22:00
  • What I mean, imagine that in your original table, users will have the same sort key value (i.e. user3 and user4 both having 2020-09-08). Would this work with a GSI because both would be using the same index key ("users", 2020-09-08)? – reikje Sep 09 '20 at 05:57
  • @reikje That's a great question. DynamoDB enforces uniqueness via the Primary Key (e.g. the Partition Key and optional Sort Key) in your primary table. Secondary Indexes do not have the same uniqueness constraint. In your example, that means you could have multiple unique users in your main table with duplicate items in your secondary index. Therefore, you could have multiple items in your secondary index with a PK of USERS and SK of 2020-09-08. – Seth Geoghegan Sep 10 '20 at 17:40