How do you best structure your DynamoDB table for multiple query parameters?

Question

In the context of a tracker system, users' devices deliver location data to the backend, and the system subsequently queries that data both per user and in bulk. The structure of the data is as follows:

{"user_id": "user_1", "timestamp": "2020-10-31 07:05:10.153777+00:00", "location": "XYZ", "details": "PQR"}

The queries that we need are:

Get all location and details data for X<timestamp<Y

and

Get all location and details data for user_id=P and X<timestamp<Y

The total size of the database would be around 10 TB. I am a DynamoDB newbie and am not sure I understand the concept of a partition key very well. Currently I plan to use a table with user_id as the partition key and timestamp as the range (sort) key, and then create a global secondary index (GSI) keyed on the "day" portion of the timestamp to satisfy the first query.
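A minimal sketch of the planned schema, written as parameters in the shape of DynamoDB's CreateTable request (the form boto3's `create_table` accepts). The table name `LocationEvents`, the index name `day-timestamp-index` and the `day` attribute are hypothetical, and the sketch assumes every timestamp is a UTC string in the same format as the sample record, so that string order matches time order.

```python
# Hypothetical names, used only for illustration.
TABLE_NAME = "LocationEvents"
DAY_INDEX = "day-timestamp-index"

# Parameters in the shape of a DynamoDB CreateTable request.
TABLE_SPEC = {
    "TableName": TABLE_NAME,
    "AttributeDefinitions": [
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "timestamp", "AttributeType": "S"},
        {"AttributeName": "day", "AttributeType": "S"},
    ],
    # Base table: one partition key value per user, items sorted by timestamp.
    "KeySchema": [
        {"AttributeName": "user_id", "KeyType": "HASH"},
        {"AttributeName": "timestamp", "KeyType": "RANGE"},
    ],
    # GSI: one partition key value per day, items sorted by timestamp within it.
    "GlobalSecondaryIndexes": [
        {
            "IndexName": DAY_INDEX,
            "KeySchema": [
                {"AttributeName": "day", "KeyType": "HASH"},
                {"AttributeName": "timestamp", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    "BillingMode": "PAY_PER_REQUEST",
}


def with_day_attribute(item):
    """Return a copy of the item with a 'day' attribute (YYYY-MM-DD) for the GSI."""
    # Assumes timestamps look like "2020-10-31 07:05:10.153777+00:00" (UTC).
    return {**item, "day": item["timestamp"][:10]}
```

Because the GSI only indexes items that carry a `day` attribute, every write would need to go through something like `with_day_attribute` before being stored.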

Does anybody have advice about how the DynamoDB table should be structured for the best scaling and performance?
Does anybody have any advice or criticism about the currently suggested structure?

score 0 · Answer 1 · answered Nov 13 '21 at 00:33

I would plan to use a table with partitionKey as user_id and rangekey as timestamp

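I think that's a good structure for satisfying your second query. You could specify the user as the partition key value, then restrict the timestamp sort key to the desired date/time range in the key condition (for example with BETWEEN), so DynamoDB reads only the matching items.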
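A minimal sketch of the per-user query as low-level Query request parameters (the form boto3's `client.query` accepts). The table name `LocationEvents` is hypothetical. Two details matter here: `timestamp` is a DynamoDB reserved word, so it has to be aliased through `ExpressionAttributeNames`, and `BETWEEN` is inclusive, so a strict X<timestamp<Y range needs its endpoints adjusted or the boundary items discarded afterwards.

```python
def user_range_query(user_id, start, end, table_name="LocationEvents"):
    """Build Query parameters for one user's items with start <= timestamp <= end."""
    return {
        "TableName": table_name,
        # 'timestamp' is a reserved word, so it is referenced through the alias #ts.
        "KeyConditionExpression": "user_id = :uid AND #ts BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#ts": "timestamp"},
        # Low-level API: every value carries its DynamoDB type ("S" = string).
        "ExpressionAttributeValues": {
            ":uid": {"S": user_id},
            ":start": {"S": start},
            ":end": {"S": end},
        },
    }
```

Something like `boto3.client("dynamodb").query(**user_range_query("user_1", x, y))` would send it; results come back in pages, so a real caller also has to follow `LastEvaluatedKey`.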

For your first query, trying to request X<timestamp<Y across all users might give you trouble. Take a look at the DynamoDB documentation on constructing a key condition expression:

You must specify the partition key name and value as an equality condition.

In other words, even if you build a GSI on the "day" portion of the timestamp, I'm not aware of a way to run an X<timestamp<Y query across the whole index directly: every query must name a single partition key value.

Based on what you've said, you could still use a GSI indexed on the "day" portion of your timestamp and query it sequentially, a day at a time.
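The day-at-a-time approach can be sketched as below. It assumes the hypothetical `LocationEvents` table and `day-timestamp-index` GSI (partition key `day`, sort key `timestamp`), and takes the query function as a parameter so the logic can be exercised without AWS; `query_fn` is assumed to behave like a DynamoDB client's `query`, returning `Items` and, when more data remains, `LastEvaluatedKey`.

```python
from datetime import date, timedelta


def days_between(start, end):
    """Yield each 'YYYY-MM-DD' day key from start's day to end's day, inclusive."""
    day = date.fromisoformat(start[:10])
    last = date.fromisoformat(end[:10])
    while day <= last:
        yield day.isoformat()
        day += timedelta(days=1)


def query_time_range(query_fn, start, end, table_name="LocationEvents",
                     index_name="day-timestamp-index"):
    """Collect items with start <= timestamp <= end, querying the GSI one day at a time."""
    items = []
    for day in days_between(start, end):
        params = {
            "TableName": table_name,
            "IndexName": index_name,
            # Both attribute names are aliased to stay clear of reserved words.
            "KeyConditionExpression": "#day = :day AND #ts BETWEEN :start AND :end",
            "ExpressionAttributeNames": {"#day": "day", "#ts": "timestamp"},
            "ExpressionAttributeValues": {
                ":day": {"S": day},
                ":start": {"S": start},
                ":end": {"S": end},
            },
        }
        # Follow pagination: a single Query call returns at most 1 MB of data.
        while True:
            page = query_fn(**params)
            items.extend(page.get("Items", []))
            if "LastEvaluatedKey" not in page:
                break
            params["ExclusiveStartKey"] = page["LastEvaluatedKey"]
    return items
```

With boto3 this could be called as `query_time_range(boto3.client("dynamodb").query, x, y)`. The days are queried one after another here; since they are independent, they could also be issued in parallel.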

This is sort of the idea behind write sharding, where you explicitly control the number of partitions in your GSI so that each one can be queried directly. In your case, creating a GSI indexed on "day" would give you one partition per day, each of which can be queried directly with the = operator, as DynamoDB requires.
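One common variation, not part of the answer above, is to split each day into several shards by appending a suffix to the GSI partition key. With around 10 TB of location data, all of one day's writes would otherwise target a single partition key value, and DynamoDB documents per-partition throughput limits (about 3,000 read units and 1,000 write units per second). In this sketch the shard count, the `#` separator and the use of CRC32 over `user_id` are all assumptions; reading a full day then means querying every shard key for that day.

```python
import zlib

SHARD_COUNT = 10  # Hypothetical; size it to the expected write volume per day.


def sharded_day_key(user_id, timestamp, shard_count=SHARD_COUNT):
    """GSI partition key for a write-sharded day, e.g. '2020-10-31#7'."""
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    shard = zlib.crc32(user_id.encode("utf-8")) % shard_count
    return f"{timestamp[:10]}#{shard}"


def shard_keys_for_day(day, shard_count=SHARD_COUNT):
    """Every partition key value that must be queried to read one full day."""
    return [f"{day}#{shard}" for shard in range(shard_count)]
```

Hashing `user_id` keeps one user's points for a given day in the same shard; the trade-off is that a bulk X<timestamp<Y read costs one query per shard per day instead of one per day.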
