DynamoDB costs for Scan vs Query (with GSI which partition key is the same for the whole table) for filtering

Question

I have a DynamoDB table which looks like this:

[id] [datetime] [name] [reasonForVisit] [__typename]

[id] is the simple primary key for the table

[__typename] is an attribute that has the same value for all items in the table

The table has become very big. I want to be able to filter the data by range in terms of datetime. I have 2 quickly implementable options on my mind, but am unsure if it would make much of a difference in terms of costs.

SCAN the whole table and then filter by datetime (as dynamodb doesn't allow filtering before scan)
Create a GSI (partitionKey: __typename, sortKey: datetime) and QUERY by the fixed __typename partition and filter by datetime sort key e.g. between 10/8/10 and 10/11/10.

So, my question is, since my partition key will be the same for each item, hence one big partition, I'm not sure if when QUERYing with a filter, does it still result in reading the whole table (similar to SCAN) or does it know to efficiently start reading from an item based on the filter?

score 0 · Answer 1 · answered Sep 09 '20 at 14:39

Filter happens always after all read operations. There is no difference between Scan and Query in that case. Also, if you create GSI with same PK for all elements you can hit "hot partition" problem and additionally you will slow down your writes.

In case you looking for 80% of records in table a scan+filter solution can be good fit. Otherwise you need to make use of querying capabilities Dynamo offers (so querying by PK and SK). What you could do is to introduce second record like:

id(PK), datetime, name, reasonForVisit, __typename
date(PK), time(SK), id

or:

id(PK), datetime, name, reasonForVisit, __typename
date(PK), time(SK), id_list

or if you data is immutable (historical) and won't ever change then:

id(PK), datetime, name, reasonForVisit, __typename
date(PK), time(SK), id, name, reasonForVisit, __typename

Querying for given range would be done using BatchGetItem. In first and second option you would need to make two writes when adding new record using BatchWriteItem (or TransactWriteItem if strict consistency matters). For third option you can instead use GSI on id.

score 0 · Answer 2 · answered Aug 06 '22 at 18:36

Let's say that you want to find out all your users by a date range.

First you want to sort by date all users:

type User @model @count @auth(rules: [{allow: public}]) {
    id: ID!
    username: String!
    fullname: String
    email: AWSEmail!
    #...
    # At the end, add this:
    createdAt: String!
    sortType: String @index(name: "userByDate", queryField: "userByDate", sortKeyFields: ["createdAt"]) @default(value: "User")
}

Using this solution you'll be able to sort all your entries by ASC or DESC (using sortDirection). Of course, this makes lot easier to retrieve data by range datetime, because you don't have to scan for all your entries in table.

Guidelines

Every time you create a user you have to add this field + value to your JSON:

sortType: User

After you can use in query this filter:

sortType: "User" sortDirection: ASC

Source

DynamoDB costs for Scan vs Query (with GSI which partition key is the same for the whole table) for filtering

2 Answers2