What's the best approach to get items from DynamoDB: a single Query vs. fetching items one by one with GetItem?

Question

I am trying to build an application for agents who can have multiple locations. Below is what my data would look like.

| Partition Key   | SortKey    | AgentName | LocationAddress  |
|-----------------|------------|-----------|------------------|
| Agent1          | Agent1     | AgentName |                  |
| Agent1          | Location#1 |           | 123 MainStreet.. |
| Agent1          | Location#2 |           | 1 MainStreet..   |
| Agent1          | Location#3 |           | 12 MainStreet..  |

I expect to store no more than 20 locations for each agent.

My use case is the following:

Match each location against an external list. (The external list will likely contain all the stored locations. The business use case is to validate the external list data against the database.)

Access pattern options:

1. Query using PartitionKey="Agent1" and SortKey BEGINS_WITH="Location"
2. GetItem using PartitionKey="Agent1" and SortKey EQ="Location#1"
   - According to a StackOverflow question, GetItem may be a better choice when both the PartitionKey and SortKey are known

In comparison, the link suggests that

The latency of GetItem vs Query with limit=1 will be equivalent.

You'd think the Query would be better when you need N locations in one go: the latency would be that of 1 request for Query, versus that of N requests for GetItem.

After doing all my research, I think that it might be better to take option #2 because of the throughput and knowing you'll need all the data anyway.

Querying the DB once would consume throughput once, whereas GetItem consumes throughput every time you get an item. I would like to discuss whether option #1 is better than option #2.

David Good · Answer 1 · 2020-10-28T00:28:23.300

For a GetItem request, you must specify the full values of the primary key: the partition key and the sort key if your table has one. So if you need to fetch N locations, you will need to perform N GetItem requests.
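To make the "N requests" point concrete, here is a minimal sketch of what Option 2 looks like in Python. It assumes `table` behaves like a boto3 DynamoDB `Table` resource, and the attribute names `PK` and `SK` are hypothetical stand-ins for the partition key and sort key in the question's table.

```python
from typing import Any, Dict, List

# Hypothetical attribute names; substitute your table's real key names.
PK_ATTR = "PK"
SK_ATTR = "SK"


def get_locations_one_by_one(
    table: Any, agent_id: str, sort_keys: List[str]
) -> List[Dict[str, Any]]:
    """Fetch each location with its own GetItem call (one round trip per item)."""
    items = []
    for sort_key in sort_keys:
        # GetItem needs the full primary key: partition key AND sort key.
        response = table.get_item(Key={PK_ATTR: agent_id, SK_ATTR: sort_key})
        item = response.get("Item")  # "Item" is absent when nothing matches
        if item is not None:
            items.append(item)
    return items
```

Note that the caller has to know every sort key up front, and each key costs one network round trip.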

Making N GetItem requests to fetch N Locations will be N times slower (worst case) than making a single Query request, and it's not really how DynamoDB is intended to be used. The best practice is to model your data so that you can fetch all the data required for a given access pattern in a single request. Looking at your data model, you have already modeled your data in this way with a single item collection containing both the Agent and many associated Locations.

With the Query operation, you can fetch multiple items, and you must provide the partition key. The sort key condition is optional, but it supports comparison operators (less than, begins with, between, etc.). This is exactly what you have described: PartitionKey="Agent1" and SortKey BEGINS_WITH="Location".
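As a rough sketch of Option 1, the function below issues that Query and follows pagination. It assumes `table` behaves like a boto3 DynamoDB `Table` resource; the attribute names `PK` and `SK` and the `Location#` prefix default are hypothetical placeholders for the question's schema.

```python
from typing import Any, Dict, List

# Hypothetical attribute names; substitute your table's real key names.
PK_ATTR = "PK"
SK_ATTR = "SK"


def query_locations(
    table: Any, agent_id: str, prefix: str = "Location#"
) -> List[Dict[str, Any]]:
    """Fetch all of an agent's locations with a single Query (plus paging)."""
    kwargs: Dict[str, Any] = {
        # Equality on the partition key, begins_with on the sort key.
        "KeyConditionExpression": "#pk = :pk AND begins_with(#sk, :prefix)",
        "ExpressionAttributeNames": {"#pk": PK_ATTR, "#sk": SK_ATTR},
        "ExpressionAttributeValues": {":pk": agent_id, ":prefix": prefix},
    }
    items: List[Dict[str, Any]] = []
    while True:
        response = table.query(**kwargs)
        items.extend(response.get("Items", []))
        # A Query result is capped at 1 MB; keep going if more pages remain.
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            break
        kwargs["ExclusiveStartKey"] = last_key
    return items
```

With at most 20 small location items per agent, the result should normally fit in one page, so the loop would run once. The returned items can then be compared against the external list in memory.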

Furthermore, using N GetItem operations will consume extra Read Capacity Units (RCUs) since each operation will be rounded up to a minimum 1 RCU (or 0.5 RCU for eventually consistent reads). By comparison, a Query's consumed capacity is calculated based on the total size of the items read. (Thanks to Nadav for his correction to this in the comments!)
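The difference in rounding can be worked through with a small calculation. This is a simplified sketch of the read-capacity arithmetic only (4 KB read units, half cost for eventually consistent reads). The function names are made up for illustration, and it assumes a non-empty list of item sizes.

```python
import math
from typing import List

READ_UNIT_BYTES = 4 * 1024  # one RCU covers up to 4 KB, strongly consistent


def get_item_rcus(item_sizes: List[int], strongly_consistent: bool = True) -> float:
    """RCUs for N separate GetItem calls: each item is rounded up on its own."""
    units = sum(math.ceil(size / READ_UNIT_BYTES) for size in item_sizes)
    return units if strongly_consistent else units / 2


def query_rcus(item_sizes: List[int], strongly_consistent: bool = True) -> float:
    """RCUs for one Query: the TOTAL size read is rounded up once."""
    units = math.ceil(sum(item_sizes) / READ_UNIT_BYTES)
    return units if strongly_consistent else units / 2


# 20 locations of 1 KB each, as in the question's upper bound.
sizes = [1024] * 20
print(get_item_rcus(sizes))  # 20
print(query_rcus(sizes))     # 5
```

For 1 KB items, the single Query consumes a quarter of the capacity that the individual GetItem calls do.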

So I can't think of a good reason why you would choose Option 2 (N GetItem requests) over Option 1 (a single Query operation).

I would just like to comment that the last paragraph, about RCUs, is not accurate: For small items, Query can be significantly cheaper than a GetItem of the individual items. With GetItem, the item size is rounded up to 4KB - reading a 1KB item also costs one RCU. But for Query, the **total** read size is divided by 4KB. So if you have 1KB items, you can query 100 of them for 25 RCU, not 100 RCU. So I'm strengthening David's recommendation further: Go with Query, not GetItem, for this use case. — Nadav Har'El, Oct 27 '20 at 14:15
