query dynamoDB for the latest entry of every hash key

Question

I have a dynamoDB table where I store sensor events.

HASH KEY: sensor id
RANGE KEY: timestamp
sensor info

I now need a query for the latest event of every sensor.

The only solution I can come up with is to query the latest event for each sensor id. But that would be a lot of queries with 2000+ sensors. I don't want to scan the whole table to sort it out afterwards either since the table can grow quite fast.

Any ideas?

Mike Dinescu · Accepted Answer · 2018-08-25T07:19:14.460

You have to decide what is important to you and design your table(s) to match your use cases.

You say you want to query the last value for every sensor and that there are 2000+ sensors. What will you do with these 2000+ values? How often do you need these values and can the values be slightly out of date?

One solution would be to have two tables: one where you append historical values (time series data) and another table where you always update the most recent reading for each sensor. When you need the most recent sensor data, just scan this second table to get all your sensors’ most recent values. It's as efficient as it gets for reads. For writes, it means you have to write twice for each sensor update.

The other potential solution would be to write your time series data partitioned by time, as opposed to the sensor ids. Assuming all sensors are updated at each time point, with a single query you can get the value of all sensors. This works but only if you update the vales of all sensors every time, and only if you do it with regular cadence.

However, if you update all sensors at once, then further optimizations may be had by combining multiple sensor readings into a single item, therefore requiring less writes to update all 2000 of them.

Well I don't need these queries often. I want to initialize a visualization that shows the states of thousands of objects and sensors. So i need the current state of all objects at once. Since the data comes in asynchronously I also can't query for a time slot or time range. But I also thought about the two table version and this is what I am implementing right now. — Matze Peng, Aug 25 '18 at 12:11
Glad it worked out. The reason I asked about how you’ll use the 2000+ values is because visualizing 2000 distinct sensors all at once can be pretty challenging. — Mike Dinescu, Aug 25 '18 at 15:34

score 1 · Answer 2 · edited Jul 31 '20 at 01:34

1

Since you have a range_key defined you can get the latest item using

Query(hash_key=HASH_KEY, ScanIndexForward=True, Limit=1)

edited Jul 31 '20 at 01:34

Gricey

1,321
1
18
38

answered Aug 24 '18 at 16:29

k.wahome

962
5
14

1

This is the solution OP says they already have and doesn't scale well since it's a request per hash key – Gricey Jul 31 '20 at 00:03
In case anyone else struggles with the answer from k.wahome the `L` in `limit` must be capital, so `Query(hash_key=HASH_KEY, ScanIndexForward=True, Limit=1)` Works! – RNE Jan 03 '20 at 13:41

query dynamoDB for the latest entry of every hash key

2 Answers2