0

I have a table where I store multiple observations over time of social media posts on a set of web pages. The table columns are page_id, post_id, and obs_date [plus others with my interesting data]. I want to query this table for the latest observation of each post_id for a given page_id. In SQL this would be something like

SELECT * FROM my_table 
JOIN 
(SELECT max(obs_date) AS obs_date, post_id FROM my_table 
  WHERE page_id='12345' GROUP BY post_id) AS t 
ON t.obs_date=my_table.obs_date AND t.post_id=my_table.post_id

I have seen recipes for getting the latest item for a hash key in questions like this, but here I am wondering if it possible to take this one step further and get the latest item for a given hash key, and then for each of another attribute a la 'GROUP BY'. How would I set up my table(s) and keys to do this? Could I use Local or Global Secondary Indexes?

Any help or suggestion is greatly appreciated.

Community
  • 1
  • 1
rjgage
  • 53
  • 1
  • 5

1 Answers1

0

This can be achieved by Local secondary Index

Firstly your primary table will look like

   page_id - hash
   post_id - range 
   other attributes

Then you can create LSI which has

   page_id - hash
   date_time - Range
   other attributes you need

now when you query LSI based on given page_id you will get all data by ascending/Descending order based on your date (scanIndexForward -false will give in descending).

There is similar example in documentation :

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/LSI.html

Hope This will help

Thanks

Harshal Bulsara
  • 7,898
  • 4
  • 43
  • 61
  • In this solution, I can't store more than one observation per post; the idea is that I want to keep a full timeseries for other reasons but still be able to efficiently query for the latest. – rjgage Mar 18 '14 at 14:10
  • are you saying that at same time multiple observation is not stored, then you are not correct. – Harshal Bulsara Mar 19 '14 at 05:17
  • If I use page_id and post_id as the key, then I will only be able to store a single observation for a given page and post- I'm looking to store a timeseries of such observations while still being able to query for the latest one efficiently given a page_id – rjgage Mar 21 '14 at 16:03