I'm working on a project where I need to get samples of hourly data from Amazon Timestream. I've been using this query:

select *
FROM table_name 
WHERE time >= from_iso8601_timestamp('2022-10-11T11:31:51') 
  and time <= from_iso8601_timestamp('2022-10-11T12:31:51') 
order by random(<some large number>)
limit 1000

This gives me a set of fairly random rows, but I noticed that it scans the entire hour and only then returns the rows. Since you pay per GB scanned, this is less than ideal. I've also tried dropping the random function and just limiting the size of the query. Although that decreases the GBs scanned, the results end up not being sufficiently random.

How do I get a random sample without running an expensive query?

1 Answer

It seems you are looking for scheduled queries: https://docs.aws.amazon.com/timestream/latest/developerguide/scheduledqueries.html
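
If a scheduled query doesn't fit, another thing you could try is to read only a few randomly chosen short time windows inside the hour instead of the whole hour. This is a rough sketch, not something taken from the Timestream docs. The helper name `random_window_query` and its parameters are made up for illustration, and it assumes that narrower time predicates reduce the data scanned, including when several ranges are combined with OR. Check the bytes scanned that Timestream reports for your table before relying on it.

```python
import random
from datetime import datetime, timedelta


def random_window_query(table, start, end, n_windows=6, window_seconds=10,
                        limit=1000, seed=None):
    """Build one query that reads only a few randomly chosen short windows.

    Hypothetical helper: it only builds the SQL string, it does not run it.
    """
    rng = random.Random(seed)
    total_slots = int((end - start).total_seconds()) // window_seconds
    # Pick distinct slots so the chosen windows never overlap.
    slots = sorted(rng.sample(range(total_slots), min(n_windows, total_slots)))

    clauses = []
    for slot in slots:
        lo = start + timedelta(seconds=slot * window_seconds)
        hi = lo + timedelta(seconds=window_seconds)
        clauses.append(
            f"(time >= from_iso8601_timestamp('{lo.isoformat()}') "
            f"AND time < from_iso8601_timestamp('{hi.isoformat()}'))"
        )

    where = "\n   OR ".join(clauses)
    return f"SELECT *\nFROM {table}\nWHERE {where}\nLIMIT {limit}"


if __name__ == "__main__":
    print(random_window_query(
        "table_name",
        datetime(2022, 10, 11, 11, 31, 51),
        datetime(2022, 10, 11, 12, 31, 51),
    ))
```

The rows you get back are random in time but clustered inside the chosen windows, so this trades some randomness for fewer GBs scanned. More, shorter windows spread the sample out further.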