0

I have a postgres 9.6 table with hundreds of millions of records in 'prices' table with only four columns: uid, price, unit, dt. dt is a datetime in standard format like '2017-05-01 00:00:00.585' with fractions of a second. It might be none to dozens record each second.

I can find MAX and MIN price record in some time period. I can quite easily select a period using

SELECT date_trunc('second', dt) as time, min(price), max(price)
FROM prices
WHERE dt >= '2017-05-01 00:00:00' AND dt < '2017-05-01 00:00:59'
GROUP BY time
ORDER BY time;

But date_trunc does not have flexibility and does not allow to set arbitrary period, for example 5 seconds, or 10 minutes. Is there a way to solve it?

uzla
  • 515
  • 1
  • 4
  • 20

1 Answers1

1

Use generate_series to get the ranges on the interval of time you need to search. Then use dd + '5 seconds'::interval to get the upper bound of the range

In this example we look for one day of data every 5 seconds

WITH ranges as (
    SELECT dd as start_range, 
           dd + '5 seconds'::interval as end_range, 
           ROW_NUMBER() over () as grp
    FROM generate_series
            ( '2017-05-01 00:00:00'::timestamp 
            , '2017-05-02 00:00:00'::timestamp
            , '5 seconds'::interval) dd
), create_grp as (
    SELECT r.grp, r.start_range, r.end_range, p.price
    FROM prices p
    JOIN ranges r
      ON p.date >= r.start_range
     AND p.date < r.end_range
)        
SELECT grp, start_range, end_range, MIN(price), MAX(price)
FROM create_grp
GROUP BY grp, start_range, end_range
ORDER BY grp
Juan Carlos Oropeza
  • 47,252
  • 12
  • 78
  • 118
  • Thanks for the answer! It works! Looks like an SQL PhD answer! :-) I'm quite new to SQL. Would you mind to explain how it works? – uzla Sep 17 '17 at 14:16
  • What part? Try `SELECT * FROM ranges` so you see the result – Juan Carlos Oropeza Sep 17 '17 at 14:18
  • More like what and where I should add, in order to see what time range Min and Max values correspond to? :-) – uzla Sep 17 '17 at 14:39
  • Try again, I copy paste from above and there isnt alias `r` below – Juan Carlos Oropeza Sep 18 '17 at 14:38
  • where you put the `WHERE` ?? try with a composite index `(unit, date)` – Juan Carlos Oropeza Sep 21 '17 at 14:12
  • Sorry I just read you already have composite index. I need see the query and the [explain plan](https://stackoverflow.com/questions/12915209/how-to-understand-an-explain-analyze) – Juan Carlos Oropeza Sep 21 '17 at 14:13
  • Although, as soon as I add WHERE clause to it, performance drops to unbearable. Table has uid, unit (effectively product_id), price, dt (date). I add WHERE unit='XXXX' after JOIN :-( I have indexes on 'unit' and 'unit and date' . without WHERE clause, it flies, with the clause, is wait-forever execution. Any ideas? (one sec.) – uzla Sep 21 '17 at 14:15
  • You just wrote the same message I need the query and the explain analyze result. – Juan Carlos Oropeza Sep 21 '17 at 14:17
  • query does not fit in the comment characters limit :-/ I will split it on two comments. – uzla Sep 21 '17 at 14:23
  • WITH ranges as ( SELECT dd as start_range, dd + '5 seconds'::interval as end_range, ROW_NUMBER() over () as grp FROM generate_series ( '2017-05-01 00:00:00'::timestamp , '2017-05-02 00:00:00'::timestamp , '5 seconds'::interval) dd ), create_grp as ( SELECT r.grp, r.start_range, r.end_range, p.price FROM prices p JOIN ranges r ON p.date >= r.start_range AND p.date < r.end_range WHERE unit='AABBCC' ) – uzla Sep 21 '17 at 14:24
  • SELECT grp, start_range, end_range, MIN(price), MAX(price) FROM create_grp GROUP BY grp, start_range, end_range ORDER BY grp – uzla Sep 21 '17 at 14:25
  • You could edit the question ;P Also show me your create index. – Juan Carlos Oropeza Sep 21 '17 at 14:27
  • adding WHERE unit='AABBCC' makes the query runs forever. How can I optimize it to the original speed? (without WHERE clause was super quick) I'm grabbing the Explaing resulting before modifications and after for you. – uzla Sep 21 '17 at 14:31
  • CREATE INDEX idx_dt_instrument ON public.prices USING btree (date, unit COLLATE pg_catalog."default") TABLESPACE pg_default; – uzla Sep 21 '17 at 14:33
  • try creating another index but invert the fields to `(unit, date)` and try again. Also is the unit already in the same collate? otherwise that would cause some problem. – Juan Carlos Oropeza Sep 21 '17 at 14:36
  • Done. CREATE INDEX idx_instrument_dt ON public.prices USING btree (instrument COLLATE pg_catalog."default", dt) TABLESPACE pg_default; Same story: indefinite query. I can post explain before and after (without new index) in 4 posts. (stackoverflow message chars limit) – uzla Sep 21 '17 at 14:53
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/155009/discussion-between-juan-carlos-oropeza-and-uzla). – Juan Carlos Oropeza Sep 21 '17 at 14:54
  • Before: GroupAggregate (cost=2909165506.27..3006889539.60 rows=8000000 width=40) Group Key: create_grp.grp, create_grp.start_range, create_grp.end_range CTE ranges -> WindowAgg (cost=0.00..25.00 rows=1000 width=24) -> Function Scan on generate_series dd (cost=0.00..10.00 rows=1000 width=8) CTE create_grp -> Nested Loop (cost=117649.05..827926115.50 rows=6509602222 width=47) -> CTE Scan on ranges r (cost=0.00..20.00 rows=1000 width=24) -> Bitmap Heap Scan on prices p (cost=117649.05..762830.08 rows=6509602 width=31) – uzla Sep 21 '17 at 14:59
  • Recheck Cond: ((dt >= r.start_range) AND (dt < r.end_range)) -> Bitmap Index Scan on idx_dt_instrument (cost=0.00..116021.65 rows=6509602 width=0) Index Cond: ((dt >= r.start_range) AND (dt < r.end_range)) -> Sort (cost=2081239365.77..2097513371.32 rows=6509602222 width=40) Sort Key: create_grp.grp, create_grp.start_range, create_grp.end_rage -> CTE Scan on create_grp (cost=0.00..130192044.44 rows=6509602222 width=40) – uzla Sep 21 '17 at 14:59
  • After adding WHERE (before new index): GroupAggregate (cost=1029283508.90..1060020480.57 rows=8000000 width=40) Group Key: create_grp.grp, create_grp.start_range, create_grp.end_range CTE ranges -> WindowAgg (cost=0.00..25.00 rows=1000 width=24) -> Function Scan on generate_series dd (cost=0.00..10.00 rows=1000 width=8) CTE create_grp -> Nested Loop (cost=344299.48..448807971.73 rows=2043798111 width=47) Join Filter: ((p.dt >= r.start_range) AND (p.dt < r.end_range)) -> CTE Scan on ranges r (cost=0.00..20.00 rows=1000 width=24) – uzla Sep 21 '17 at 15:00
  • -> Materialize (cost=344299.48..1339476.69 rows=18394183 width=31) -> Bitmap Heap Scan on prices p (cost=344299.48..1121763.77 rows=18394183 width=31) Recheck Cond: ((unit)::text = 'AABBCC'::text) -> Bitmap Index Scan on idx_instrument (cost=0.00..339700.94 rows=18394183 width=0) Index Cond: ((unit)::text = 'AABBCC'::text) -> Sort (cost=580475512.17..585585007.45 rows=2043798111 width=40) – uzla Sep 21 '17 at 15:00
  • Sort Key: create_grp.grp, create_grp.start_range, create_grp.end_range -> CTE Scan on create_grp (cost=0.00..40875962.22 rows=2043798111 width=40) – uzla Sep 21 '17 at 15:01
  • Man, I'd rather discus it privatively and post result here. This comment system is 'far from ideal' to say the least... – uzla Sep 21 '17 at 15:02