1

I have a table like this

+------+------+-------+--+
| Name | Date | Price |  |
+------+------+-------+--+
| x    | d1   |    50 |  |
| x    | d2   |    45 |  |
| x    | d3   |    55 |  |
| x    | d4   |    40 |  |
| x    | d5   |    48 |  |
| x    | d6   |    45 |  |
| x    | d7   |    44 |  |
| y    | d1   |   200 |  |
| y    | d2   |   205 |  |
| y    | d3   |   204 |  |
| y    | d4   |   189 |  |
| y    | d5   |   185 |  |
| y    | d6   |   187 |  |
| y    | d7   |   182 |  |
+------+------+-------+--+

I want to find high price for each name and date in next 3 days.

OutPut:

+------+------+-------+---------+-----------+--+
| Name | Date | Price | High_pr | High_date |  |
+------+------+-------+---------+-----------+--+
| x    | d1   |    50 |      55 | d3        |  |
| x    | d2   |    45 |      55 | d3        |  |
| x    | d3   |    55 |      55 | d3        |  |
| x    | d4   |    40 |      48 | d5        |  |
| x    | d5   |    48 |      48 | d5        |  |
| x    | d6   |    45 |      45 | d6        |  |
| x    | d7   |    44 |      44 | d7        |  |
| y    | d1   |   200 |     205 | d2        |  |
| y    | d2   |   205 |     205 | d2        |  |
| y    | d3   |   204 |     204 | d3        |  |
| y    | d4   |   189 |     189 | d4        |  |
| y    | d5   |   185 |     187 | d6        |  |
| y    | d6   |   187 |     187 | d6        |  |
| y    | d7   |   182 |     182 | d7        |  |
+------+------+-------+---------+-----------+--+

Tried using window function with following '3 day' but not working.

The table is really big with many names spanning 5 years of data, need some optimum solution. Thank You

PS: Using PostgreSQL 12.4.

GMB
  • 216,147
  • 25
  • 84
  • 135
Shh
  • 986
  • 9
  • 18

2 Answers2

1

This might be simpler to do with a lateral join:

select t.*, h.*
from mytable t
cross join lateral (
    select price high_price, date high_date
    from mytable t1
    where t1.name = t.name and t1.date >= t.date and t1.date <= t.date + interval '3 day'
    order by t1.price desc
    limit 1
) h
GMB
  • 216,147
  • 25
  • 84
  • 135
  • price and high price are always same in this sql which is not correct – Shh Oct 10 '20 at 01:43
  • @Shh: that is suprising. Here is a db-fiddle based on your sample data (I just replaced the fake dates with actual dates), where you can see that the result is what you expect: https://dbfiddle.uk/?rdbms=postgres_13&fiddle=fd9fa501e3e2fd1c958ade76fdb82a40 – GMB Oct 10 '20 at 01:44
  • Yes it's all right in the fiddle, let me check at my end. Actually whatever I tried, I was getting the same output ie price and high_price same. – Shh Oct 10 '20 at 02:15
  • My mistake in typing.. @Gordon Linoff is really fast on a very large table... – Shh Oct 10 '20 at 02:20
1

I am wondering if this can be done reasonably without the use of a lateral join. Window functions don't seem to be optimal, but the following does work using a correlated subquery:

select t.name, t.date, t.price, max_price_3,
       (select pd.date
        from unnest(t.pd) pd (date date, price int)
        where pd.price = t.max_price_3
        limit 1
       ) as date_3
from (select t.*,
             max( price ) over (partition by name order by date range between current row and interval '3 day' following) as max_price_3,
             array_agg( (date, price) ) over (partition by name order by date range between current row and interval '3 day' following) as pd
      from t
     ) t ;

Here is a db<>fiddle.

There is a way to do this using only window functions . . . but using generate_series(). This is a little complicated to follow, because the original date/price is spread backwards through the days in the period -- then the maximum row is taken:

select name, dte, price, price_3, date as date_3
from (select t.*, dte, min(date) over (partition by name) as min_date,
             max(t.price) over (partition by t.name, gs.dte) as price_3,
             row_number() over (partition by t.name, gs.dte order by price desc) as seqnum
      from t cross join
           generate_series(date, date - interval '3 day', interval '-1 day') gs(dte)
     ) t
where seqnum = 1 and dte >= min_date
order by t.name, t.date;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Your first example works like a charm and it's super fast. I was using first_value then ordering it as I wanted to. But there was some flaw in that. Still trying to work out what was wrong. Will compare the use of array_agg and why it's necessary. Plus that has the advantage of using a number of variables ie I can use 10 days, 20 days in single pass. Thanks – Shh Oct 10 '20 at 03:27