Hey folks, maybe somebody has an idea how to solve this. I have a table in the following format:

id          timestamp           status value 
82240589    2020-03-01 09:13:46 70     22.00
82240589    2020-03-01 09:13:57 70     34.00
82240589    2020-03-01 09:14:14 70     21.00
82240589    2020-03-01 09:14:22 70     47.00
82240589    2020-03-01 09:14:33 70     32.00
82240589    2020-03-01 09:14:43 83     37.00
82240589    2020-03-01 09:14:52 83     44.00
82240589    2020-03-01 09:15:01 83     39.00
82240589    2020-03-01 09:15:10 70     40.00
82240589    2020-03-01 09:15:19 70     40.00
82240589    2020-03-01 09:16:30 70      5.00
82240589    2020-03-01 09:16:37 70     43.00
82240589    2020-03-01 09:16:46 70     46.00
82240589    2020-03-01 09:16:53 70     53.00
82240589    2020-03-01 09:17:00 70     55.00
82240589    2020-03-01 09:17:08 70     50.00
82240589    2020-03-01 09:17:16 70     46.00
82240589    2020-03-01 09:17:52 70     10.00

I need to aggregate the rows by id and by status change, i.e. one output row per run of consecutive rows with the same status. In addition, I need to compute an aggregate over each run, for example the sum of all values. The output should look like this:

id          timestamp_start         timestamp_end               status sum_value
82240589    2020-03-01 09:13:46     2020-03-01 09:14:33         70     ####
82240589    2020-03-01 09:14:43     2020-03-01 09:15:01         83     ####
82240589    2020-03-01 09:15:10     2020-03-01 09:17:52         70     ####

1 Answer

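This is a gaps-and-islands problem: consecutive rows with the same status form an "island" that has to be collapsed into a single row.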

select id, 
       min("timestamp") as start_at, 
       max("timestamp") as end_at,
       status,
       sum(value) as sum_value
from ( 
  select id, "timestamp", status, value, 
         group_flag, 
         -- running sum of the flag, restarted for every id
         sum(group_flag) over (partition by id order by "timestamp") as group_nr
  from (
    select *, 
           -- 1 on the first row of a new status run, 0 otherwise;
           -- the default of lag() makes the first row per id compare with itself
           case 
             when lag(status,1,status) over (partition by id order by "timestamp") = status then 0
             else 1
           end as group_flag
    from data
  ) t1
) t2
group by id, group_nr, status
order by id, start_at

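The inner-most query creates a flag that is 1 on the first row after the status has changed (for the same id value) and 0 on every other row. Because lag() is given the row's own status as its default, the very first row of each id is compared with itself and gets 0.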

For the given data, the result of that is:

id       | timestamp           | status | value | group_flag
---------+---------------------+--------+-------+-----------
82240589 | 2020-03-01 09:13:46 |     70 | 22.00 |          0
82240589 | 2020-03-01 09:13:57 |     70 | 34.00 |          0
82240589 | 2020-03-01 09:14:14 |     70 | 21.00 |          0
82240589 | 2020-03-01 09:14:22 |     70 | 47.00 |          0
82240589 | 2020-03-01 09:14:33 |     70 | 32.00 |          0
82240589 | 2020-03-01 09:14:43 |     83 | 37.00 |          1
82240589 | 2020-03-01 09:14:52 |     83 | 44.00 |          0
82240589 | 2020-03-01 09:15:01 |     83 | 39.00 |          0
82240589 | 2020-03-01 09:15:10 |     70 | 40.00 |          1
82240589 | 2020-03-01 09:15:19 |     70 | 40.00 |          0
82240589 | 2020-03-01 09:16:30 |     70 |  5.00 |          0
82240589 | 2020-03-01 09:16:37 |     70 | 43.00 |          0
82240589 | 2020-03-01 09:16:46 |     70 | 46.00 |          0
82240589 | 2020-03-01 09:16:53 |     70 | 53.00 |          0
82240589 | 2020-03-01 09:17:00 |     70 | 55.00 |          0
82240589 | 2020-03-01 09:17:08 |     70 | 50.00 |          0
82240589 | 2020-03-01 09:17:16 |     70 | 46.00 |          0
82240589 | 2020-03-01 09:17:52 |     70 | 10.00 |          0
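
To see what lag() is doing here, the flag logic can be mimicked outside the database. This is only a minimal sketch in Python, fed with the status column of the sample data; the function name change_flags is made up for the illustration and is not part of the query.

```python
def change_flags(statuses):
    """Return 1 where a status differs from the previous one, else 0.

    Mirrors lag(status, 1, status): the first row is compared with
    itself, so it always gets 0.
    """
    flags = []
    previous = None
    for i, status in enumerate(statuses):
        if i == 0:
            previous = status  # default of lag(): the row's own status
        flags.append(0 if status == previous else 1)
        previous = status
    return flags


# status column of the sample data, in timestamp order
statuses = [70] * 5 + [83] * 3 + [70] * 10
flags = change_flags(statuses)
print(flags)
```

The two 1 entries sit exactly on the rows where the status switches from 70 to 83 and back to 70.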

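The next level turns that flag into a group number by taking a running sum of it: the number goes up by one at every status change and stays the same in between. For the given data, the result of that is: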

id       | timestamp           | status | value | group_nr
---------+---------------------+--------+-------+---------
82240589 | 2020-03-01 09:13:46 |     70 | 22.00 |        0
82240589 | 2020-03-01 09:13:57 |     70 | 34.00 |        0
82240589 | 2020-03-01 09:14:14 |     70 | 21.00 |        0
82240589 | 2020-03-01 09:14:22 |     70 | 47.00 |        0
82240589 | 2020-03-01 09:14:33 |     70 | 32.00 |        0
82240589 | 2020-03-01 09:14:43 |     83 | 37.00 |        1
82240589 | 2020-03-01 09:14:52 |     83 | 44.00 |        1
82240589 | 2020-03-01 09:15:01 |     83 | 39.00 |        1
82240589 | 2020-03-01 09:15:10 |     70 | 40.00 |        2
82240589 | 2020-03-01 09:15:19 |     70 | 40.00 |        2
82240589 | 2020-03-01 09:16:30 |     70 |  5.00 |        2
82240589 | 2020-03-01 09:16:37 |     70 | 43.00 |        2
82240589 | 2020-03-01 09:16:46 |     70 | 46.00 |        2
82240589 | 2020-03-01 09:16:53 |     70 | 53.00 |        2
82240589 | 2020-03-01 09:17:00 |     70 | 55.00 |        2
82240589 | 2020-03-01 09:17:08 |     70 | 50.00 |        2
82240589 | 2020-03-01 09:17:16 |     70 | 46.00 |        2
82240589 | 2020-03-01 09:17:52 |     70 | 10.00 |        2

As we can see, each run of identical status values now has its own number, which the outer-most query then uses for grouping and aggregating.

The nesting of the queries is necessary because window function calls can't be nested: the running sum cannot take the lag() expression as its argument within the same query level.
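
If you want to try the approach without a database server, something like the following should work. It is a sketch that assumes Python's built-in sqlite3 module with an SQLite version that supports window functions (3.25 or later). The table name data comes from the query above; the rows for id 99999999 are invented test data, added only to show that the running sum is kept separate per id.

```python
import sqlite3

# Sample rows from the question: (id, timestamp, status, value)
rows = [
    (82240589, "2020-03-01 09:13:46", 70, 22.0),
    (82240589, "2020-03-01 09:13:57", 70, 34.0),
    (82240589, "2020-03-01 09:14:14", 70, 21.0),
    (82240589, "2020-03-01 09:14:22", 70, 47.0),
    (82240589, "2020-03-01 09:14:33", 70, 32.0),
    (82240589, "2020-03-01 09:14:43", 83, 37.0),
    (82240589, "2020-03-01 09:14:52", 83, 44.0),
    (82240589, "2020-03-01 09:15:01", 83, 39.0),
    (82240589, "2020-03-01 09:15:10", 70, 40.0),
    (82240589, "2020-03-01 09:15:19", 70, 40.0),
    (82240589, "2020-03-01 09:16:30", 70, 5.0),
    (82240589, "2020-03-01 09:16:37", 70, 43.0),
    (82240589, "2020-03-01 09:16:46", 70, 46.0),
    (82240589, "2020-03-01 09:16:53", 70, 53.0),
    (82240589, "2020-03-01 09:17:00", 70, 55.0),
    (82240589, "2020-03-01 09:17:08", 70, 50.0),
    (82240589, "2020-03-01 09:17:16", 70, 46.0),
    (82240589, "2020-03-01 09:17:52", 70, 10.0),
    # Hypothetical second id (not in the question), interleaved in time
    (99999999, "2020-03-01 09:14:00", 10, 1.0),
    (99999999, "2020-03-01 09:15:00", 20, 2.0),
    (99999999, "2020-03-01 09:16:00", 20, 3.0),
]

QUERY = """
select id,
       min("timestamp") as start_at,
       max("timestamp") as end_at,
       status,
       sum(value) as sum_value
from (
  select id, "timestamp", status, value,
         sum(group_flag) over (partition by id order by "timestamp") as group_nr
  from (
    select id, "timestamp", status, value,
           case
             when lag(status, 1, status)
                    over (partition by id order by "timestamp") = status then 0
             else 1
           end as group_flag
    from data
  ) t1
) t2
group by id, group_nr, status
order by id, start_at
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    'create table data (id integer, "timestamp" text, status integer, value real)'
)
conn.executemany("insert into data values (?, ?, ?, ?)", rows)

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
conn.close()
```

For id 82240589 this yields the three periods from the question, with sums of 156, 120 and 388. Timestamps are stored as ISO-formatted text here, which sorts correctly as plain strings; with a real timestamp column nothing about the query changes.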
