1

Considering the following data

# SELECT * FROM foobar;
 id | quantity |    time
----+----------+------------
  1 |       50 | 2022-01-01
  2 |      100 | 2022-01-02
  3 |       50 | 2022-01-03
  4 |       50 | 2022-01-04

(note: the 4th row is simply here to illustrate the expected result, it's not necessary to reproduce the issue)

I want to extract three distinct groups based on two assumptions: 1) a new group should be created everytime quantity changes and 2) consecutive identical quantities should be merged in the same group. The end result should look something like this

 id | quantity |    time    |  group_id
----+----------+------------+-------------
  1 |       50 | 2022-01-01 |           1
  2 |      100 | 2022-01-02 |           2
  3 |       50 | 2022-01-03 |           3
  4 |       50 | 2022-01-04 |           3

I've tried implementing a gaps and island solution to this problem but it fails with this specific data, as the group end up merging rows with different quantity. Note that swapping row 2 and 3 fixes the issue. Using DENSE_RANK results in the same problem.

# SELECT
    *,
    ROW_NUMBER() OVER (ORDER BY time) as global_rank,
    ROW_NUMBER() OVER (PARTITION BY quantity ORDER BY time) as qty_counter
FROM foobar;
 id | quantity |    time    | global_rank | qty_counter
----+----------+------------+-------------+-------------
  1 |       50 | 2022-01-01 |           1 |           1    # global_rank - qty_counter = 0 
  2 |      100 | 2022-01-02 |           2 |           1    # global_rank - qty_counter = 1 
  3 |       50 | 2022-01-03 |           3 |           2    # global_rank - qty_counter = 1 
  4 |       50 | 2022-01-04 |           4 |           3    # global_rank - qty_counter = 1 
(4 rows)

How can I change this query to get the expected result ?

GMB
  • 216,147
  • 25
  • 84
  • 135
aguadoe
  • 150
  • 2
  • 9

1 Answers1

1

I find that this is simpler solved with lag(); the idea is to compare each quantity to the previous value, and increment a counter when they do not match.

select id, quantity, time, 
    count(*) filter(where quantity is distinct from lag_quantity) over(order by time) as grp
from (
    select f.*, lag(quantity) over(order by time) lag_quantity
    from foobar f
) f
order by time
id quantity time grp
1 50 2022-01-01 1
2 100 2022-01-02 2
3 50 2022-01-03 3
4 50 2022-01-04 3

fiddle

GMB
  • 216,147
  • 25
  • 84
  • 135
  • Additional note for SQL beginners like me reading this post and not understanding the `count() over`, @GMB's answer works because `ORDER BY` performs a running total. See https://stackoverflow.com/a/14860368/5343009 for a better explanation – aguadoe Apr 20 '23 at 12:44