SQL Gaps and Islands fails with 3 rows

Question

Considering the following data

# SELECT * FROM foobar;
 id | quantity |    time
----+----------+------------
  1 |       50 | 2022-01-01
  2 |      100 | 2022-01-02
  3 |       50 | 2022-01-03
  4 |       50 | 2022-01-04

(note: the 4th row is simply here to illustrate the expected result, it's not necessary to reproduce the issue)

I want to extract three distinct groups based on two assumptions: 1) a new group should be created everytime quantity changes and 2) consecutive identical quantities should be merged in the same group. The end result should look something like this

 id | quantity |    time    |  group_id
----+----------+------------+-------------
  1 |       50 | 2022-01-01 |           1
  2 |      100 | 2022-01-02 |           2
  3 |       50 | 2022-01-03 |           3
  4 |       50 | 2022-01-04 |           3

I've tried implementing a gaps and island solution to this problem but it fails with this specific data, as the group end up merging rows with different quantity. Note that swapping row 2 and 3 fixes the issue. Using DENSE_RANK results in the same problem.

# SELECT
    *,
    ROW_NUMBER() OVER (ORDER BY time) as global_rank,
    ROW_NUMBER() OVER (PARTITION BY quantity ORDER BY time) as qty_counter
FROM foobar;
 id | quantity |    time    | global_rank | qty_counter
----+----------+------------+-------------+-------------
  1 |       50 | 2022-01-01 |           1 |           1    # global_rank - qty_counter = 0 
  2 |      100 | 2022-01-02 |           2 |           1    # global_rank - qty_counter = 1 
  3 |       50 | 2022-01-03 |           3 |           2    # global_rank - qty_counter = 1 
  4 |       50 | 2022-01-04 |           4 |           3    # global_rank - qty_counter = 1 
(4 rows)

How can I change this query to get the expected result ?

score 1 · Accepted Answer · answered Apr 20 '23 at 12:06

1

I find that this is simpler solved with lag(); the idea is to compare each quantity to the previous value, and increment a counter when they do not match.

select id, quantity, time, 
    count(*) filter(where quantity is distinct from lag_quantity) over(order by time) as grp
from (
    select f.*, lag(quantity) over(order by time) lag_quantity
    from foobar f
) f
order by time

id	quantity	time	grp
1	50	2022-01-01	1
2	100	2022-01-02	2
3	50	2022-01-03	3
4	50	2022-01-04	3

fiddle

answered Apr 20 '23 at 12:06

GMB

216,147
25
84
135

Additional note for SQL beginners like me reading this post and not understanding the `count() over`, @GMB's answer works because `ORDER BY` performs a running total. See https://stackoverflow.com/a/14860368/5343009 for a better explanation – aguadoe Apr 20 '23 at 12:44

SQL Gaps and Islands fails with 3 rows

1 Answers1