How do I group by values, but only when they're consecutive?

Question

I have the following table:

ID	Number	Size
7	1	1.5
7	2	1.5
8	1	1.625
8	2	1.03125
8	3	1.03125
8	4	1.03125
8	5	1.625
8	6	1
8	7	1.625
8	8	1.625
8	9	1.625
9	1	1
9	2	2
9	3	3
9	4	4
9	5	1

I would like to create a column with a single string value for each ID that groups the size, but only when the values are consecutive based on the number, and includes the number of matching sizes.

For example, for ID=8, I would like something like this:

1.625 x 1.03125 (x3) x 1.625 x 1 x 1.625 (x3)

Please note that the number could increment to any integer value and the size could be any positive number, up to 7 decimal places.

As I've been thinking through it, these are the steps I've considered so far:

Group the sizes when there are equal consecutive values. I've considered using the LAG()/LEAD() functions to accomplish this or the CASE expression. Perhaps there is some GROUP BY/subquery magic that could be performed to accomplish this as well? Regardless, this step would return something like this, where the "Group" column is anything that could be used to group the values within the same ID:

ID	Number	Size	Group
7	1	1.5	1
7	2	1.5	1
8	1	1.625	1
8	2	1.03125	2
8	3	1.03125	2
8	4	1.03125	2
8	5	1.625	3
8	6	1	4
8	7	1.625	5
8	8	1.625	5
8	9	1.625	5
9	1	1	1
9	2	2	2
9	3	3	3
9	4	4	4
9	5	1	5

Then, once I've created a proper Group column, I was thinking I'd next group the sizes and create an additional column for the quantity in that group. This would look something like this:

ID	Number	Size	Group	Quantity
7	1	1.5	1	2
8	1	1.625	1	1
8	2	1.03125	2	3
8	5	1.625	3	1
8	6	1	4	1
8	7	1.625	5	3
9	1	1	1	1
9	2	2	2	1
9	3	3	3	1
9	4	4	4	1
9	5	1	5	1

Finally, potentially using a combination of the STRING_AGG() and CONCAT() functions, I would put the data into the following format:

ID	Size
7	1.5 (x2)
8	1.625 (x1) x 1.03125 (x3) x 1.625 (x1) x 1 (x1) x 1.625 (x3)
9	1 (x1) x 2 (x1) x 3 (x1) x 4 (x1) x 1 (x1)

And, preferably, I would omit the " (x1)", such that the final form looks like this:

ID	Size
7	1.5 (x2)
8	1.625 x 1.03125 (x3) x 1.625 x 1 x 1.625 (x3)
9	1 x 2 x 3 x 4 x 1

It's kind of a lot. But in the end, I'm hoping to turn the data from these tables into a more readable format for users, but 1) I wasn't exactly sure how to do this, and 2) Given potentially multiple routes for accomplishing this, I wasn't sure what the most performant option was.

Any thoughts?

P.S. I am open to any methods, it certainly doesn't have to follow the train of thought I included here.

*Edited to add a new dataset (ID=9)

[Why should I tag my RDBMS](https://meta.stackoverflow.com/questions/388759) — Stu, Apr 05 '23 at 21:05
@Stu Thank you for your comment. I am relatively new to database management as well as Stack Overflow, so I appreciate this link + suggestion. I have tried to update my tags to be more specific, but if there is still room for improvement there, I'm always open to feedback! — Leah, Apr 06 '23 at 11:39

score 1 · Accepted Answer · answered Apr 05 '23 at 22:11

You have a gaps and islands problem, one option is to use a difference between two row_numbers to define the required groups, try the following using SQL Server syntax:

with cte1 as -- Step 1: using the difference between two row_numbers approach, create groups for consecutive similar values of Size.
(
  select *,
   row_number() over (partition by id order by number) - 
   row_number() over (partition by id, size order by number) grp
  from table_name 
),
cte2 as -- Step 2: get the counts for each group defined in the previous step
(
  select id, min(number) number, size, grp, count(*) cnt
  from cte1
  group by id, size, grp
)
  -- Step 3: use string_agg and concat functions to get the desired format
select id, 
  string_agg
    (-- use a case expression to not include (1x) when count = 1
      case when cnt > 1 then concat(size, ' (x', cnt, ')') else cast(size as varchar(20)) end, ' x '
    ) within group (order by number) Size
from cte2
group by id
order by id

See demo

This appears to work perfectly, thank you! I had considered the method you used for Step 1 based on an answer to a similar question I saw on this site; however, I didn't think it'd work for the case demonstrated with ID=9 (see updated post). This is because, when you assign the "grp" value to this dataset, both Numbers 4 & 5 of ID=9 get the same grp value. However, this will always be a non-issue so long as you're grouping by first Size then Number, correct? — Leah, Apr 06 '23 at 12:38
Additionally, just for learning purposes, why did you use CAST instead of CONVERT in Step 3? — Leah, Apr 06 '23 at 12:40
You could use convert instead of cast, cast is ANSI standard while convert is tsql specific. — ahmed, Apr 08 '23 at 13:31

How do I group by values, but only when they're consecutive?

1 Answers1