6

I have a problem, that I want to partition over a sorted table. Is there a way I can do that?

I am using SQL Server 2016.

Input Table:

|---------|-----------------|-----------|------------|
|  prod   |   sortcolumn    |    type   |    value   |
|---------|-----------------|-----------|------------|
|    X    |        1        |     P     |     12     |
|    X    |        2        |     P     |     23     |
|    X    |        3        |     E     |     34     |
|    X    |        4        |     P     |     45     |
|    X    |        5        |     E     |     56     |
|    X    |        6        |     E     |     67     |
|    Y    |        1        |     P     |     78     |
|---------|-----------------|-----------|------------|

Desired Output

|---------|-----------------|-----------|------------|------------|
|  prod   |   sortcolumn    |    type   |    value   |    rowNr   |
|---------|-----------------|-----------|------------|------------|
|    X    |        1        |     P     |     12     |      1     |
|    X    |        2        |     P     |     23     |      2     |
|    X    |        3        |     E     |     34     |      1     |
|    X    |        4        |     P     |     45     |      1     |
|    X    |        5        |     E     |     56     |      1     |
|    X    |        6        |     E     |     67     |      2     |
|    Y    |        1        |     P     |     78     |      1     |
|---------|-----------------|-----------|------------|------------|

I am this far:

SELECT
  table.*,
  ROW_NUMBER() OVER(PARTITION BY table.prod, table.type ORDER BY table.sortColumn) rowNr
FROM table

But this does not restart the row number on the 4th row, since it is the same prod and type. How could I restart on every prod and also on every type change based on the sort criteria, even if the type changes back to something it already was previously? Is this even possible with a ROW_NUMBER function or do I have to work with LEAD and LAG and CASES (which would probably make it very slow, right?)

Thanks!

AakashM
  • 62,551
  • 17
  • 151
  • 186
Letimogo
  • 542
  • 3
  • 13
  • I don't see how you could avoid lag/lead when you indeed NEED the information of the previous row for your logic – George Menoutis Jun 28 '18 at 08:30
  • Why do you believe `LEAD` and/or `LAG` would make the query slow? – Thom A Jun 28 '18 at 08:32
  • Thats my experience up to now... But its from old versions of oracle, is it better on a new sql server? Keep in mind that the table is medium size (about 2 mio rows) – Letimogo Jun 28 '18 at 08:35
  • Oracle and SQl Server are very different products. `LAG` and `LEAD` are very efficient from my experience, and far better that doing a `JOIN` along the lines of `FROM MyTable MT LEFT JOIN MyTable MTl ON MT.ID +1 = MTl.ID` – Thom A Jun 28 '18 at 08:49

3 Answers3

6

This is a gaps and islands problem. You can use the following query:

SELECT t.*, 
       ROW_NUMBER() OVER (PARTITION BY prod ORDER BY sortcolumn)
       -
       ROW_NUMBER() OVER (PARTITION BY prod, type ORDER BY sortcolumn) AS grp
FROM mytable t

to get:

prod    sortcolumn  type    value   grp
----------------------------------------
X       1           P       12      0
X       2           P       23      0
X       3           E       34      2
X       4           P       45      1
X       5           E       56      3
X       6           E       67      3
Y       1           P       78      0

Now, field grp can be used for partitioning:

;WITH IslandsCTE AS (
    SELECT t.*, 
           ROW_NUMBER() OVER (PARTITION BY prod ORDER BY sortcolumn)
           -
           ROW_NUMBER() OVER (PARTITION BY prod, type ORDER BY sortcolumn) AS grp
    FROM mytable t  
)
SELECT prod, sortcolumn, type, value,
       ROW_NUMBER() OVER (PARTITION BY prod, type, grp ORDER BY sortcolumn) AS rowNr
FROM IslandsCTE
ORDER BY prod, sortcolumn

Demo here

Giorgos Betsos
  • 71,379
  • 9
  • 63
  • 98
4

This is a classic 'islands' problem, in that you need to find the 'islands' of records related by prod and type, but without grouping together all records matching on prod and type.

Here's one way this is typically solved. Set up:

DECLARE @t TABLE (
    prod varchar(1),
    sortcolumn int,
    type varchar(1),
    value int
);

INSERT @t VALUES
('X', 1, 'P', 12),
('X', 2, 'P', 23),
('X', 3, 'E', 34),
('X', 4, 'P', 45),
('X', 5, 'E', 56),
('X', 6, 'E', 67),
('Y', 1, 'P', 78)
;

Get some row numbers in place:

;WITH numbered AS (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY prod, type ORDER BY sortcolumn) as rnX,
        ROW_NUMBER() OVER (PARTITION BY prod ORDER BY sortcolumn) as rn
    FROM
        @t
)

numbered now looks like this:

prod sortcolumn  type value       rnX                  rn
---- ----------- ---- ----------- -------------------- --------------------
X    1           P    12          1                    1
X    2           P    23          2                    2
X    3           E    34          1                    3
X    4           P    45          3                    4
X    5           E    56          2                    5
X    6           E    67          3                    6
Y    1           P    78          1                    1

Why is this useful? Well, look at the difference between rnX and rn:

prod sortcolumn  type value       rnX                  rn                   rn - rnX
---- ----------- ---- ----------- -------------------- -------------------- --------------------
X    1           P    12          1                    1                    0
X    2           P    23          2                    2                    0
X    3           E    34          1                    3                    2
X    4           P    45          3                    4                    1
X    5           E    56          2                    5                    3
X    6           E    67          3                    6                    3
Y    1           P    78          1                    1                    0

As you can see, each 'group' shares a rn - rnX value, and this changes from one group to the next.

So now if we partition by prod, type, and group number, then number within that:

SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY prod, type, rn - rnX ORDER BY sortcolumn) rowNr
FROM
    numbered
ORDER BY 
    prod, sortcolumn

we're done:

prod sortcolumn  type value       rnX                  rn                   rowNr
---- ----------- ---- ----------- -------------------- -------------------- --------------------
X    1           P    12          1                    1                    1
X    2           P    23          2                    2                    2
X    3           E    34          1                    3                    1
X    4           P    45          3                    4                    1
X    5           E    56          2                    5                    1
X    6           E    67          3                    6                    2
Y    1           P    78          1                    1                    1

Related reading: Things SQL needs: SERIES()

AakashM
  • 62,551
  • 17
  • 151
  • 186
-2

Try this

select prod, sortcolumn, type, value, row_number() over (partition by prod, sortcolumn, type order by value) rowNr    
from table_name
Gaj
  • 888
  • 5
  • 5