4

The problem seems simple but I can't get my head around it, this is for sql server

 what I have in a table :            What I need as a output .
 cksum          id                      cksum       id
-2162514679     204                    -2162514679  204    1
-2162514679     207                    -2162514679  207    1
-2162514679     215                    -2162514679  215    1
-2162514679     218                    -2162514679  218    1
-2162514679     221                    -2162514679  221    1
-2160286363     257                    -2160286363  257    2
-2160286363     260                    -2160286363  260    2
-2160286363     332                    -2160286363  332    2
-2162514679     335                    -2162514679  335    3
-2162514679     338                    -2162514679  338    3
-2126731931     348                    -2126731931  348    4
-2126731931     387                    -2126731931  387    4

The table is ordered by id, I need a rank that follows the id column but that groups the cksum, notice that the cksum can return to a previous value yet still have its rank because of the ID (this is the case for the value 2162514679, it apears 5 times at the begining and a second time more bellow and they constitude two different ranks). I've been at it for a couple of hours and it seems really stupid like using a row_number with a partition or using a CTE but nope.... can't find the peice of logic to do this... anyone has the answer?

ekad
  • 14,436
  • 26
  • 44
  • 46
Flying Turtle
  • 366
  • 7
  • 20

2 Answers2

3

This is a bit tricky. You can get the grouping for the ids with a trick -- a difference of row numbers. Then you need to get the minimum id for each group, to ensure that the final rank is in the right order. Then you can use then you can use dense rank:

select cksum, id, dense_rank() over (order by minid)
from (select t.*, min(id) over (partition by cksum, grp) as minid
      from (select t.*,
                   (row_number() over (order by id) -
                    row_number() over (partition by cksum order by id)
                   ) as grp
            from table t
           ) t
     ) t;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • 1
    I think, your second call of `row_number()` is missing an `order by` – DrCopyPaste Jan 21 '15 at 13:30
  • This solution is quite promissing but the problem I'm having is that this line ` row_number() over (partition by cksum Order by id)` doesn't reset the count it picks it up where the last value left off, so this somtimes gives me switches in order thus not following the id order column. If I would be able to reset it to zero for each partition then it would work. – Flying Turtle Jan 21 '15 at 13:59
  • @FlyingTurtle . . . I don't understand your comment. (And note: I just fixed the query with the right `order by`.) The switches are immaterial, because the differences -- along with cksum -- identify each group uniquely. Then `min(id)` is used for the final rank. – Gordon Linoff Jan 21 '15 at 20:49
  • When you disect the calculations for the grp column, this is what i get : `cksum 1RN 2RN grp -1 1 1 0 -1 2 2 0 -1 3 3 0 -2 4 1 3 -2 5 2 3 -1 6 4 2 -1 7 5 2 -1 8 6 2 ` as you can see from this result, the values of second RN for the first group picks up where the first one stoped (at 4), this gives erronious grp (risk of same grp for two dif cksum and erronious ranking) – Flying Turtle Jan 22 '15 at 08:01
  • @FlyingTurtle . . . The `partition by` in the middle subquery uses `cksum`. Equivalent values of `grp` are irrelevant for different `cksum` values are irrelevant. The solution does assume that `id`s are unique. – Gordon Linoff Jan 22 '15 at 12:38
  • Sorry, I didn't get your method initially, it is true that same grp or a group ranking isn't important applying your solution. I gave you the vote since you were first and your solution seems a bit more optimal. – Flying Turtle Jan 22 '15 at 13:19
1

This is a different approach, which involves simulating LAG window function not available in SQL Server 2008 R2:

;WITH CTE_RN AS (
   SELECT cksum, id, ROW_NUMBER() OVER(ORDER BY id) AS rn
   FROM Checksums
), CTE_LAG AS (
   SELECT c1.cksum, c1.id, c1.rn,
          (CASE WHEN c2.cksum IS NULL OR c1.cksum = c2.cksum THEN 0
                ELSE 1
           END) AS flag 
   FROM CTE_RN AS c1
   LEFT JOIN CTE_RN AS c2 ON c1.rn = c2.rn+1
)
SELECT cksum, id,  (SELECT SUM(flag)
                    FROM CTE_LAG AS t2        
                    WHERE t2.rn <= t1.rn) + 1 AS [rank]     
FROM CTE_LAG AS t1

CTE_LAG returns the following result set (based on the sample data of the OP):

cksum       id  rn  flag
-------------------------
-2162514679 204 1   0
-2162514679 207 2   0
-2162514679 215 3   0
-2162514679 218 4   0
-2162514679 221 5   0
-2160286363 257 6   1
-2160286363 260 7   0
-2160286363 332 8   0
-2162514679 335 9   1
-2162514679 338 10  0
-2126731931 348 11  1
-2126731931 387 12  0

Field flag is equal to 1 if current cksum is not equal to the previous cksum, otherwise flag is equal to 0.

Field rank is simply the running total of flag.

Giorgos Betsos
  • 71,379
  • 9
  • 63
  • 98