sql rank only continuous rows

Question

I have a query in which I am ranking the rows on the basis of 3 columns. I am successful in doing so, except that if any row contains same data in those 3 columns, it gives it the next rank even if it is not continuous in the output. I want that if any row matches the data in those columns, it should be given next rank only if it is in continuous rows, and if not then it should again give it rank as 1. I tried the following code:

  SELECT DISTINCT DENSE_RANK () OVER (PARTITION BY Patient_ID, 
                                                 Opnametype, 
                                                 afdelingscode ORDER BY  Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd) AS rnk, 
                *
  FROM t_opnames
  ORDER BY Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd

It is giving the output as:

rnk Opnamenummer Patient_ID afdelingscode     Opnametype   Specialismen  OntslagDatumTijd ...
1   2983800      100006     RD8-GH MAU        Inpatient-E  GM            2014-09-01 14:50:00.000
2   2983800      100006     RD8-GH MAU        Inpatient-E  GM            2014-09-02 19:32:00.000
1   2983800      100006     RD8-GH Ward 08    Inpatient-E  GM            2014-09-03 17:12:00.000  
1   2983800      100006     RD8-GH Endo       Inpatient-E  GM            2014-09-09 09:06:00.000
2   2983800      100006     RD8-GH Ward 08    Inpatient-E  GM            2014-09-17 17:00:00.000
3   2983800      100006     RD8-GH Ward 08    Inpatient-E  GM            2014-10-01 17:15:00.000

So, all the rows are correct except last 2 rows. I want the rank of them as 1 and 2 instead of 2 and 3, because the row with "RD8-GH Endo" is between them. So how can I do that?

Could you edit your questions and add the sample data for all the columns that are included in the `OVER()` clause? — gotqn, Apr 16 '15 at 09:03
Not an answer, but this could be helpful: http://stackoverflow.com/questions/13405264/t-sql-reset-row-number-on-field-change — Evaldas Buinauskas, Apr 16 '15 at 09:33
I am using SQL server 2008. Thanks for the reply, please help. — Hemant Sisodia, Apr 16 '15 at 09:48

ughai · Answer 1 · 2015-04-16T10:31:20.070

You can a co-related sub query to achieve this. Use something like this

DECLARE @t_opnames TABLE
(
    Opnamenummer INT,
    Patient_ID INT,
    afdelingscode     VARCHAR(100),
    Opnametype   VARCHAR(100),
    Specialismen  CHAR(2),
    OntslagDatumTijd DATETIME
)

Insert into @t_opnames
SELECT  2983800      ,100006,     'RD8-GH MAU',        'Inpatient-E',  'GM',            '2014-09-01 14:50:00.000'
UNION ALL SELECT 2983800      ,100006,     'RD8-GH MAU',        'Inpatient-E',  'GM',            '2014-09-02 19:32:00.000'
UNION ALL SELECT 2983800      ,100006,     'RD8-GH Ward 08',    'Inpatient-E',  'GM',            '2014-09-03 17:12:00.000'  
UNION ALL SELECT 2983800      ,100006,     'RD8-GH Endo',       'Inpatient-E',  'GM',            '2014-09-09 09:06:00.000'
UNION ALL SELECT 2983800      ,100006,     'RD8-GH Ward 08',    'Inpatient-E',  'GM',            '2014-09-17 17:00:00.000'
UNION ALL SELECT 2983800      ,100006,     'RD8-GH Ward 08',    'Inpatient-E',  'GM',            '2014-10-01 17:15:00.000'


;WITH CTE as 
(
SELECT DENSE_RANK() OVER(ORDER BY  Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd) rnk,* 
  FROM @t_opnames
)
SELECT rnk-ISNULL((
            SELECT MAX(rnk) 
            FROM CTE c2 
            WHERE c2.Opnamenummer <= c1.Opnamenummer
            AND c2.SPECIALISMEN <= c1.SPECIALISMEN
            AND c2.OntslagDatumTijd <= c1.OntslagDatumTijd
            AND c2.rnk < c1.rnk
            AND (c2.Patient_ID <> c1.Patient_ID 
                OR   c2.Opnametype <> c1.Opnametype 
                OR c2.afdelingscode <> c1.afdelingscode)),0) rnk,Patient_ID, Opnametype,afdelingscode,Opnamenummer, SPECIALISMEN, OntslagDatumTijd
FROM CTE c1
  ORDER BY Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd

Actually I forgot to add more columns in order by due to editing of the question. Now I have added all the column in order by, but by doing so, it is messing up all the ranks. Kindly help. — Hemant Sisodia, Apr 16 '15 at 09:49
Do you want to order the set based on `Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd` — ughai, Apr 16 '15 at 09:55
I dont know what is the problem, but if I run the exact query given by you, it runs perfectly, but in my case there are some more columns(thats why I used '*') and 75k+ Rows, it just hangs up and keep on running without any output. If i limit the rows to top 50 only, it is showing the ranks as -35,-34,....-42,9,7 etc. — Hemant Sisodia, Apr 16 '15 at 10:08
The co-related query is going to take some time if you plan to run it over 75k+ rows without any filter because of checks with `OR` and `<>` — ughai, Apr 16 '15 at 10:17
Please see this output of my query, the output is pretty messed up, so far from my desired result: https://i.imgur.com/a9yD7ii.png — Hemant Sisodia, Apr 16 '15 at 10:21
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/75380/discussion-between-ughai-and-hemant-sisodia). — ughai, Apr 16 '15 at 10:31

score 1 · Accepted Answer · answered Apr 16 '15 at 12:10

Finally I got the solution of my query, now I am getting my desired output and that too in 3 seconds running over 75k+ rows. The code I used is:

SELECT DISTINCT ROW_NUMBER () OVER (ORDER BY Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd) AS rownum, 
            * INTO #temp
FROM t_opnames
ORDER BY Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd;

WITH CTE
AS (SELECT *, 
           ROW_NUMBER () OVER (ORDER BY rownum) - ROW_NUMBER () OVER (PARTITION BY Patient_ID, 
                                                                                   Opnametype, 
                                                                                   afdelingscode ORDER BY rownum) AS RowGroup
      FROM #temp) 
SELECT ROW_NUMBER () OVER (PARTITION BY RowGroup, 
                                        Patient_ID, 
                                        Opnametype, 
                                        afdelingscode ORDER BY rownum) AS GroupSequence, 
       *
  FROM CTE
  ORDER BY rownum;

DROP TABLE #temp;

I referred an example posted at this page

Tanner · Answer 3 · 2015-04-16T12:45:04.590

This doesn't directly answer the question, but I'm aiming to explain why what you are trying isn't working as you expect.

Your issue is caused by the PARTITION. If you remove the non-unique columns from your PARTITION clause, you are left with afdelingscode. So in simple terms your PARTITION is grouping the data like so:

RD8-GH Endo
RD8-GH MAU
RD8-GH MAU
RD8-GH Ward 08
RD8-GH Ward 08
RD8-GH Ward 08

The ORDER BY clause determines the order within your PARTITION, so again removing the non-unique columns gives you ORDER BY OntslagDatumTijd, which produces this, where it's ordered by the date column, note the partitions are still separated by afdelingscode:

afdelingscode   OntslagDatumTijd
RD8-GH Endo     2014-09-09 09:06:00.000
RD8-GH MAU      2014-09-01 14:50:00.000
RD8-GH MAU      2014-09-02 19:32:00.000
RD8-GH Ward 08  2014-09-03 17:12:00.000
RD8-GH Ward 08  2014-09-17 17:00:00.000
RD8-GH Ward 08  2014-10-01 17:15:00.000

The ranking is then applied to these partitions. The output of this becomes:

rnk afdelingscode   OntslagDatumTijd
1   RD8-GH Endo     2014-09-09 09:06:00.000
1   RD8-GH MAU      2014-09-01 14:50:00.000
2   RD8-GH MAU      2014-09-02 19:32:00.000
1   RD8-GH Ward 08  2014-09-03 17:12:00.000
2   RD8-GH Ward 08  2014-09-17 17:00:00.000
3   RD8-GH Ward 08  2014-10-01 17:15:00.000

So it's being ranked according to how you have specified, the problem in your output is because at the end of your select (taking out the non-unique columns) is ordering by the date column OntslagDatumTijd, which gives you:

rnk afdelingscode   OntslagDatumTijd
1   RD8-GH MAU      2014-09-01 14:50:00.000
2   RD8-GH MAU      2014-09-02 19:32:00.000
1   RD8-GH Ward 08  2014-09-03 17:12:00.000
1   RD8-GH Endo     2014-09-09 09:06:00.000
2   RD8-GH Ward 08  2014-09-17 17:00:00.000
3   RD8-GH Ward 08  2014-10-01 17:15:00.000

I will continue to look at this if the other answer that is posted doesn't meet your requirements.

Reference:

OVER Clause

PARTITION BY Divides the query result set into partitions. The window function is applied to each partition separately and computation restarts for each partition.

ORDER BY clause Defines the logical order of the rows within each partition of the result set. That is, it specifies the logical order in which the window functioncalculation is performed.

Here's a potential solution that may have performance issues with the size of the data you are using, but you can test it:

-- sets up your dummy data
CREATE TABLE #t_opnames
    (
      Opnamenummer INT ,
      Patient_ID INT ,
      afdelingscode NVARCHAR(20) ,
      Opnametype NVARCHAR(20) ,
      Specialismen NVARCHAR(20) ,
      OntslagDatumTijd DATETIME
    );

INSERT  INTO #t_opnames
        ( Opnamenummer, Patient_ID, afdelingscode, Opnametype, Specialismen,
          OntslagDatumTijd )
VALUES  ( 2983800, 100006, 'RD8-GH MAU', 'Inpatient-E', 'GM',
          '2014-09-01 14:50:00.000' ),
        ( 2983800, 100006, 'RD8-GH MAU', 'Inpatient-E', 'GM',
          '2014-09-02 19:32:00.000' ),
        ( 2983800, 100006, 'RD8-GH Ward 08', 'Inpatient-E', 'GM',
          '2014-09-03 17:12:00.000' ),
        ( 2983800, 100006, 'RD8-GH Endo', 'Inpatient-E', 'GM',
          '2014-09-09 09:06:00.000' ),
        ( 2983800, 100006, 'RD8-GH Ward 08', 'Inpatient-E', 'GM',
          '2014-09-17 17:00:00.000' ),
        ( 2983800, 100006, 'RD8-GH Ward 08', 'Inpatient-E', 'GM',
          '2014-10-01 17:15:00.000' )

-- I've added a row number to your data to enable iteration over the data
SELECT  ROW_NUMBER() OVER ( ORDER BY OntslagDatumTijd ) AS rn ,
        *
INTO #temp
FROM    #t_opnames
ORDER BY OntslagDatumTijd
-- this will iterate over the rows and apply the rankings
;WITH cte AS (
    SELECT *, 1 AS rnk 
    FROM #temp 
    WHERE rn = 1

    UNION ALL 

    SELECT t.*, CASE WHEN cte.afdelingscode = t.afdelingscode 
                     THEN cte.rnk + 1 
                     ELSE 1 
                END AS rnk 
    FROM #temp t
    INNER JOIN cte ON cte.rn +1 = t.rn
)
SELECT * FROM cte

DROP TABLE #t_opnames
DROP TABLE #temp

You will hit the MAXRECURSION limit for a larger dataset, for which you need to modify the limit using the following after the final SELECT:

SELECT * FROM cte
OPTION (MAXRECURSION 0)

Setting this value to 0 will not impose any limit, you can set this number to the size of your dataset if you know it up front.

Thanks for the reply. Actually data in none of those 3 columns is unique. In these rows, Patient_Id is same, but in following rows, this would change and so would Opnametype. And the previous answer posted is giving the same output I desired, but is so slow that I need to cancel the query after 4-5 minutes. — Hemant Sisodia, Apr 16 '15 at 10:51
@HemantSisodia I'm aware that your post only contains sample data, I'm just trying to explain why your sample doesn't behave as you expected. — Tanner, Apr 16 '15 at 10:56
@HemantSisodia does the other answer not provide you with a solution? — Tanner, Apr 16 '15 at 11:01
yes it is solving my case but only if I limit the rows to top 500 or so, running this query over 75k+ records is not doing anything and I need to cancel running the query after 4-5 minutes. Hence it is so slow that it is not usable in my case, as I cannot use where clause, because I need to work on all rows present in the table. — Hemant Sisodia, Apr 16 '15 at 11:04
@HemantSisodia have posted a possible solution for you to test. — Tanner, Apr 16 '15 at 11:21
I tried your code by adding all the columns I wanted in Order By Clause and when I ran it over my entire table with 75k+ records, I got this error `The statement terminated. The maximum recursion 100 has been exhausted before statement completion.` But the output was showing first 101 rows which was showing correct result. What should I do in this case? — Hemant Sisodia, Apr 16 '15 at 11:34
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/75389/discussion-between-hemant-sisodia-and-tanner). — Hemant Sisodia, Apr 16 '15 at 11:36
@HemantSisodia have updated my answer, setting the `maxrecursion` will get around that issue. — Tanner, Apr 16 '15 at 12:46
I tried adding `OPTION (MAXRECURSION 0)`, and it is working too, but it is also too slow, my records are 75k+ and after 50 seconds running this query, it could show only 5000 records and I had to stop running the query after that. But thanks anyways, I have found the solution of my problem and have already posted that. — Hemant Sisodia, Apr 16 '15 at 13:02

sql rank only continuous rows

3 Answers3