5

I have the following data set.

create table t1 (
  dept number,
  date1 date
);

Table created.

insert into t1 values (100, '01-jan-2013');
insert into t1 values (100, '02-jan-2013');
insert into t1 values (200, '03-jan-2013');
insert into t1 values (100, '04-jan-2013');
commit;

MY goal is to create a rank column that resets each time the department changes. The closest column that I can use for "partition by" clause is dept, but that won't give me the desired result.

SQL> select * from t1;

      DEPT DATE1
---------- ---------
       100 01-JAN-13
       100 02-JAN-13
       200 03-JAN-13
       100 04-JAN-13

select dept,  
       date1,
       rank () Over (partition by dept order by date1) rnk
from t1
order by date1;

      DEPT DATE1            RNK
---------- --------- ----------
       100 01-JAN-13          1
       100 02-JAN-13          2
       200 03-JAN-13          1
       100 04-JAN-13          3

The desired output is as follows. The last rnk=1 is becuase the Jan-04 record is the first record after the change.

      DEPT DATE1            RNK
---------- --------- ----------
       100 01-JAN-13          1
       100 02-JAN-13          2
       200 03-JAN-13          1
       100 04-JAN-13          1  <<<----------

Any pointers?

Rajesh Chamarthi
  • 18,568
  • 4
  • 40
  • 67

3 Answers3

4

This is a little complicated. Instead of using rank() or the like, use lag() to see when something changes. Then do a cumulative sum of the flag.

select dept, date1,
       CASE WHEN StartFlag = 0 THEN 1
            ELSE 1+StartFlag+NVL(lag(StartFlag) over (order by date1),0)
       END as rnk
from (select t1.*,
             (case when dept = lag(dept) over (order by date1)
                   then 1
                   else 0
              end) as StartFlag
      from t1
     ) t1
order by date1;

Here is the SQLFiddle.

EDIT:

This is Gordon editing my own answer. Oops. The original query was 90% of the way there. It identified the groups where the numbers should increase, but did not assign the numbers within the groups. I would do this with another level of row_number() as in:

select dept, date1,
       row_number() over (partition by dept, grp order by date1) as rnk
from (select dept, date1, startflag,
             sum(StartFlag) over (partition by dept order by date1) as grp
      from (select t1.*,
                   (case when dept = lag(dept) over (order by date1)
                         then 0
                         else 1
                    end) as StartFlag
            from t1
           ) t1
     ) t1
order by date1;

So, the overall idea is the following. First use lag() to determine where a group begins (that is, where there is a department change from one date to the next). Then, assign a "group id" to these, by doing a cumulative sum. These are the records that are to be enumerated. The final step is to enumerate them using row_number().

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • To make the lag work, I removed the partition by dept in the inner query and I am seeing 0s and 1s as expected, but the sum(Flag) does not give the end result. I created a sql fiddle here, if that helps. http://sqlfiddle.com/#!4/0a9fe/7 – Rajesh Chamarthi Jul 24 '13 at 03:20
  • If you add an order by date to the result, you'll see that the rank is not being calculated as desired. – Rajesh Chamarthi Jul 24 '13 at 03:24
  • On the right track, I think this is what is needed: http://www.sqlfiddle.com/#!4/fc339/14 – Jeffrey Kemp Jul 24 '13 at 04:37
  • Jeff - We need the final partition by, that Gordon posted as well. The Fiddle you posted will get the groups, but will fail when we have more data. Check this. http://www.sqlfiddle.com/#!4/93e5b/1. The rank for the final change is not quite right. – Rajesh Chamarthi Jul 24 '13 at 17:38
2

This could have been a case for model clause, but unfortunately it dramatically underperforms on significant amount of rows compared to Gordon's query.

select dept, date1, rank from t1
model 
  dimension by ( row_number() over(order by date1) as rn )
  measures( 1 as rank, dept, date1 ) 
  rules ( 
    rank[1] = 1,
    rank[rn > 1] = 
    case dept[cv()] 
      when dept[cv()-1] then rank[cv()-1] + 1 
      else 1
     end
  )

http://www.sqlfiddle.com/#!4/fc339/132

Kirill Leontev
  • 10,641
  • 7
  • 43
  • 49
0

The approach is:

  1. Mark each row with a 'row number' and a 'changed' flag
  2. Calculate the final 'rnk' as the difference between the 'row number', and the maximum previous 'row number' that corresponds to a 'changed' row.

This is similar to Gordon's answer, but written using a CTE which I find easier to read.

with cte as (
    select dept, date1,
        row_number() over (order by date1) as row,
        case when dept = (lag(dept) over (order by date1)) then 0 else 1 end as changed
    from t1
)
select dept, date1, 
    row - max(case when changed = 1 then row else 1 end) over (order by date1) + 1 as rnk
from cte
order by date1
John Rees
  • 1,553
  • 17
  • 24