rank() a group of items by count(*)

Question

I have some problems with Oracle analytic functions and need help. Here's a generic example:

create table test (item varchar2(10), value varchar2(10));

insert into test values ('item1','value1');
insert into test values ('item1','value1');
insert into test values ('item1','value1');
insert into test values ('item1','value1');
insert into test values ('item1','value1');
insert into test values ('item1','value2');
insert into test values ('item1','value2');
insert into test values ('item3','value2');
insert into test values ('item3','value2');
insert into test values ('item3','value2');
insert into test values ('item5','value1');
insert into test values ('item5','value1');
insert into test values ('item5','value1');
insert into test values ('item5','value1');
insert into test values ('item5','value1');
insert into test values ('item5','value1');
insert into test values ('item5','value1');
insert into test values ('item5','value2');
insert into test values ('item5','value2');
insert into test values ('item5','value2');

select item, value, count(*) c, 
       sum(count(*)) over () total, 
       sum(count(*)) over (partition by item) total_by_item,
       dense_rank() over (order by count(*) desc) dense_rank
  from test 
 group by item, value 
 order by 5 desc;

The result of the query is:

ITEM       VALUE       C      TOTAL  TOTAL_BY_ITEM DENSE_RANK
---------- ---------- -- ---------- -------------- ----------
item5      value1      7         20             10          1
item5      value2      3         20             10          3
item1      value2      2         20              7          4
item1      value1      5         20              7          2
item3      value2      3         20              3          3

How can I get the items ranked by TOTAL_BY_ITEM? So it would look like this:

ITEM       VALUE       C      TOTAL  TOTAL_BY_ITEM WHAT_I_NEED
---------- ---------- -- ---------- -------------- -----------
item5      value1      7         20             10           1
item5      value2      3         20             10           1
item1      value2      2         20              7           2
item1      value1      5         20              7           2
item3      value2      3         20              3           3

Is it possible to achieve this without another join or sub-query? I have a feeling that it is possible. I naturally think that it has to be something like this: dense_rank(count(*)) over (partition by item), like with analytic SUM that I use to get the 5th column, but it doesn't work.

Husqvik · Accepted Answer · 2015-12-06T21:25:00.097

2

I don't think this is what you are searching for but just for reference without a subquery you can achieve the same result using MODEL clause:

select item, value, c, total, total_by_item, what_i_need
from test 
group by item, value
model
  dimension by (row_number() over (order by null) d)
  measures (
    item, value,
    count(*) c,
    sum(count(*)) over () total,
    sum(count(*)) over (partition by item) total_by_item,
    1 what_i_need
  )
  rules (
    what_i_need[any] = dense_rank() over (order by total_by_item[cv()] desc)
  )
order by 5 desc;

I don't think you can achieve it without subquery otherwise.

edited Dec 06 '15 at 21:25

answered Dec 06 '15 at 20:09

Husqvik

5,669
1
19
29

Hello, thank you for you answer. It is not exactly what I was looking for, but pretty close. I'll try this tomorrow on the real data. Do you really think that what I need can not be achieved without a subquery? I have never used the MODEL clause. Are there any specifics about its performance? – stee1rat Dec 06 '15 at 21:26
1

MODEL is meant for complex transformations and reporting rules, for example when a dependency between row values is required. In your case subquery and two window funtions will be simpler. Since you don't have any natural single dimension I need third analytic function to generate surragate dimension d. Now i think it's better to use `dimension by (item || value d)` instead of using `ROW_NUMBER`. What is wrong with a subquery? – Husqvik Dec 06 '15 at 21:34
You know, now I'm starting to think that the subquery wouldn't be a problem. I will test both variants for speed tomorrow. I believe the MODEL and subquery speed should be pretty close? Anyway, thank you very much for the idea! – stee1rat Dec 06 '15 at 22:13
I would definitely choose the subquery, it's much cleaner code and I assume faster in this case. – Husqvik Dec 06 '15 at 22:18
I think I will. The problem here is that I had a subquery where I read the same table (test in my example) twice and the overall performance was awful in some cases. Now I don't have to read the same table second time, just the results. I somehow missed this part, hence the question. – stee1rat Dec 06 '15 at 22:28
no you won't `SELECT dense_rank() over (order by total_by_item desc) ... FROM ()` will scan the source table just once. And if the number of groups is small then the outer analytic function will be fairly cheap. – Husqvik Dec 06 '15 at 22:31

rank() a group of items by count(*)

1 Answers1