2

I am trying to get the percentage of rows that a set of particular value has. Best explained by example. I can do this by each column very simply using ratio-to-report function and over(), but am having issues with multiple groupings

Assume table has 2 columns:

column a         column b
1000             some data
1100             some data
2000             some data
1400             some data
1500             some data

With the following query, I can get for this domain set, each one is 20% of the total rows

select columna, count(*), trunc(ratio_to_report(count(columna)) over() * 100, 2) as perc
from table
group by columna
order by perc desc;

However, what I need is for example to determine the percentage & count of the rows that contain 1000, 1400 or 2000; From looking at it, you can tell its 60%, but need a query to return that. This needs to be efficient, as the query will be running against millions of rows. Like I said before, I have this working on a single value and its percentage, but the multiple is what is throwing me.

Seems like I need to be able to put an IN clause somewhere, but the values will not be these specific values each time. I will need to get the values for the "IN" part of it from another table, if that makes sense. guess I need some kind of multiple grouping.

tshepang
  • 12,111
  • 21
  • 91
  • 136
Marc
  • 51
  • 1
  • 3

2 Answers2

1

Potentially, you're looking for something like

SQL> ed
Wrote file afiedt.buf

  1  with x as (
  2    select 1000 a from dual
  3    union all
  4    select 1100 from dual
  5    union all
  6    select 1400 from dual
  7    union all
  8    select 1500 from dual
  9    union all
 10    select 2000 from dual
 11  )
 12  select (case when a in (1000,1400,1500)
 13               then 1
 14               else 0
 15           end) bucket,
 16         count(*),
 17         ratio_to_report(count(*)) over ()
 18    from x
 19   group by (case when a in (1000,1400,1500)
 20               then 1
 21               else 0
 22*          end)
SQL> /

    BUCKET   COUNT(*) RATIO_TO_REPORT(COUNT(*))OVER()
---------- ---------- -------------------------------
         1          3                              .6
         0          2                              .4
Justin Cave
  • 227,342
  • 24
  • 367
  • 384
  • this is close to what I am looking for except I cant use the case statements since in reality I have many combinations. I didnt explain it initially very well. I need to add a 2nd table that controls what you basically have in your case statement. – Marc Mar 12 '11 at 04:17
  • this is close to what I am looking for except I cant use the case statements since in reality I have many combinations. I didnt explain it initially very well. I need to use a 2nd table to control basically where you have your case statement Table2 Column1 Column2 1000 - 1100 2000 1400 1000 2000 - 1500 1000 Results should be: 1000 3 60% 2000 2 40% So, "bucket" would be column1 where column2 is null and count/percentage would be made up of rows where column1 is null or column2 points to column1 as a parent hope that makes more sense – Marc Mar 12 '11 at 04:26
  • how do you get the formatting correct ? I really goofed that up. In my previous example the numbers are really 2 columns with the - being a null in column 2 for that row, 1000 null, 1100 2000, 1400 1000, 2000 null, 1500 1000 as rows in a 2 column table. Sorry, but I am not familiar with how to format correctly. – Marc Mar 12 '11 at 04:29
0

I'm not sure I entirely understand the requirement, but do you need ratio_to_report at all? Have a look at the following, and let me know how close this is to what you want, and we can work from there!

T1 is the table containing your sample data

create table t1(a primary key) as
   select 1000 as a from dual union all
   select 1100 as a from dual union all
   select 1400 as a from dual union all
   select 1500 as a from dual union all
   select 2000 as a from dual;

T2 is the lookup table you mentioned (where you get the list of IDs)

create table t2(a primary key) as
   select 1000 as a from dual union all
   select 1400 as a from dual union all
   select 2000 as a from dual;

A left join from T1->T2 will return all rows in T1 paired with all matching rows in T2. For each A in T1 that does not exist in your set (T2), the result will be padded with NULL. We can exploit the fact that COUNT() doesn't count (hehe) nulls.

select count(t1.a) as num_rows
      ,count(t2.a) as in_set
      ,count(t2.a) / count(t1.a) as shr_in_set
  from t1
  left 
  join t2 on(t1.a = t2.a);

The result of running the query is:

  NUM_ROWS     IN_SET SHR_IN_SET
---------- ---------- ----------
         5          3         ,6
Ronnis
  • 12,593
  • 2
  • 32
  • 52