1

A query that is capable of dividing each value of a column by the total number of records in the table

I tried the following query

select ( (p.rank/count(*)) * 100 ) as rankratio from RankTable p;

I see an error and not able to execute the query. for example
total records is 5 so (1/5)*100 = 20

RankTable         
rank            rankratio        
1               20
2               40
3               60
4               80
5               100
leftjoin
  • 36,950
  • 8
  • 57
  • 116
sridhar
  • 329
  • 4
  • 16

1 Answers1

1

use analytic count(*) over():

select ( (s.rank/s.total_count) * 100 ) as rankratio 
from
(
select rank, count(*) over() as total_count
  from RankTable p
)s
order by s.rank;
leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • This works. Thanks! I tried a different approach by replacing count(*) by sub query but hive doesnt support all types of correlated subquery and I found out the way around that problem. – sridhar May 01 '19 at 18:37
  • @sridhar You are welcome! It's possible to do a join with subquery. Analytic function works faster than sub-query because the table is being read only once and no join required. – leftjoin May 01 '19 at 18:57
  • that is true, joins are really expensive dealing with tera bytes of data. – sridhar May 01 '19 at 20:02
  • @sridhar To be precise in this very case when total count is needed (no other columns in group by), no join keys is used(cross) and subquery dataset is small, only one row. Cross join will be transformed to a map-join and executed very fast. This solution still have extra table scan + small table distribution for map-join. And simple count(*) can be done using statistics only, no scan! But if you have some group by columns and the small dataset is big, the solution w analytic function and partition by columns performs definitely much better than join. – leftjoin May 01 '19 at 20:18