Assume I have a table like this
id | cnt | tier |
---|---|---|
1 | 100 | gold |
2 | 200 | silver |
3 | 300 | bronze |
4 | 400 | bronze |
5 | 500 | bronze |
6 | 600 | gold |
7 | 700 | silver |
8 | 800 | silver |
9 | 900 | silver |
10 | 1000 | silver |
and if I want percentile rank over cnt
within each tier (e.g.
For tier 'gold', I have 2 rows, cnt = 100 and cnt = 600, so they're ranked at 0.0 percentile and 1.0 percentile respectively.)
I can easily do PERCENT_RANK() OVER(PARTITION BY tier ORDER BY cnt) AS percentile1
, thus
id | cnt | tier | percentile1 |
---|---|---|---|
1 | 100 | gold | 0.0 |
2 | 200 | silver | 0.0 |
3 | 300 | bronze | 0.0 |
4 | 400 | bronze | 0.5 |
5 | 500 | bronze | 1.0 |
6 | 600 | gold | 1.0 |
7 | 700 | silver | 0.25 |
8 | 800 | silver | 0.5 |
9 | 900 | silver | 0.75 |
10 | 1000 | silver | 1.0 |
Now what if I want a percentile rank over cnt
but among all tiers' averages? To better illustrate, we have gold
's avg = 350, silver
's avg = 720, and bronze
's avg = 400, so to get cnt
's percentile rank among [350, 400, 720] for each row. i.e.
id | cnt | tier | percentile2 |
---|---|---|---|
1 | 100 | gold | 0.0 |
2 | 200 | silver | 0.0 |
3 | 300 | bronze | 0.0 |
4 | 400 | bronze | 0.333 |
5 | 500 | bronze | 0.666 |
6 | 600 | gold | 0.666 |
7 | 700 | silver | 0.666 |
8 | 800 | silver | 1.0 |
9 | 900 | silver | 1.0 |
10 | 1000 | silver | 1.0 |
How should I write the (Hive) SQL? Thanks