I have a large table (millions of records) that looks something like this. I need to get the top 30% (or 70th percentile) on daily basis. So the results should look something like the section table. The table already has the Bread_Date
sorted in ascending order and the consumption_per_eater
values are already sorted in descending order.
I understand this is similar to the top-N problem where I may need to write this as part of the select
statement:
if(@preBread_Date is null or @preBread_Date != Bread_Date, @row:=1, @row:=@row+1) as 'row', @preBread_Date := Bread_Date
But I am stuck. I don't know what to write because on a daily basis, there are also different number of records. So in the example below, 2012-07-10 has only 3 records, while 2012-07-11 has 6 different records and 2012-07 has 4 records. One of the things I would like to avoid is any join.
Would really appreciate if anyone can show me the query I need to write next. Thanks in advance.
Bread_Date | Consumption_Per_Eater
......................................................................
2012-07-10 | 300.1
2012-07-10 | 290.9
2012-07-10 | 230.5
2012-07-11 | 230.3
2012-07-11 | 340.1
2012-07-11 | 330.9
2012-07-11 | 230.5
2012-07-11 | 210.3
2012-07-11 | 209.4
2012-07-12 | 400.9
2012-07-12 | 300.5
2012-07-12 | 210.3
2012-07-12 | 200.4
Bread_Date | Top_30%_of_that_day
..............................................................................
2012-07-10 | XXXXXXX
2012-07-11 | XXXXXXX
2012-07-12 | XXXXXXX