1

I have a table called: consultant_project_score which has the following fields id, project_id, final_score. I want to generate a query which displays top 100 final_score for each project_id. I am using MySQL workbench 6.3 and I am not able to use any create/insert statements. Is there any way to achieve this?

Marc Zaharescu
  • 629
  • 1
  • 13
  • 34
  • what's error you are getting?? what you have tried ?? – Ankit Agrawal Apr 26 '16 at 09:38
  • Possible duplicate of [Using LIMIT within GROUP BY to get N results per group?](http://stackoverflow.com/questions/2129693/using-limit-within-group-by-to-get-n-results-per-group) – JimmyB Apr 26 '16 at 09:51

3 Answers3

1

Not tested,but it should work.The trick is to generate a row number for each project

select id, project_id, final_score
from 
(select id, project_id, final_score,
      @rn:= if(@project_id= project_id, @rn+ 1, 1) as rn,
      @project_id:= project_id as pi
  from t,(SELECT @rn:=0) b
  order by project_id, final_score desc) x 
where x.rn<= 100;
Mihai
  • 26,325
  • 7
  • 66
  • 81
  • I tried your query, I assume that 't' stands for consultant_project_score. Anyway it generated around 9milion records while I only have around 2000 projects in my DB, so it should have been around 200.000 records – Marc Zaharescu Apr 26 '16 at 09:53
1

This will work for you. The inner query will generate rank group by project_id and then outer query will filter the top 100 results.

Note: It will not work if you have similar combination of project_id and score. Created this fiddle to get 2 ranks.

http://sqlfiddle.com/#!9/9e94d5/1

SELECT id1,
       project_id,
       final_score
FROM
  (SELECT max(t1.id) AS id1,
          t1.project_id,
          t1.final_score,
          count(*) AS rnk
   FROM consultant_project_score t1
   INNER JOIN consultant_project_score t2 ON t1.project_id=t2.project_id
   WHERE t1.final_score<=t2.final_score
   GROUP BY t1.project_id,
            t1.final_score) t
WHERE rnk<=100
Utsav
  • 7,914
  • 2
  • 17
  • 38
  • I'm trying out your query. It's been running for about 10 min now. – Marc Zaharescu Apr 26 '16 at 10:03
  • The query will be slow as we are doing self join with group by. In MYSql, there are no `rank over partition by` else the query would be faster. You can try removing `max(t1.id)` from select clause. Also is possible, index `project_id` and `final_score`. – Utsav Apr 26 '16 at 12:20
0

This one is without Group BY and uses the concept of self join . This should give you correct output

SQL

   SELECT cps2.id, cps2.project_id,  cps2.final_score FROM 
                    (  SELECT DISTINCT project_id FROM  consultant_project_score ) cps1 
                     JOIN  consultant_project_score cps2
                                            ON cps2.id  IN ( SELECT cps3.id FROM consultant_project_score cps3 
                            WHERE cps3.project_id = cps1.projectid  
                             ORDER BY cps3.final_score  DESC LIMIT 100) 
                    ORDER BY cps2.project_id ,cps2.final_score DESC     
Shirishkumar Bari
  • 2,692
  • 1
  • 28
  • 36