-2

I want to calculate the average grade of every record, and then get the two records with the highest average grade, per category.

My grade table looks like this:

userid | recordid | grade
123    | 1        | 8
123    | 2        | 1
123    | 3        | 3
123    | 4        | 6
121    | 1        | 3
121    | 2        | 7
121    | 3        | 1
121    | 4        | 8
124    | 1        | 6
124    | 2        | 8
124    | 3        | 9
124    | 4        | 5

And the record table like this:

id | userid | name | category | year
1  | 101    | Foo  | FooCat   | 2021
2  | 102    | Bar  | FooCat   | 2021
3  | 103    | Foos | BarCat   | 2021
4  | 104    | Bars | BarCat   | 2021

The result will look like this:

id | name | category | grade_avg
4  | Bars | BarCat   | 6.3
1  | Foo  | FooCat   | 5.7
2  | Bar  | FooCat   | 5.3
3  | Foos | BarCat   | 4.3

I'm currently using this SQL query:

SELECT S.id
     , S.name
     , S.category
     , AVG(IB.grade) AS grade_avg
  FROM 
     (SELECT id
           , name
           , category
           , year
           , row_number() over (partition by category) as r 
        FROM records) S
    JOIN grades IB 
        ON IB.recordid = S.id
 WHERE S.r < 3 
   AND S.year=2021
 GROUP 
    BY IB.recordid
 ORDER 
    BY grade_avg DESC

The grouping does work perfectly, but the grades at partitioning are not sorted by their average grade, so you just get the first 2 records per category.

What is the best query to get the result I want to achieve? This is an example, I will translate it to my own use, so that I learn to use the given SQL example better.

stijnb1234
  • 184
  • 4
  • 19
  • 1
    You want the average of the two highest grades? – Strawberry Apr 02 '21 at 13:41
  • @Strawberry Yes, I want to get the 2 records with the highest average grades, and then grouped by category. So that you get 2 records for every category. – stijnb1234 Apr 02 '21 at 13:44
  • 1
    It took kind of a long time just to figure out what you're asking. It would be helpful if your sample dataset were altered to match your actual requirements. In the example dataset add more than two records for each and then change the grades so that a given user is getting an average of different values. This way we know what you're asking and can test our answers before making them. – Altimus Prime Apr 02 '21 at 13:56
  • 1
    @AltimusPrime Will work on that. – stijnb1234 Apr 02 '21 at 13:58
  • Thank you for the updates. The record table's userid's don't match the grade table's userid's. Does it matter? – Altimus Prime Apr 02 '21 at 14:11
  • @AltimusPrime I did that on purpose. The grades are given by other users (with higher permissions) then the users who add the records. – stijnb1234 Apr 02 '21 at 14:13
  • So then it's not relevant to the aggregation? Also, I thought you only wanted the two highest grades to be averaged, but 6.3 is an average of 5,6, and 8. Is that how it really should be? – Altimus Prime Apr 02 '21 at 14:15
  • @AltimusPrime The average of every record should be calculated, then it should limit to 2 records per category. So if you have 3 categories, you will get 6 records (assuming every category has 2 records). – stijnb1234 Apr 02 '21 at 14:17
  • 1
    @stijnbannink . . . Why do you have duplicate categories in the results? – Gordon Linoff Apr 02 '21 at 14:20
  • @GordonLinoff I know, because of bad design. The category is hard-coded at the moment. – stijnb1234 Apr 02 '21 at 14:21
  • 'The category is hard-coded at the moment' I have to check out. I'm pretty good with complex queries and even procedures when needed, but how can we give an answer to match your description if the example isn't even what you're asking for? – Altimus Prime Apr 02 '21 at 14:38
  • @AltimusPrime It is? There are 5 categories (other names, but that doesn't matter for the example), I've picked two and put them in my description. – stijnb1234 Apr 02 '21 at 14:42

2 Answers2

2

If I understand correctly, you can use row_number() to enumerate the rows to get the two with the highest grade and then aggregate:

select name, category, avg(grade)
from (select g.*, r.name, r.category,
             row_number() over (partition by g.recordid order by g.grade desc) as seqnum
      from grades g join
           records r
           on g.recordid = r.id
      where r.year = 2021
     ) g
where seqnum <= 2
group by name, category;

Here is a db<>fiddle.

EDIT:

Based on your results, you want just the overall average, not the average of the top two rows. This is much simpler:

select r.name, r.category, avg(grade)
from grades g join
     records r
     on g.recordid = r.id
where r.year = 2021
group by r.name, r.category
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

I fixed this issue by taking the answer of Gordon Linoff as my base, and change some things in it.

This is working for me:

SELECT name, category, avg(grade) AS grade_avg
FROM (SELECT name, category, avg(grade) AS grade_avg,
         row_number() over (partition by category ORDER BY grade_avg DESC) as r 
      FROM grades IB JOIN
            records I 
            ON IB.recordid = I.id 
      WHERE year = 2021
      GROUP BY id
   ) S 
WHERE S.r < 3 
ORDER BY grade_avg DESC

So I'm partitioning on the category (ordered by the average grade), grouping by the ID, and then ordering by the average grade.

stijnb1234
  • 184
  • 4
  • 19