10

I've seen other posts about using the median() window function in Redshift, but how would you use it with a query that has a group by at the end?

For example, assume table course:

Course | Subject | Num_Students
-------------------------------
   1   |  Math   |      4
   2   |  Math   |      6
   3   |  Math   |      10
   4   | Science |      2
   5   | Science |      10
   6   | Science |      12

I want to get the median number of students for each course subject. How would I write a query that gives the following result:

  Subject  | Median
-----------------------
 Math      |     6
 Science   |     10

I've tried:

SELECT
subject, median(num_students) over ()
FROM
course
GROUP BY 1
;

But it lists every occurrence of the subject and the same median number across subjects like (this is fake data so the actual value it returns is not 6, but just showing it's the same across all subjects):

  Subject  | Median
-----------------------
 Math      |     6
 Math      |     6
 Math      |     6
 Science   |     6
 Science   |     6
 Science   |     6
clavid
  • 289
  • 5
  • 13

4 Answers4

8

The following will get you exactly the result you are looking for:

SELECT distinct
subject, median(num_students) over(partition by Subject) 
FROM
course
order by Subject;
isaac.hazan
  • 3,772
  • 6
  • 46
  • 76
2

You simply need to remove the "over()" portion of it.

SELECT subject, median(num_students) FROM course GROUP BY 1;
PieCharmed
  • 126
  • 1
  • 6
  • Both this answer and @isaac.hazan's answer are correct. This answer uses the simple median() function whereas isaac.hazan's uses the median window function (which can be useful in more complex scenarios) – clavid Oct 06 '17 at 23:45
1

You haven't defined a partition in the window. Instead of OVER() you need OVER(PARTITION BY subject).

Jakub Kania
  • 15,665
  • 2
  • 37
  • 47
  • Thanks for the tip Jakub. When I do that, it says I need to group by num_students. When I do that, it DOES give me a different median for each subject, but the same subject appears multiple times instead of just once... – clavid Feb 13 '15 at 21:06
  • @clavid Either drop `GROUP BY` and use `DISTINCT` or use a subquery to calculate the median and group in the outer query. – Jakub Kania Feb 13 '15 at 21:22
1

Let's say you want to calculate other aggregations, by subject, like avg(), you need to use sub-query:

WITH subject_numstudents_medianstudents AS (
    SELECT
        subject
        , num_students
        , median(num_students) over (partition BY subject) AS median_students
    FROM
        course
)
SELECT
    subject
    , median_students
    , avg(num_students) as avg_students
FROM subject_numstudents_medianstudents
GROUP BY 1, 2
volodymyr
  • 7,256
  • 3
  • 42
  • 45