SQL count distinct with a condition based on a different column

Question

I have a data set containing a list of exams, the qualifications/units they're associated with, and whether the exam was passed or failed. The data looks something like this:

candidate | qualification | unit | exam | exam_status
-----------------------------------------------------
 C1       | Q1            | U1   | E1   | Passed
 C1       | Q1            | U2   | E2   | NULL
 C1       | Q1            | U2   | E3   | Passed
 C1       | Q1            | U3   | E4   | Passed
 C1       | Q1            | U3   | E5   | Passed

From this I need to be able to calculate both the total number of units that exist for each qualification, as well as how many of those units have been passed by the candidate.

In theory, each unit should have one exam (although if the candidate fails the exam the first time there may be multiple records) so I should be able to get the data I need using the following query:

select
  candidate,
  qualification,
  count(distinct unit),
  count(
    case when exam_status = 'Passed' then 1 else null end
  )
from example_table
group by candidate, qualification

However, for whatever reason, some candidates have passed the same exam multiple times, meaning that my count of units passed is sometimes more than the total number of units.

I would like to do something like:

count(distinct exam case when exam_status = 'Passed' then 1 else null end)

to only select unique exams that have been passed, but that fails.

Does anyone know how i can achieve this? Thanks in advance.

`count(distinct case when exam_status = 'Passed' then 1 end)`. Try this. — DineshDB, Nov 20 '19 at 11:01
@jean unfortunately in this case I can't put it in the where clause, because this filters out units that haven't been attempted, which I need to get the count of total units — Sam Walpole, Nov 20 '19 at 11:12

Gordon Linoff · Accepted Answer · 2019-11-20T11:10:31.510

You need a distinct count of exams, so I think that is:

select candidate, qualification, 
       count(distinct units) as total_units,
       count(distinct case when exam_status = 'Passed' then exam end)
from example_table
group by candidate, qualification;

If you want to sum the units of the passed exams, this becomes trickier. I would recommend window functions:

select candidate, qualification, count(distinct unit),
       sum(case when exam_status = 'Passed' and seqnum = 1 then unit end) as total_units,
       count(distinct case when exam_status = 'Passed' then exam end)
from (select et.*,
             row_number() over (partition by candidate, qualification, exam 
                                order by (case when exam_status = 'Passed' then 1 else 2 end)
                               ) as seqnum
      from example_table et
     ) et
where seqnum = 1
group by candidate, qualification;

The first part of your answer is all I needed, thank you! – Sam Walpole Nov 20 '19 at 11:14 — Sam Walpole, Nov 20 '19 at 11:14

SQL count distinct with a condition based on a different column

1 Answers1