How to SELECT lines where COUNT DISTINCT is MAX?

Question

this may be an easy question but basically I would like to get to SELECT only the lines of highest number of occurence for an experiment in SQL SERVER. I have a query that produces the following data:

SELECT [JOB ROLE], [CITY], COUNT(DISTINCT([EMPLOYEE_ID])) as [COUNT]
FROM MyTable
GROUP BY [JOB ROLE], [CITY]

Result of existing query

I would like to reach to the following outcome to SELECT only the lines where the COUNT DISTINCT of EMPLOYEE_ID is MAX:

Desired outcome

Many thanks in advance for the kind help and suggestions on how to get here the easiest!

`DISTINCT` is not a function, it's a _set quantifier_. Skip those extra brackets and simply write `COUNT(DISTINCT [EMPLOYEE_ID])` to make code clearer. — jarlh, Nov 23 '20 at 15:21
What should happen is multiple groups share the maximum? Is there a rule for tie breaking, or are all groups returned, or, ... — Damien_The_Unbeliever, Nov 23 '20 at 15:23

Gordon Linoff · Answer 1 · 2020-11-23T16:01:57.140

You can use TOP WITH TIES and RANK():

SELECT TOP (1) WITH TIES [JOB ROLE], [CITY], COUNT(DISTINCT [EMPLOYEE_ID]) as [COUNT]
FROM MyTable
GROUP BY [JOB ROLE], [CITY]
ORDER BY RANK() OVER (ORDER BY COUNT(DISTINCT [EMPLOYEE_ID]) DESC;

Or, use a subquery and RANK():

SELECT [JOB ROLE], [CITY], [COUNT]
FROM (SELECT [JOB ROLE], [CITY],
             COUNT(DISTINCT [EMPLOYEE_ID]) as [COUNT],
             RANK() OVER (ORDER BY COUNT(DISTINCT [EMPLOYEE_ID]) DESC) as seqnum
      FROM MyTable
      GROUP BY [JOB ROLE], [CITY]
     ) jc
WHERE seqnum = 1;

EDIT:

The question seems to have changed to be the most common job role per city. The above queries are easily modified:

SELECT TOP (1) WITH TIES [JOB ROLE], [CITY], COUNT(DISTINCT [EMPLOYEE_ID]) as [COUNT]
FROM MyTable
GROUP BY [JOB ROLE], [CITY]
ORDER BY RANK() OVER (PARTITION BY [JOB ROLE] ORDER BY COUNT(DISTINCT [EMPLOYEE_ID]) DESC;

Or, use a subquery and RANK():

SELECT [JOB ROLE], [CITY], [COUNT]
FROM (SELECT [JOB ROLE], [CITY],
             COUNT(DISTINCT [EMPLOYEE_ID]) as [COUNT],
             RANK() OVER (PARTITION BY [JOB ROLE], ORDER BY COUNT(DISTINCT [EMPLOYEE_ID]) DESC) as seqnum
      FROM MyTable
      GROUP BY [JOB ROLE], [CITY]
     ) jc
WHERE seqnum = 1;

Thank you for the suggestion! This unfortunately only return to me one line, selecting the most common [JOB ROLE] and [CITY], instead of all [JOB ROLE] values with the most common [CITY]. — gonzika, Nov 23 '20 at 15:54
@gonzika Is that not what you asked for ? When looking at the desired output I think this was your question, so this answer is valid in my opinion — GuidoG, Nov 23 '20 at 15:56
@GordonLinoff the question is the same as it was when first posted: https://stackoverflow.com/posts/64971175/revisions — forpas, Nov 23 '20 at 16:05

score 1 · Accepted Answer · answered Nov 23 '20 at 15:42

If you don't care about ties, you can do it with FIRST_VALUE() window function:

SELECT DISTINCT [JOB ROLE], 
       FIRST_VALUE([CITY]) OVER (PARTITION BY [JOB ROLE] ORDER BY COUNT(DISTINCT([EMPLOYEE_ID])) DESC)
FROM MyTable
GROUP BY [JOB ROLE], [CITY]

How to SELECT lines where COUNT DISTINCT is MAX?

2 Answers2