2

this may be an easy question but basically I would like to get to SELECT only the lines of highest number of occurence for an experiment in SQL SERVER. I have a query that produces the following data:

SELECT [JOB ROLE], [CITY], COUNT(DISTINCT([EMPLOYEE_ID])) as [COUNT]
FROM MyTable
GROUP BY [JOB ROLE], [CITY]

Result of existing query

I would like to reach to the following outcome to SELECT only the lines where the COUNT DISTINCT of EMPLOYEE_ID is MAX:

Desired outcome

Many thanks in advance for the kind help and suggestions on how to get here the easiest!

forpas
  • 160,666
  • 10
  • 38
  • 76
gonzika
  • 23
  • 3
  • 2
    `DISTINCT` is not a function, it's a _set quantifier_. Skip those extra brackets and simply write `COUNT(DISTINCT [EMPLOYEE_ID])` to make code clearer. – jarlh Nov 23 '20 at 15:21
  • 2
    What should happen is multiple groups share the maximum? Is there a rule for tie breaking, or are all groups returned, or, ... – Damien_The_Unbeliever Nov 23 '20 at 15:23

2 Answers2

1

You can use TOP WITH TIES and RANK():

SELECT TOP (1) WITH TIES [JOB ROLE], [CITY], COUNT(DISTINCT [EMPLOYEE_ID]) as [COUNT]
FROM MyTable
GROUP BY [JOB ROLE], [CITY]
ORDER BY RANK() OVER (ORDER BY COUNT(DISTINCT [EMPLOYEE_ID]) DESC;

Or, use a subquery and RANK():

SELECT [JOB ROLE], [CITY], [COUNT]
FROM (SELECT [JOB ROLE], [CITY],
             COUNT(DISTINCT [EMPLOYEE_ID]) as [COUNT],
             RANK() OVER (ORDER BY COUNT(DISTINCT [EMPLOYEE_ID]) DESC) as seqnum
      FROM MyTable
      GROUP BY [JOB ROLE], [CITY]
     ) jc
WHERE seqnum = 1;

EDIT:

The question seems to have changed to be the most common job role per city. The above queries are easily modified:

SELECT TOP (1) WITH TIES [JOB ROLE], [CITY], COUNT(DISTINCT [EMPLOYEE_ID]) as [COUNT]
FROM MyTable
GROUP BY [JOB ROLE], [CITY]
ORDER BY RANK() OVER (PARTITION BY [JOB ROLE] ORDER BY COUNT(DISTINCT [EMPLOYEE_ID]) DESC;

Or, use a subquery and RANK():

SELECT [JOB ROLE], [CITY], [COUNT]
FROM (SELECT [JOB ROLE], [CITY],
             COUNT(DISTINCT [EMPLOYEE_ID]) as [COUNT],
             RANK() OVER (PARTITION BY [JOB ROLE], ORDER BY COUNT(DISTINCT [EMPLOYEE_ID]) DESC) as seqnum
      FROM MyTable
      GROUP BY [JOB ROLE], [CITY]
     ) jc
WHERE seqnum = 1;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thank you for the suggestion! This unfortunately only return to me one line, selecting the most common [JOB ROLE] and [CITY], instead of all [JOB ROLE] values with the most common [CITY]. – gonzika Nov 23 '20 at 15:54
  • 1
    @gonzika Is that not what you asked for ? When looking at the desired output I think this was your question, so this answer is valid in my opinion – GuidoG Nov 23 '20 at 15:56
  • @GordonLinoff the question is the same as it was when first posted: https://stackoverflow.com/posts/64971175/revisions – forpas Nov 23 '20 at 16:05
1

If you don't care about ties, you can do it with FIRST_VALUE() window function:

SELECT DISTINCT [JOB ROLE], 
       FIRST_VALUE([CITY]) OVER (PARTITION BY [JOB ROLE] ORDER BY COUNT(DISTINCT([EMPLOYEE_ID])) DESC)
FROM MyTable
GROUP BY [JOB ROLE], [CITY]
forpas
  • 160,666
  • 10
  • 38
  • 76