-1

I am querying the very popular AdventureWorks DB in SSMS. My objective to find the number of males and females under each job title from HumanResources.Employee.

For this my original query was,

SELECT JobTitle,
COUNT(CASE WHEN Gender='M' THEN 1
ELSE 0
END) AS MALE_COUNT,
COUNT(CASE WHEN Gender='F' THEN 1
ELSE 0
END) AS FEMALE_COUNT,Gender
FROM HumanResources.Employee
GROUP BY JobTitle,Gender
ORDER BY JobTitle
GO

However, I am getting incorrect answer with the above query.So by modifying it as below ,I am getting the desired result:

SELECT JobTitle,
COUNT(CASE WHEN Gender='M' THEN 1
END) AS MALE_COUNT,
COUNT(CASE WHEN Gender='F' THEN 1
END) AS FEMALE_COUNT,Gender
FROM HumanResources.Employee
GROUP BY JobTitle,Gender
ORDER BY JobTitle
GO

As can be easily seen, I am just removing the 'ELSE 0' condition for both the CASE statements, but am I unable to figure out as to how '0' is affecting the values returned in the result.

Can someone explain to me the difference between these two? Also I would like to know how the COUNT function is taking multiple values, when normally(say SELECT COUNT(3,3)) it doesn't work.

GMB
  • 216,147
  • 25
  • 84
  • 135
halfwind22
  • 329
  • 4
  • 18

2 Answers2

3

You want SUM(), not COUNT(): the latter takes in account every non-null value (this includes 0), so your current conditional expressions counts all rows - it is equivalent to COUNT(*)

Also, I suspect that gender should probably not appear in the SELECT and GROUP BY clauses, since that's precisely what you are trying to aggregate

I think that you want:

SELECT 
    JobTitle,
    SUM(CASE WHEN Gender='M' THEN 1 ELSE 0 END) AS MALE_COUNT,
    SUM(CASE WHEN Gender='F' THEN 1 ELSE 0 END) AS FEMALE_COUNT
FROM HumanResources.Employee
GROUP BY JobTitle
ORDER BY JobTitle
GMB
  • 216,147
  • 25
  • 84
  • 135
  • Thanks, SUM makes more sense than COUNT in this scenario, to be honest.However,I would like to know how the count is working in original query. Is it that it counts(and adds up) for each iteration of gender(for a group) irrespective of whether the value is 'M' or 'F'? – halfwind22 Jun 13 '20 at 16:32
  • @aravindnk: it counts the number of non-null values in the group. 0 and 1 are not null, so both genders are counted... – GMB Jun 13 '20 at 17:03
1

Documentations say:

COUNT(ALL expression) evaluates expression for each row in a group, and returns the number of nonnull values.

Read more here: https://learn.microsoft.com/en-us/sql/t-sql/functions/count-transact-sql?view=sql-server-ver15

So we can rewrite your query as:

SELECT 
    JobTitle,
    COUNT(CASE WHEN Gender='M' THEN 1 ELSE NULL END) AS MALE_COUNT,
    COUNT(CASE WHEN Gender='F' THEN 1 ELSE NULL END) AS FEMALE_COUNT
FROM HumanResources.Employee
GROUP BY JobTitle
ORDER BY JobTitle

About your queries:

First query:

inside Count function you have a CASE expression which will change the 'M' to 1 and 'F' to 0. Then COUNT function will do the count operations over them. because neither 1 nor 0 are NULL so Count will return total number of records, regardless of 'M' or 'F'

This procedure is same for second CASE too.

Second query:

Inside Count function you have a CASE expression which will change 'M' to 1, but you did not mentioned what to do with other values, so NULL will be returned for non-M values. After that Count function will do the count operations on these records and will return the number of M's.(Second query is equal to the query I have posted, and they both will have the same output. However because of readability I prefer my query over yours :-) )

This procedure is same for second CASE too.

Documentations for CASE expression says:

ELSE else_result_expression Is the expression returned if no comparison operation evaluates to TRUE. If this argument is omitted and no comparison operation evaluates to TRUE, CASE returns NULL. else_result_expression is any valid expression. The data types of else_result_expression and any result_expression must be the same or must be an implicit conversion.

Read more here: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/case-transact-sql?view=sql-server-ver15

SELECT Count(3,3)

This is syntactically wrong and will give you an error like this, which is pretty self explanatory:

Msg 174, Level 15, State 1, Line 1 The Count function requires 1 argument(s)

The syntax for Count function based on the aforementioned documentations is like this:

-- Aggregation Function Syntax  
COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } )  

-- Analytic Function Syntax  
COUNT ( [ ALL ]  { expression | * } ) OVER ( [ <partition_by_clause> ] )
Vahid Farahmandian
  • 6,081
  • 7
  • 42
  • 62