Understanding how WHERE works with GROUP BY and Aggregation

Question

My query -

select cu.CustomerID,cu.FirstName,cu.LastName, COUNT(si.InvoiceID)as inv --1
from Customer as cu inner join SalesInvoice as si --2
on cu.CustomerID = si.CustomerID -- 3
-- put the WHERE clause here ! --4   
group by cu.CustomerID,cu.FirstName,cu.LastName -- 5
where cu.FirstName = 'mark' -- 6

Output with correct code -

enter image description here

Error i get - Incorrect syntax near the keyword 'where'.

Can you tell me why I get this error ? I want to know why WHERE comes before GROUP BY and not after.

Thanks for all the good answers. It was difficult to choose one of them. But, i had to choose one. — david blaine, Dec 22 '12 at 21:11

Taryn · Accepted Answer · 2012-12-22T20:36:00.517

You have the order wrong. The WHERE clause goes before the GROUP BY:

select cu.CustomerID,cu.FirstName,cu.LastName, COUNT(si.InvoiceID)as inv 
from Customer as cu 
inner join SalesInvoice as si 
   on cu.CustomerID = si.CustomerID 
where cu.FirstName = 'mark' 
group by cu.CustomerID,cu.FirstName,cu.LastName

If you want to perform a filter after the GROUP BY, then you will use a HAVING clause:

select cu.CustomerID,cu.FirstName,cu.LastName, COUNT(si.InvoiceID)as inv 
from Customer as cu 
inner join SalesInvoice as si 
   on cu.CustomerID = si.CustomerID 
group by cu.CustomerID,cu.FirstName,cu.LastName
having cu.FirstName = 'mark'

A HAVING clause is typically used for aggregate function filtering, so it makes sense that this would be applied after the GROUP BY

To learn about the order of operations here is article explaining the order. From the article the order of operation in SQL is:

To start out, I thought it would be good to look up the order in which SQL directives get executed as this will change the way I can optimize:

FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause

Using this order you will apply the filter in the WHERE prior to a GROUP BY. The WHERE is used to limit the number of records.

Think of it this way, if you were applying the WHERE after then you would return more records then you would want to group on. Applying it first, reduces the recordset then applies the grouping.

Yes, i knew that. I had mentioned that in the question itself. I want to know why WHERE comes before GROUP BY and not after. — david blaine, Dec 22 '12 at 20:25
@davidblaine see my edit, including an article on the order of sql operations -- http://www.bennadel.com/blog/70-SQL-Query-Order-of-Operations.htm — Taryn, Dec 22 '12 at 20:29

score 6 · Answer 2 · answered Dec 22 '12 at 20:27

6

The where clause comes before the group by because conceptually you filter before you group, not after. You want to restrict the output of the that is grouped to only those that match rather than perform the grouping on items that you will, potentially, throw away due to the filter.

answered Dec 22 '12 at 20:27

tvanfosson

524,688
99
697
795

Okay, i think i get it. Consider a situation - Your company employs 1000 programmers. A programmer can be junior, middle, senior, leader. You are interested in seeing only the leaders and seniors. The programmer table has so many rows. But, the rows (leaders) that are relevant to you might turn out to be just 200 (leaders and seniors) rows. You want to group by experience. Why go through all the 1000 programmers. Fetch the seniors and leaders. Then group then to get two rows. Makes sense ? – david blaine Dec 22 '12 at 20:56

score 4 · Answer 3 · answered Dec 22 '12 at 20:27

4

SQL does allow you to filter on the results of a GROUP BY -- it's called the HAVING clause.

If you want to filter on something that could be determined prior to the grouping (i.e. everyone with FirstName = 'Mark'), that's done via WHERE.

However, if you want to filter on everyone with 4 or more invoices (i.e., something you wouldn't know until after doing the COUNT), then you use HAVING.

answered Dec 22 '12 at 20:27

ExactaBox

3,235
16
27

Not directly related to my question, but nice tip. I swapped the where with a having and it works. Thanks. – david blaine Dec 22 '12 at 20:31
I wonder why HAVING can do what i want but WHERE cant do it. Strange – david blaine Dec 22 '12 at 20:32
What is it that you want? To filter only on people named Mark? WHERE absolutely can do it. – ExactaBox Dec 22 '12 at 20:33

score 3 · Answer 4 · answered Dec 22 '12 at 20:29

The WHERE clause is used before GROUP BY, because it makes more sense. The filter specified in the WHERE clause is used before grouping. After grouping, you can have a HAVING clause, which is similar to WHERE, except you can filter by aggregate values as well.

Compare:

-- Select the number of invoices per customer (for Customer 1 only)
SELECT
  si.CustomerID,
  COUNT(*) as InvoiceCount
FROM
  SalesInvoice as si   
WHERE
  si.CustomerID = 1
  -- You cannot filter by count(*) here, because grouping hasn't taken place yet.
GROUP BY 
  si.CustomerID -- (Not needed in this case, because of only 1 customer)

against

-- Select all invoices of customers that have more than three invoices
SELECT
  si.CustomerID,
  COUNT(*) as InvoiceCount
FROM
  SalesInvoice as si   
GROUP BY
  si.CustomerId
HAVING
  -- You can filter by aggregates, like count, here.
  COUNT(*) > 3

score 1 · Answer 5 · answered Dec 22 '12 at 20:55

1

Let's say you have 100,000 people in your database. 9 of whom are named Mark. Why should the database do a Count operation on all 100,000, then throw out the 99,991 NOT named Mark? Doesn't it seem smarter to filter out the Marks first, then do the Count only 9 times? Makes the operation a whole lot faster.

answered Dec 22 '12 at 20:55

ExactaBox

3,235
16
27

thanks !!! got it, i just typed something similar above (see comment for tvanfosson answer). please tell me if that is ok too. – david blaine Dec 22 '12 at 20:58
SQL isn't necessarily executed top down and the optimizers in modern day DBMS's do much smarter things than this. Performance is not an issue. I think the reason is much more for human readability and consistency. – GolezTrol Dec 22 '12 at 21:03
perhaps in 2012 the optimizer engine is smart enough... I'm sure it wasn't that way when SQL was originally developed. – ExactaBox Dec 22 '12 at 21:05

Understanding how WHERE works with GROUP BY and Aggregation

5 Answers5

Linked