2

I understand the distinction between WHERE and HAVING in a SQL query, but I don't see why they are separate clauses. Couldn't they be combined into a single clause that could handle both aggregated and non-aggregated data?

prem30488
  • 2,828
  • 2
  • 25
  • 57
  • see http://www.codeproject.com/Articles/25258/Where-Vs-Having-Difference-between-having-and-Wher. And also search in google for "WHERE vs HAVING" – prem30488 Mar 27 '14 at 17:38
  • Here's the rule. If a condition refers to an aggregate function, put that condition in the HAVING clause. Otherwise, use the WHERE clause. Here's another rule: You can't use HAVING unless you also use GROUP BY. – prem30488 Mar 27 '14 at 17:41
  • This link is also explaining... http://gouravverma.blogspot.in/2008/04/where-vs-having-difference-between.html – prem30488 Mar 27 '14 at 17:42
  • I know that conditions referencing aggregation need to be written in the HAVING clause, but I don't understand why a separate clause was needed to accomplish this. What would be lost by creating, say, a FILTER ON clause that would behave like WHERE for non-aggregate conditions and like HAVING otherwise. Is there something that would be lost by combining these two clauses? – Nathan St. John Mar 27 '14 at 17:48
  • 1
    The main difference is that WHERE cannot be used on grouped item (such as SUM(number)) whereas HAVING can.The reason is the WHERE is done before the grouping and HAVING is done after the grouping is done. ANOTHER DIFFERENCE IS WHERE clause requires a condition to be a column in a table, but HAVING clause can use both column and alias.So there is a need of both of them. – prem30488 Mar 27 '14 at 18:06
  • Is there ever a reason why you would put a condition in the HAVING clause that could have been put in the WHERE clause? – Nathan St. John Mar 27 '14 at 18:14
  • 3
    Keep in mind that SQL, like COBOL, was originally intended to be designed in a way that managers could use it, eliminating programmers. If it made sense then managers would have no hope of understanding it. – Hot Licks Mar 27 '14 at 18:47
  • 2
    @NathanSt.John I can think of no reason to put a condition in the `HAVING` clause that could have gone in the `WHERE` clause. The only reasons would be to prove that it can be done, or maybe to confuse people. Can't produce different results, because you can't reference a non-aggregate field in `HAVING` unless it's in the `GROUP BY`. – Hart CO Mar 27 '14 at 19:15

4 Answers4

3

Here's the rule. If a condition refers to an aggregate function, put that condition in the HAVING clause. Otherwise, use the WHERE clause.

Here's another rule: You can't use HAVING unless you also use GROUP BY.

The main difference is that WHERE cannot be used on grouped item (such as SUM(number)) whereas HAVING can.The reason is the WHERE is done before the grouping and HAVING is done after the grouping is done.

ANOTHER DIFFERENCE IS WHERE clause requires a condition to be a column in a table, but HAVING clause can use both column and alias.

Here's the difference:

SELECT `value` v FROM `table` WHERE `v`>5;

Error #1054 - Unknown column 'v' in 'where clause'

SELECT `value` v FROM `table` HAVING `v`>5; -- Get 5 rows

WHERE clause requires a condition to be a column in a table, but HAVING clause can use both column and alias.

This is because WHERE clause filters data before select, but HAVING clause filters data after select.

So put the conditions in WHERE clause will be more effective if you have many many rows in a table.

Try EXPLAIN to see the key difference:

EXPLAIN SELECT `value` v FROM `table` WHERE `value`>5;
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key   | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
|  1 | SIMPLE      | table | range | value         | value | 4       | NULL |    5 | Using where; Using index |
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+

EXPLAIN SELECT `value` v FROM `table` having `value`>5;
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key   | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
|  1 | SIMPLE      | table | index | NULL          | value | 4       | NULL |   10 | Using index |
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+

You can see either WHERE or HAVING uses index, but the rows are different.

So there is a need of both of them especially when we need grouping and additional filters.

prem30488
  • 2,828
  • 2
  • 25
  • 57
  • I understand the different between the two clauses. It just seems like they could easily be written as a single filtering clause. In fact, the first rule you gave says how to do this: take your list of conditions and place those with aggregate functions in the HAVING clause, the rest in WHERE. Why do you need to manually separate the conditions? – Nathan St. John Mar 27 '14 at 18:24
  • but for specifying search condition for a group or an aggregate function WHERE clause will not work.. so what should be alternative? **HAVING** clause works as WHERE clause did not work – prem30488 Mar 27 '14 at 18:40
  • A HAVING clause is like a WHERE clause, but applies only to groups as a whole, whereas the WHERE clause applies to individual rows. A query can contain both a WHERE clause and a HAVING clause. The WHERE clause is applied first to the individual rows in the tables . Only the rows that meet the conditions in the WHERE clause are grouped. The HAVING clause is then applied to the rows in the result set. Only the groups that meet the HAVING conditions appear in the query output. You can apply a HAVING clause only to columns that also appear in the GROUP BY clause or in an aggregate function. – prem30488 Mar 27 '14 at 18:43
  • @NathanSt.John So instead of one simple syntax for evaluating whether a single row should appear in two different places we'd have one more complicated syntax for evaluating everything at once. Why not do away with `GROUP BY` and even `SELECT` as well? After all, a single syntax could be used to say what to retrieve, whether to retrieve, how to aggregate, and whether to display all in one. Of course, it wouldn't be SQL. –  Mar 27 '14 at 18:56
  • I guess it struck me as strange because we have no choice in the matter (what conditions go in the WHERE clauses and what conditions go into HAVING are completely determined). Since there isn't really an option, I wondered why we were even given the choice. Why open up an opportunity to make a mistake? I agree that there is a similar issue with GROUP BY. Based on the other parts of the query, what must appear there is completely determined. Why make us write it? I understand now that this is done in the interest of keeping everything explicit, not because it's necessary. – Nathan St. John Mar 27 '14 at 19:13
3

This question seems to illustrate a misunderstanding that WHERE and HAVING are both missing up to 1/2 of the information necessary to fully process a query.

Consider the following SQL:

drop table if exists foo; create table foo (
  ID int,
  bar int
); insert into foo values (1, 1);

select now() as d, bar as b
from foo
where bar = 1 and d <= now()
having bar = 1 and ID = 1
;

In the where clause, d is not available because the selected items have not been processed to create it yet.

In the having clause ID has been discarded because it was not selected. In aggregate queries ID may not even have meaning in context of multiple rows combined into one. ID may also be meaningless when joining different tables into a single result.

2

Could it be done? Sure, but on the back-end it'd do the same as it does now, because you have to aggregate something before you can filter based on that aggregation. Ultimately that's the reason, it's a logical separation of different processes. Why waste resources aggregating records you could have filtered with a WHERE?

Hart CO
  • 34,064
  • 6
  • 48
  • 63
  • If it would be the same process on the back-end, why make two clauses when one could have done the job? I agree it is a waste of resources to aggregate records you could have filtered out, so why not just always apply non-aggregate conditions first? Why do we have to actually write those conditions under a separate clause? – Nathan St. John Mar 27 '14 at 18:03
  • Agreed. `WHERE` is pre-processing that happens on each row. `HAVING` is post-processing that happens after the result has been calculated and consolidated. Two places where something is filtered, two clauses. Seems very simple to me. Combining them would seem unwieldy as they'd always need to be broken back apart for processing. Logic is generally about breaking complex problems into simple pieces, not combining simple pieces into unwieldy constructs. –  Mar 27 '14 at 18:43
  • @NathanSt.John It's a logical differentiation that warrants distinction. I feel like that's reason enough, there are hundreds of similar questions, like: Why do all non-aggregate fields need to be listed in `GROUP BY` to get accurate results? That could easily be implied and dealt with on the back-end too. – Hart CO Mar 27 '14 at 18:57
  • My guess was that the distinction was done to make the logic explicit, but I wanted to make sure there wasn't also a deeper reason. Thanks for clarifying. – Nathan St. John Mar 27 '14 at 19:06
1

The question could only be fully answered by the designer since it asks intent. But the implication is that both clauses do the same thing only against aggregated vs. non-aggregated data. That's not true. "The HAVING clause is typically used together with the GROUP BY clause to filter the results of aggregate values. However, HAVING can be specified without GROUP BY."

As I understand it, the important thing is that "The HAVING clause specifies additional filters that are applied after the WHERE clause filters."

http://technet.microsoft.com/en-us/library/ms179270(v=sql.105).aspx

CuriousLayman
  • 207
  • 1
  • 10
  • 2
    Not just after the `WHERE`, but pretty much after everything else has been calculated. (for example individual `SELECT` values etc.) `HAVING` is a clause that operates on final result rows, whereas `WHERE` operates on what is being retrieved. (pre-filter on input vs post-filter on output) –  Mar 27 '14 at 18:47