Can I write a condition on a single column name of a relation in HAVING clause

Question

Considering the following relational schema

customers(id, name, age, address, salary)

I tried a query

SELECT SUM(salary), age FROM customers             
GROUP BY age HAVING age > 23 ;                    ...(1)

I was surprised to see that it worked fine and that I could write a single column condition also in HAVING clause.

Even this is also working

SELECT SUM(salary), age FROM customers
GROUP BY age, salary HAVING age > 23 AND salary >2000;   ...(2)

Otherwise, I should have written it like this : (using WHERE clause)

SELECT SUM(salary), age FROM customers
WHERE age > 23 GROUP BY age;                       ...(3)

And

SELECT SUM(salary), age FROM customers
WHERE age > 23 AND salary >2000 GROUP BY age, salary ;    ..(4)

But when I tried with more combinations I found that

that column name must be present in GROUP BY clause also on which condition is applied in HAVING clause.

Am I correct or is it possible to write a single column condition in HAVING clause in any other way also ?

Why is it working because I had earlier studied that we can write only conditions on Aggregate functions in HAVING clause.

This is a MySQL "feature" which leads to so many misunderstandings by new users. Other DB engines do not allow that. So just don't do it and move on :) — juergen d, Jan 15 '15 at 10:37
You can put non-aggregate conditions in the HAVING clause, but don't do it, because you may end up surprised. Aggregate function conditions in the HAVING clause, other conditions in either ON clause or WHERE clause. — jarlh, Jan 15 '15 at 10:42
But it allows non-aggregate conditions with HAVING clause only when they are also present with the GROUP-BY clause. Any specific reason for that behavior ? — A_J, Jan 15 '15 at 10:49

score 2 · Accepted Answer · answered Jan 15 '15 at 11:02

2

You're generally correct. Important thing is to understand grouping at all.

When using GROUP BY, server scans 'rows' and buckets them into some 'groups'. Then every 'group' works as a single new row. When operating these 'new lines' - in SELECT, HAVING or ORDER clauses - server needs to know 'attribute values' of them. These attribute values are aggregations of rows' attribute values or expressions with these aggregations.

When some attribute or expression used in the GROUP BY clause, it's aggregation values are quite deterministic, so server give us ability to simplify process. We can write something like

SELECT object_type, count(*)
FROM user_objects
GROUP BY object_type
HAVING MAX(object_type) like '%O%'
ORDER BY MIN(object_type)

It would work fine if we do this. But we can write simply

SELECT object_type, count(*)
FROM user_objects
GROUP BY object_type
HAVING object_type like '%O%'
ORDER BY object_type

which means exact the same. If column does not mentioned into the GROUP BY values - rule above became not true, so we cannot use it directly, without aggregation.

answered Jan 15 '15 at 11:02

Sanders the Softwarer

2,478
1
13
28

you mean to say this feature is there just for simplifying the process. But I think HAVING MAX(object_type) like '%O%' would be different from HAVING object_type like '%O%'. What is your opinion about that ? – A_J Jan 15 '15 at 11:23
Because MAX(object_type) will compare just one value whereas object_type will compare for each value of the column. – A_J Jan 15 '15 at 11:26
So how can both be the same ? – A_J Jan 15 '15 at 11:27
Also the MAX value will be different from the column values. Kindly reply. – A_J Jan 15 '15 at 11:39
GROUP BY "field" means than field into group has a single value, so MIN(field), MAX(field), AVG(field), COUNT(distinct field) and so on would be quite predictable. – Sanders the Softwarer Jan 15 '15 at 11:59
You mean to say all MAX, MIN, AVG will be the same since it is single value. So why not write directly the column name instead of writing those. – A_J Jan 15 '15 at 12:12

score 0 · Answer 2 · answered Jan 15 '15 at 11:38

SELECT SUM(salary), age 
FROM customers
GROUP BY age, salary 
HAVING age > 23 AND salary >2000;

This gives you one record per age and salary, as you group by these. Later you remove some of the result lines. The sum of the salary is of course the salary itself.

If these are your records for instance:

salary  age  something
1000    30    100
1000    30    200    
2000    30    300    
2000    40    400

then you group like this:

salary  age  something
1000    30    100
              200    
2000    30    300    
2000    40    400

For the group 1000/30 the sum(something) is 300 and avg(something) is 150. But the sum(salary) is 1000 and avg(salary) is 1000 and min(salary) is 1000, and so on, because it is just one salary value you are talking about.

The HAVING clause then removes lines from the result where age is over 23 and salary over 2000. You could have removed these records from evaluation by using a WHERE clause instead, thus saving the dbms some work. But you made the dbms collect all age and salary groups first, only to say which ones you dismiss afterwards.

I agree though that it would be better the DBMS raised an error telling you that sum(salary) makes no sense as it is just the one salary of the group.

Can I write a condition on a single column name of a relation in HAVING clause

2 Answers2