SQL In and having-clause, why do they produce different results?

Question

Considering the following relation movie {country,major_genre,production_year, run_time, title, I'd like to list all countries except Spain, the movies produced in that country, provided there are at least two of them.

I wrote two queries and they somehow produced different results. It seems that the first one is correct, but it seems to me that they are equal. I am learning SQL. Can someone please help to explain the differences? Thanks for your help!

The first one:

SELECT m1.country, m1.production_year, m1.title
FROM movie m1
WHERE m1.country <> "Spain"
AND m1.country
IN (
    SELECT m2.country
    FROM movie m2
    GROUP BY m2.country        //select only the ones with at least 2 movies
    HAVING COUNT( * ) >=2
 )
ORDER BY m1.country ASC , m1.production_year DESC

And the second one:

SELECT m1.country, m1.production_year, m1.title
FROM movie m1
WHERE m1.country <> "Spain"
GROUP BY m1.country
HAVING COUNT( * ) >=2  //the country selected should have count of at least 2 rows
ORDER BY m1.country ASC , m1.production_year DESC

Looks like a MySQL query which is relaxed about the rules related to the GROUP BY clause. FYI, the second query is not correct, as you must list all the columns in the GROUP BY clause that are in the SELECT list — cha, Apr 20 '15 at 04:47
I am seeing "m1.country, m1.production_year, m1.title" in the select statement but whereas in the group by you have only m1.country — backtrack, Apr 20 '15 at 04:47
But after adding "m1.country, m1.production_year, m1.title" to group by, it is still not correct — user3735871, Apr 20 '15 at 05:03
The general GROUP BY rule is: If a GROUP BY clause is specified, each column reference in the SELECT list must either identify a grouping column or be the argument of a set function. — jarlh, Apr 20 '15 at 06:48

Turophile · Accepted Answer · 2015-04-20T05:14:01.373

1

The problem is that the second query uses GROUP BY on a single column, but returns 3 columns. So each country with > 1 movie comes out once, with a "random" value for year and movie (may not actually be random).

You can also do it without the GROUP BY:

SELECT m1.country, m1.production_year, m1.title
FROM movie m1
WHERE m1.country <> "Spain"
AND 1 < (
    SELECT count(*) 
    FROM movie m2
    WHERE m2.country = m1.country
  )
ORDER BY m1.country ASC , m1.production_year DESC

Here is a fiddle: http://sqlfiddle.com/#!9/e2ddc/2

edited Apr 20 '15 at 05:14

answered Apr 20 '15 at 05:05

Turophile

3,367
1
13
21

Thanks! But I am trying to understand why the second one doesnt work. Even after I add `m1.country, m1.production_year, m1.title` to the GROUP BY, it is still incorrect. – user3735871 Apr 20 '15 at 05:18
1

Because the `COUNT(*)` applies to all the columns in the `GROUP BY`. So, in your original 2nd query, you are asking _summarize by country and only keep the countries with > 1 in the group_ then add year and title from somewhere. Which is why it doesn't quite work. You have already summarised rows down to one per country so you can't go back to having multiple per country. – Turophile Apr 20 '15 at 06:37
1

If you ask to `GROUP BY` more columns, the `COUNT(*)` will no longer be what you want, because you are counting years and titles. You can't both group (by country) and include the things (year, title) you are ignoring for the group in a single step. – Turophile Apr 20 '15 at 06:37

SQL In and having-clause, why do they produce different results?

1 Answers1