How to identify non-identical values being aggregated with COUNT()?

Question

I have a table containing three columns: city ID, language code and city:

city_id | language_code | city
----------------------------------
1       | en            | London
1       | es            | Londres
1       | pt            | Londres
2       | de            | Köln
2       | en            | Cologne
3       | it            | Cologne

In some foreign languages cities can be spelled the same way, e.g. the Spanish and Portuguese name for London is Londres.

But there are cases where the same name can refer to completely different locations, e.g. Cologne is an English name for the German city of Köln but there's also a town of the same name in Italy.

I would like to be able to detect cities that have more than one entry in the table but only those that are linked to different city_id values. In my case this would be Cologne but not Londres as both es and pt language versions point to the same city_id.

I thought this would be a fairly easy thing to do but I haven't been able to get the results in a single query. Instead, I am deduping the results first and then aggregating them:

WITH deduped_cities AS (
  SELECT DISTINCT city, city_id
  FROM cities
  ORDER BY city
)
SELECT city, COUNT(city_id) AS total
FROM deduped_cities
GROUP BY city
HAVING COUNT(city_id) > 1;

This gives me the expected result:

city    | total
----------------
Cologne | 2

I was just wondering if it is possible to achieve the same effect with a single SELECT statement.

DB Fiddle

Currently the "Suggested edit queue is full". Would have suggested to correct the "Koln" vs "Köln" typo. Great question, by the way — Stefan Wuebbe, Apr 20 '22 at 23:49

score 2 · Accepted Answer · answered Apr 20 '22 at 21:05

2

you have basically just one SELECT, but you can use DISTINCT' with the COUNT

SELECT city, COUNT(city_id) AS total
FROM cities
GROUP BY city
HAVING COUNT( DISTINCT city_id) > 1;

See fiddle

answered Apr 20 '22 at 21:05

nbk

45,398
8
30
47

Thank your very much for your prompt response and an updated Fiddle @nbk - very helpful and it's interesting to see you are using `DISTINCT` only in the `HAVING` clause and not in `SELECT` (like in the @Hambone's answer). – Chris Kobrzak Apr 21 '22 at 09:36
1

the select distinct is not necessary, as you filter already everything ouf of the rows, that don't fit, you see in the response that it gives all the rows, even when you have 3 or 4 of them and not only 2, so a Distinct will not change anything – nbk Apr 21 '22 at 09:49

score 2 · Answer 2 · answered Apr 20 '22 at 21:05

2

I believe you can just do the distinct within the aggregate and the having clause:

select city, count (distinct city_id)
from cities
group by city
having count (distinct city_id) > 1

answered Apr 20 '22 at 21:05

Hambone

15,600
8
46
69

Great answer, thank you, @Hambone! – Chris Kobrzak Apr 21 '22 at 09:37

How to identify non-identical values being aggregated with COUNT()?

2 Answers2