1

I have the following table named population:

╔════════════╦════════════╦════════════════╗
║     india  ║ hyderabad  ║          50100 ║
║     india  ║ delhi      ║          75000 ║
║     USA    ║ NewYork    ║          25000 ║
║     USA    ║ california ║          30000 ║
║     india  ║  delhi     ║           5000 ║
║     USA    ║  NewYork   ║          75000 ║
╚════════════╩════════════╩════════════════╝

I need to write a SQL query to get data in the following format:

╔════════╦═════════╦══════════╗
║ india  ║ delhi   ║    80000 ║
║ USA    ║ NewYork ║   100000 ║
╚════════╩═════════╩══════════╝

country name and the city with the highest population where multiple entries of the cities are summed up.

Nimantha
  • 6,405
  • 6
  • 28
  • 69
sandy kay
  • 115
  • 1
  • 14

3 Answers3

3

You can use:

SELECT *
FROM (
  SELECT country,city, SUM(pop) AS total
  FROM population 
  GROUP BY country,city) AS sub
WHERE (country, total) IN (
                           SELECT country, MAX(total)
                           FROM (SELECT country,city, SUM(pop) AS total
                                 FROM population 
                                 GROUP BY country,city
                             ) as s
                           GROUP BY country
                           );

If two cities in the same country have the same highest total population you will get two cities for that country.

SqlFiddleDemo

Output:

╔══════════╦═════════╦════════╗
║ country  ║  city   ║ total  ║
╠══════════╬═════════╬════════╣
║ india    ║ delhi   ║  80000 ║
║ USA      ║ NewYork ║ 100000 ║
╚══════════╩═════════╩════════╝
Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
1

You could use a combination of GROUP_CONCAT and FIND_IN_SET. This query will return a comma separated list of cities for every country, ordered by population DESC:

SELECT country, GROUP_CONCAT(city ORDER BY pop DESC) AS cities
FROM population
GROUP BY country

and it will return something like this:

| country |                   cities |
|---------|--------------------------|
|   india |    delhi,hyderabad,delhi |
|     USA | NewYok,california,NewYok |

then we can join this subquery back to the population table using FIND_IN_SET that returns the position of a city in the list of cities:

SELECT
  p.country,
  p.city,
  SUM(p.pop)
FROM
  population p INNER JOIN (
    SELECT country, GROUP_CONCAT(city ORDER BY pop DESC) AS cities
    FROM population
    GROUP BY country
  ) m ON p.country=m.country
         AND FIND_IN_SET(p.city, m.cities)=1
GROUP BY
  p.country,
  p.city

the join will succeed only on the city with the maximum population for every country: FIND_IN_SET(p.city, m.cities)=1.

This will work only if there's one city with the maximum poluation, if there are more only one will be returned. This also is not standard SQL and will only work on MySQL or similar, other DBMS have window functions that will make this same query easier to write.

fthiella
  • 48,073
  • 15
  • 90
  • 106
-1

The following answer is not correct as it uses a feature specific to Mysql which violates the ANSI standards. The result is not deterministic as it is not defined which city name will be returned when aggregating by country. Mostly it is the first entry which will be used, this is why sorting in the inner query makes this work in most cases. But beware: It is by definition not guaranteed to use the first city, hence there can be cases where this will output wrong results. Another case this answer does not cover, is when there are two cities with same population as max for a country. This solution will only output one city per country.

I would solve it with a inner subquery which gets all cities grouped and the outer filters only to get the largest by country.

SELECT 
  country, city, MAX(population_total) AS population_total
FROM 
  (
        SELECT country, city, SUM(population) AS population_total
        FROM tableName
        GROUP BY country, city
        ORDER BY population_total DESC
  ) AS t1
GROUP BY 
  country
JHoffmann
  • 722
  • 8
  • 17
  • 1
    City will be undeterministic – Lukasz Szozda Nov 21 '15 at 19:47
  • 1
    this returns an arbitrary city – Strawberry Nov 21 '15 at 19:47
  • @Strawberry Did you see, that i use SELECT MAX() in the outer query? – JHoffmann Nov 21 '15 at 19:49
  • 2
    http://stackoverflow.com/questions/33629168/group-by-clause-in-mysql-and-postgresql-why-the-error-in-postgresql/33629201#33629201 and learn about grouping without explicit agg function – Lukasz Szozda Nov 21 '15 at 19:50
  • Sry, forgot the sorting in the inner query. Correcting now – JHoffmann Nov 21 '15 at 19:51
  • @lad2025 Ok, now I understand why my answer is actually wrong (although it will still work in most cases, which makes it problematic). Thanks for pointing that out. Learned something new today. – JHoffmann Nov 21 '15 at 20:04
  • 2
    @JHoffmann Great to hear it. The point is that MySQL "abuses" aggregation compared to SQL ANSI standard. With explicit `order by` inside subquery you may get the first record per group but it is still not guaranteed. Plus your solution will not handle 2 cities in the same country with the same population (rare, but possible) – Lukasz Szozda Nov 21 '15 at 20:09