7

Given the following table structures:

countries: id, name
regions: id, country_id, name, population
cities: id, region_id, name

...and this query...

SELECT c.name AS country, COUNT(DISTINCT r.id) AS regions, COUNT(s.id) AS cities
FROM countries AS c
JOIN regions AS r ON r.country_id = c.id
JOIN cities AS s ON s.region_id = r.id
GROUP BY c.id

How would I add a SUM of the regions.population value to calculate the country's population? I need to only use the value of each region once when summing, but the un-grouped result has multiple rows for each region (the number of cities in that region).

Example data:

mysql> SELECT * FROM countries;
+----+-----------+
| id | name      |
+----+-----------+
|  1 | country 1 |
|  2 | country 2 |
+----+-----------+
2 rows in set (0.00 sec)

mysql> SELECT * FROM regions;
+----+------------+-----------------------+------------+
| id | country_id | name                  | population |
+----+------------+-----------------------+------------+
| 11 |          1 | region 1 in country 1 |         10 |
| 12 |          1 | region 2 in country 1 |         15 |
| 21 |          2 | region 1 in country 2 |         25 |
+----+------------+-----------------------+------------+
3 rows in set (0.00 sec)

mysql> SELECT * FROM cities;
+-----+-----------+---------------------------------+
| id  | region_id | name                            |
+-----+-----------+---------------------------------+
| 111 |        11 | City 1 in region 1 in country 1 |
| 112 |        11 | City 2 in region 1 in country 1 |
| 121 |        12 | City 1 in region 2 in country 1 |
| 211 |        21 | City 1 in region 1 in country 2 |
+-----+-----------+---------------------------------+
4 rows in set (0.00 sec)

Desired output with example data:

+-----------+---------+--------+------------+
| country   | regions | cities | population |
+-----------+---------+--------+------------+
| country 1 |       2 |      3 |         25 |
| country 2 |       1 |      1 |         25 |
+-----------+---------+--------+------------+

I prefer a solution that doesn't require changing the JOIN logic.

The accepted solution for this post seems to be in the neighborhood of what I'm looking for, but I haven't been able to figure out how to apply it to my issue.


MY SOLUTION

SELECT c.id AS country_id,
    c.name AS country,
    COUNT(x.region_id) AS regions,
    SUM(x.population) AS population,
    SUM(x.cities) AS cities
FROM countries AS c
LEFT JOIN (
        SELECT r.country_id,
            r.id AS region_id,
            r.population AS population,
            COUNT(s.id) AS cities
        FROM regions AS r
        LEFT JOIN cities AS s ON s.region_id = r.id
        GROUP BY r.country_id, r.id, r.population
    ) AS x ON x.country_id = c.id
GROUP BY c.id, c.name

Note: My actual query is much more complex and has nothing to do with countries, regions, or cities. This is a minimal example to illustrate my issue.

Community
  • 1
  • 1
Sonny
  • 8,204
  • 7
  • 63
  • 134

7 Answers7

6

First of all, the other post you reference is not the same situation. In that case, the joins are like [A -> B and A -> C], so the weighted average (which is what that calculation does) is correct. In your case the joins are like [A -> B -> C], so you need a different approach.

The simplest solution that comes to mind right away does involve a subquery, but not a complex one:

SELECT 
    c.name AS country, 
    COUNT(r.id) AS regions, 
    SUM(s.city_count) AS cities,
    SUM(r.population) as population
FROM countries AS c
JOIN regions AS r ON r.country_id = c.id
JOIN 
    (select region_id, count(*) as city_count
    from cities 
    group by region_id) AS s
ON s.region_id = r.id
GROUP BY c.id

The reason this works is that it resolves the cities to one row per region before joining to the region, thus eliminating the cross join situation.

Community
  • 1
  • 1
radshop
  • 602
  • 5
  • 19
4

How about leaving the rest and just adding one more join for the population

SELECT c.name AS country, 
       COUNT(distinct r.id) AS regions, 
       COUNT(s.id) AS cities, 
       pop_regs.sum as total_population
FROM countries AS c
LEFT JOIN regions AS r ON r.country_id = c.id
LEFT JOIN cities AS s ON s.region_id = r.id
left join 
(
    select country_id, sum(population) as sum 
    from regions 
    group by country_id
) pop_regs on pop_regs.country_id = c.id
GROUP BY c.id, c.name

SQLFiddle demo

juergen d
  • 201,996
  • 37
  • 293
  • 362
3

To start, you should know that the question and it's solution mentioned in your question are a little bit different from your question and it's solution. That's why you can not use only JOINs without sub-queries.

Tables :

Countries :

===========================
|     id     |    name    |
===========================
|     1      | country 1  |
---------------------------
|     2      | country 2  |
---------------------------
|     3      | country 3  |
---------------------------
|     4      | country 4  |
---------------------------

Regions :

=============================================
|    id    |country_id|   name   |population|
=============================================
|    1     |    1     | c1 - r1  |    10    |
---------------------------------------------
|    2     |    1     | c1 - r2  |    15    |
---------------------------------------------
|    3     |    1     | c1 - r3  |    15    |
---------------------------------------------
|    4     |    2     | c2 - r1  |    25    |
---------------------------------------------
|    5     |    3     | c3 - r1  |    13    |
---------------------------------------------

Cities :

========================================
|     id     | region_id  |    name    |
========================================
|     1      |     1      |   city 1   |
----------------------------------------
|     2      |     1      |   city 2   |
----------------------------------------
|     3      |     2      |   city 3   |
----------------------------------------
|     4      |     2      |   city 4   |
----------------------------------------
|     5      |     2      |   city 5   |
----------------------------------------
|     6      |     3      |   city 6   |
----------------------------------------
|     7      |     3      |   city 7   |
----------------------------------------
|     8      |     4      |   city 8   |
----------------------------------------
|     9      |     4      |   city 9   |
----------------------------------------
|     10     |     4      |  city 10   |
----------------------------------------

As a simple method, you can join countries table with a sub-query that joins regions and cities tables to get 2 tables : countries and regions with cities columns :

SQL :

SELECT
    r.id AS id,
    r.country_id AS country_id,
    r.name AS name,
    r.population AS population,
    COUNT(s.region_id) AS cities
FROM regions r
    /* we use left joint and not only join to get also regions without cities */
    LEFT JOIN cities s
        ON r.id = s.region_id
GROUP BY r.id

Data :

==================================================================
|     id     | country_id |    name    | population |   cities   |
==================================================================
|     1      |     1      |  c1 - r1   |     10     |     2      |
------------------------------------------------------------------
|     2      |     1      |  c1 - r2   |     15     |     3      |
------------------------------------------------------------------
|     3      |     1      |  c1 - r3   |     15     |     2      |
------------------------------------------------------------------
|     4      |     2      |  c2 - r1   |     25     |     3      |
------------------------------------------------------------------
|     5      |     3      |  c3 - r1   |     13     |     0      |
------------------------------------------------------------------  

Then you have to do your normal requet which gives you this code :

SQL :

SELECT
    c.name AS country,
    COUNT(r.country_id) AS regions,
    /* ifnull is used here to show 0 instead of null */
    SUM(IFNULL(r.cities, 0)) AS cities,
    SUM(IFNULL(r.population, 0)) AS population
FROM countries c
    /* we use left joint and not only join to get also countries without regions */
    LEFT JOIN (
        SELECT
            /* we don't need regions.id and regions.name */
            r.country_id AS country_id,
            r.population AS population,
            COUNT(s.region_id) AS cities
        FROM regions r
            LEFT JOIN cities s
                ON r.id = s.region_id
        GROUP BY r.id
    ) r
    ON c.id = r.country_id
GROUP BY c.id

And this result :

=====================================================
|  country   |  regions   |   cities   | population |
=====================================================
| country 1  |     3      |     7      |     40     |
-----------------------------------------------------
| country 2  |     1      |     3      |     25     |
-----------------------------------------------------
| country 3  |     1      |     0      |     13     |
-----------------------------------------------------
| country 4  |     0      |     0      |     0      |
-----------------------------------------------------   

To compare, using only JOIN removes countries without regions and countries with regions that haven't cities :

=====================================================
|  country   |  regions   |   cities   | population |
=====================================================
| country 1  |     3      |     7      |     40     |
-----------------------------------------------------
| country 2  |     1      |     3      |     25     |
-----------------------------------------------------

For your exact example (with data mentioned in your question), you will get :

=====================================================
|  country   |  regions   |   cities   | population |
=====================================================
| country 1  |     2      |     3      |     25     |
-----------------------------------------------------
| country 2  |     1      |     1      |     25     |
-----------------------------------------------------

I hope all that can help you to get what you want.

akmozo
  • 9,829
  • 3
  • 28
  • 44
2

Use LEFT OUTER JOIN instead of INNER JOIN because If country have no regions then that country will not come in result if you use INNER JOIN, same wat If any regions have no cities then that will not counted in result.

So use LEFT OUTER JOIN instead of INNER JOIN or JOIN.

Try this:

SELECT c.name AS country, r.regions, r.population, r.cities 
FROM countries AS c 
LEFT OUTER JOIN (SELECT r.country_id, 
                        COUNT(r.id) AS regions, 
                        SUM(r.population) AS population, 
                        SUM(c.cities) AS cities
                 FROM regions AS r 
                 LEFT OUTER JOIN (SELECT c.region_id, COUNT(c.id) AS cities 
                                  FROM cities AS C
                                  GROUP BY c.region_id
                                 ) AS c ON r.id = c.region_id 
                 GROUP BY r.country_id
                ) AS r ON c.id = r.country_id;

Check the SQL FIDDLE DEMO

OUTPUT

| COUNTRY | REGIONS | POPULATION | CITIES |
|---------|---------|------------|--------|
|     usa |       3 |         16 |      4 |
| germany |       2 |          5 |      1 |
Saharsh Shah
  • 28,687
  • 8
  • 48
  • 83
  • 1
    Thanks for discussing the difference between `LEFT` and `INNER` joins. I was already aware, but this information is good for future readers. – Sonny Dec 23 '14 at 13:34
2

I have test in sql with this query for the same table you provide below

select regioncount.name as country,regioncount.regions, citycount.cities,regioncount.population    from
 (SELECT c.name,c.id,COUNT(r.id) AS regions ,SUM(r.population) as population
FROM countries AS c
JOIN regions AS r  on c.id = r.country_id GROUP BY c.id,c.name) as regioncount

join

(SELECT 
r.country_id,
    COUNT(s.id) AS cities 
FROM regions AS r
JOIN cities AS s  on r.id =s.region_id GROUP BY r.country_id) as citycount on citycount.country_id = regioncount.id

and i got the result u want

+-----------+---------+--------+------------+
| country   | regions | cities | population |
+-----------+---------+--------+------------+
| country 1 |       2 |      3 |         25 |
| country 2 |       1 |      1 |         25 |
+-----------+---------+--------+------------+
1

Here's another way of doing it, if you dont want to introduce/change a JOIN or a SUBQUERY

SELECT 
  c.name AS country, 
  COUNT(distinct r.id) AS regions, 
  COUNT(s.id) AS cities,
  SUM(DISTINCT(((((r.id*r.id) + (r.population*r.id)))-(r.id*r.id))/r.id)) as total_population
FROM 
  countries AS c
  JOIN regions AS r ON r.country_id = c.id
  LEFT JOIN cities AS s ON s.region_id = r.id
GROUP 
  BY c.id

http://sqlfiddle.com/#!2/3dd8ba/22/0

Akash
  • 4,956
  • 11
  • 42
  • 70
1

Your problem is quite common. You join all tables that have something to do with the data you want to see, and then you start thinking about how to get to that data. When it comes to different aggregations as in your case, this is not easy to achieve.

So better join what you are actually interested in. In your case: countries and (aggregated) region/city data per country. This keeps the query straight-forward and easy to maintain.

select 
  c.name as country, 
  r.regions, 
  r.population,
  r.cities
from countries as c
join 
(
  select 
    country_id,
    count(*) as regions,
    sum(population) as population,
    sum((select count(*) from cities where cities.region_id = regions.id)) as cities
  from regions
  group by country_id
) as r on r.country_id = c.id;
Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73