0

I have a data set where each record includes street, zipcode, city, population. I want to find (% of population one zipcode is to the entire city's population) but can't figure out how to group by zip and then add a column that shows total city's population for every zip with that same city.

[Ex Trying to find (Sum_of_City and Percent_of_City)]

Code I tried:

SELECT
   city,
   zipcode,
   sum(population) as Pop_Count_ (
   SELECT
      SUM(zSUM) 
   FROM
      (
         SELECT
            SUM(Distinct zipcode) AS zCount 
         FROM
            myTable 
         GROUP BY
            city
      )
      AS A)_ 
   FROM
      myTable_ 
   GROUP BY
      city,
      zipcode;
Vivek Jain
  • 2,730
  • 6
  • 12
  • 27
Verm
  • 3
  • 1

2 Answers2

1

Use window functions:

select city, zipcode, sum(population) as population,
       sum(population) / sum(sum(population)) over (partition by city) as zipcode_ratio
from t
group by city, zipcode;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Doesn't that assume MySQL8+? – MatBailie Sep 09 '20 at 18:12
  • @MatBailie . . . This assumes using a release of MySQL that has been released in that last 2.5 years. – Gordon Linoff Sep 09 '20 at 18:27
  • Yet many sites still don't make it available as part of their platform. It may not be under the OP's control. – MatBailie Sep 09 '20 at 18:28
  • Thanks @GordonLinoff, I do actually have the latest version of MySQL so Windows Functions work. I am wondering how this method works too if instead of sum(population) it would be (count(Distinct(neighborhood))... if i also had that field. (one zipcode to multiple neighborhoods) – Verm Sep 09 '20 at 19:13
  • @Verm . . . That depends what you want to do. Sample data and desired results would help. – Gordon Linoff Sep 09 '20 at 22:25
0

In case you're not on MySQL 8 (in which case you can't use Window Functions), here's an alternative approach... Aggregate twice, then Join...

SELECT
  city_zip.city, city_zip.zip_code, city_zip.popluation, city_zip.poupluation / city.population AS ratio
FROM
(
    SELECT city, zipcode, sum(population) AS population
      FROM myTable
  GROUP BY city, zip_code
)
  AS city_zip
INNER JOIN
(
    SELECT city, sum(population) AS population
      FROM myTable
  GROUP BY city
)
  AS city
    ON city.city = city_zip.city

Or, using a correlated-subquery...

SELECT
  city,
  zip_code,
  SUM(popluation),
  SUM(population)
  /
  (
    SELECT SUM(population)
      FROM myTable AS city
     WHERE city.city = city_zip.city
  )
FROM
  myTable AS city_zip
GROUP BY
  city,
  zip_code

I suspect the longer code might be faster, though I'm not entirely sure.

MatBailie
  • 83,401
  • 18
  • 103
  • 137
  • Thanks for this, I tried doing the correlated-subquery... it worked as you posted it but the second I tried putting in a "ORDER BY" statement, it timed out (1800s). @MatBailie – Verm Sep 09 '20 at 21:07
  • @Verm - Then try the aggregate and join approach, MySQL may handle correlated subqueries poorly, especially when re-ordering the datasets. `JOIN` happens before the `ORDER BY` and would likely be less adversly affected by the sorting. – MatBailie Sep 09 '20 at 21:10
  • @Verm - Though your other comments show you have MySQL, so use analytic / windowed functions. – MatBailie Sep 09 '20 at 21:11