Optimal SQL to perform multiple aggregate functions with different group by fields

Question

To simplify a complex query I am working on, I feel like solving this is key.

I have the following table

id	city	Item
1	chicago	1
2	chicago	2
3	chicago	1
4	cedar	2
5	cedar	1
6	cedar	2
7	detroit	1

I am trying to find the ratio of number of rows grouped by city and item to the number of rows grouped by just the items for each and every unique city-item pair.

So I would like something like this

City	Item	groupCityItemCount	groupItemCount	Ratio
chicago	1	2	4	2/4
chicago	2	1	3	1/3
cedar	1	1	4	1/4
cedar	2	2	3	2/3
detroit	1	1	4	1/4

This is my current solution but its too slow.

Select city, item, (count(*) / (select count(*) from records t2 where t1.item=t2.item)) AS pen_ratio
From records t1
Group By city, item

Also replaced where with groupBy and having but that is also slow.

Select city, item, (count(*) / (select count(*) from records t2 group by item having t1.item=t2.item)) AS pen_ratio
From records t1
Group By city, item

(Note: I have removed column3 and column4 from the solution for smaller code)

(Edit: Typo as pointed out by xQbert and MatBailie)

What is the "math" for the groupItemCounts... I think I get why Chicago item 1 has a count of 3 (Since there's 3 total chicago records) but why is there a 4 then? — xQbert, Dec 09 '21 at 00:14
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=a4a04e32e6ce542d045a275daa946307 is what I have to play with... — xQbert, Dec 09 '21 at 00:26

procopypaster · Accepted Answer · 2021-12-09T01:49:22.883

Is it slow because it's evaluating each row separately with the subquery in the select statement? It may be operating as a correlated subquery.

If that's the case it might be faster if you get the values out of a join and go from there -

Select city, t1.item, (COUNT(t1.item) / MAX(t2.it_count)) AS pen_ratio
from records t1
JOIN (SELECT item, count(item) AS it_count
      FROM records
      group by item) t2
        ON t2.item = t1.item
GROUP BY city, t1.item

Updated some errors and included the fiddle based off the starting point from xQbert. I had to CAST as float in the fiddle, but you may not need to CAST and use the above query in yours depending on datatypes.

I believe this follows the intent of your original query.

https://dbfiddle.uk/?rdbms=postgres_13&fiddle=d77a715175159304b9192a16ad903347

MatBailie · Answer 2 · 2021-12-09T00:28:54.090

0

You can approach it in two parts.

First, aggregate to the level you're interested in, as normal.

Then, use analytical functions to work out subtotals across your partitions (item, in your case).

WITH
  aggregate AS
(
  SELECT
    city,
    item,
    COUNT(*) AS row_count
  FROM
    records
  GROUP BY
    city,
    item
)
SELECT
  city,
  item,
  row_count                               AS groupCityItemCount,
  SUM(row_count) OVER (PARTITION BY item) AS groupItemCount
FROM
  aggregate

Fiddle borrowed from xQbert

https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=730146262267412522f6e27796151f43

edited Dec 09 '21 at 00:28

answered Dec 09 '21 at 00:22

MatBailie

83,401
18
103
137

Don't have much idea about this approach as fitting this in my existing query might get complex. I will try it though and analyze the prepared statements. – codeyashu Dec 09 '21 at 01:47

Optimal SQL to perform multiple aggregate functions with different group by fields

2 Answers2