-1

To simplify a complex query I am working on, I feel like solving this is key.

I have the following table

id city Item
1 chicago 1
2 chicago 2
3 chicago 1
4 cedar 2
5 cedar 1
6 cedar 2
7 detroit 1

I am trying to find the ratio of number of rows grouped by city and item to the number of rows grouped by just the items for each and every unique city-item pair.

So I would like something like this

City Item groupCityItemCount groupItemCount Ratio
chicago 1 2 4 2/4
chicago 2 1 3 1/3
cedar 1 1 4 1/4
cedar 2 2 3 2/3
detroit 1 1 4 1/4

This is my current solution but its too slow.

Select city, item, (count(*) / (select count(*) from records t2 where t1.item=t2.item)) AS pen_ratio
From records t1
Group By city, item

Also replaced where with groupBy and having but that is also slow.

Select city, item, (count(*) / (select count(*) from records t2 group by item having t1.item=t2.item)) AS pen_ratio
From records t1
Group By city, item

(Note: I have removed column3 and column4 from the solution for smaller code)

(Edit: Typo as pointed out by xQbert and MatBailie)

philipxy
  • 14,867
  • 6
  • 39
  • 83
codeyashu
  • 55
  • 2
  • 7
  • 1
    Have you considered putting an index on city and/or item? – Peter Dongan Dec 08 '21 at 23:28
  • What is the "math" for the groupItemCounts... I think I get why Chicago item 1 has a count of 3 (Since there's 3 total chicago records) but why is there a 4 then? – xQbert Dec 09 '21 at 00:14
  • 1
    https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=a4a04e32e6ce542d045a275daa946307 is what I have to play with... – xQbert Dec 09 '21 at 00:26

2 Answers2

2

Is it slow because it's evaluating each row separately with the subquery in the select statement? It may be operating as a correlated subquery.

If that's the case it might be faster if you get the values out of a join and go from there -

Select city, t1.item, (COUNT(t1.item) / MAX(t2.it_count)) AS pen_ratio
from records t1
JOIN (SELECT item, count(item) AS it_count
      FROM records
      group by item) t2
        ON t2.item = t1.item
GROUP BY city, t1.item

Updated some errors and included the fiddle based off the starting point from xQbert. I had to CAST as float in the fiddle, but you may not need to CAST and use the above query in yours depending on datatypes.

I believe this follows the intent of your original query.

https://dbfiddle.uk/?rdbms=postgres_13&fiddle=d77a715175159304b9192a16ad903347

procopypaster
  • 416
  • 1
  • 6
0

You can approach it in two parts.

First, aggregate to the level you're interested in, as normal.

Then, use analytical functions to work out subtotals across your partitions (item, in your case).

WITH
  aggregate AS
(
  SELECT
    city,
    item,
    COUNT(*) AS row_count
  FROM
    records
  GROUP BY
    city,
    item
)
SELECT
  city,
  item,
  row_count                               AS groupCityItemCount,
  SUM(row_count) OVER (PARTITION BY item) AS groupItemCount
FROM
  aggregate

Fiddle borrowed from xQbert

MatBailie
  • 83,401
  • 18
  • 103
  • 137
  • Don't have much idea about this approach as fitting this in my existing query might get complex. I will try it though and analyze the prepared statements. – codeyashu Dec 09 '21 at 01:47