2

Consider this simple query:

SELECT 
    year(t.date) AS y, 
    month(t.date) AS m, 
    t.person_id AS id, 
    count(*) AS freq
FROM 
    table t
WHERE 
    t.date>='2013-01-01' AND t.date<='2013-06-30'
GROUP BY y, m, id

This yields something like this:

y     m  id   freq
------------------
2013  1  100  2
2013  1  101  7
2013  1  102  1
2013  2  100  5
2013  2  101  4
2013  2  102  11
...

I would like to add a column that contains the percent of freq with respect to the sum of freq in each month, e.g.:

y     m  id   freq perc
-----------------------
2013  1  100  2    20
2013  1  101  7    70
2013  1  102  1    10
2013  2  100  5    25
2013  2  101  4    20
2013  2  102  11   55
...

My solution so far is to add this to the select clause:

count(*) * 100 / (SELECT count(*) FROM table WHERE year(date)=y AND month(date)=m) AS perc

But I would like to avoid a sub-select, especially since all necessary values are already present in the first query's result. This would require an aggregation of aggregated values, but I don't know how to do this. Any ideas?

thofou76
  • 77
  • 8

2 Answers2

0
SELECT 
    year(t.date) AS y, 
    month(t.date) AS m, 
    t.person_id AS id, 
    COUNT(*) AS freq,
    CASE WHEN p.freq = 0 then 0
         ELSE (CAST(COUNT(*) as FLOAT) / p.freq) * 100 
         END AS rate
FROM table t
JOIN (
    SELECT 
        year(t.date) AS y, 
        month(t.date) AS m,  
        count(*) AS freq
    FROM table
    GROUP BY y, m
) p ON p.y = year (t.date) AND p.m = month (t.date)
WHERE t.date BETWEEN '2013-01-01' AND '2013-06-30'
GROUP BY y, m, id
T I
  • 9,785
  • 4
  • 29
  • 51
  • The ON-clause must be rewritten (t.y and t.m are invalid), but that's the kind of two-stage-grouping I was looking for. – thofou76 Aug 15 '13 at 22:39
0

There is no way with MySQL to do this without the kind of sub-query you are proposing. But, to get the numbers to add to 100%, you should be careful to ensure that the denominator of the query contains a compatible WHERE clause to the numerator.

In your proposed solution, you would NOT get a total of 100%.

Using the query you proposed without the additional column,

SELECT 
    year(t.date) AS y, 
    month(t.date) AS m, 
    t.person_id AS id, 
    count(*) AS freq
FROM 
    table t
WHERE 
    t.date>='2013-01-01' AND t.date<='2013-06-30'
GROUP BY y, m, id

as the starting point, the final query you want would be:

SELECT 
    year(t.date) AS y, 
    month(t.date) AS m, 
    t.person_id AS id, 
    count(*) AS freq,
    (count(*) * 100) / 
        (
            select count(*) 
            from table t 
            where t.date>='2013-01-01' 
            AND t.date<='2013-06-30'
        )
FROM 
    table t
WHERE 
    t.date>='2013-01-01' AND t.date<='2013-06-30'
GROUP BY y, m, id

I propose this query with the known issue that the table (called table) will produce an error. In the little tests I did, I renamed it to t1. In general, calling a table 'table' is a non-portable thing.

amrith
  • 953
  • 6
  • 17