-1

Let's suppose this schema:

CREATE TABLE test
(
test_Id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
user_Id INT NOT NULL,
date DATE,
result VARCHAR(255) NOT NULL,
) engine=innodb;

My goal is to pick up the last 5 results as maximum for each different user_Id, ordered from newest to oldest. Besides that, depending on result column I want to calculate a ratio of those last results, to be able to pick up the 3 users with best ratio.

So let's take this data as example:

test_Id | user_Id | date      | result
1       | 1       |2016-09-05 | A
2       | 3       |2016-09-13 | A
3       | 3       |2016-09-30 | A
4       | 4       |2016-09-22 | A
5       | 4       |2016-09-11 | C
6       | 7       |2016-09-18 | D
7       | 4       |2016-09-08 | B
8       | 6       |2016-09-20 | E
9       | 7       |2016-09-16 | A
10      | 7       |2016-09-29 | E
11      | 7       |2016-09-23 | A
12      | 7       |2016-09-16 | B
13      | 4       |2016-09-15 | B
14      | 7       |2016-09-07 | C
15      | 7       |2016-09-09 | A
16      | 3       |2016-09-26 | A
17      | 4       |2016-09-11 | C
18      | 4       |2016-09-30 | E

What I have been able to achieve is this query:

SELECT p.user_Id, p.RowNumber, p.date, p.result, 
      SUM(CASE WHEN p.result='A' OR p.result='B' 
      THEN 1 ELSE 0 END) as avg
FROM (
    SELECT  @row_num := IF(@prev_value=user_Id,@row_num+1,1) 
          AS RowNumber, test_Id, user_Id, date, result, 
          @prev_value := user_Id
    FROM test,
    (SELECT @row_num := 1) x,
    (SELECT @prev_value := '') y
    WHERE @prev_value < 5
    ORDER BY user_Id, YEAR(date) DESC, MONTH(date) DESC, 
             DAY(date) DESC
) p
WHERE p.RowNumber <=10
GROUP BY p.user_Id, p.test_Id
ORDER BY p.user_Id, p.RowNumber;

This query provides me this kind of output:

 RowNumber |test_Id | user_Id | date      | result | avg
 1         | 1      | 1       |2016-09-05 | A      | 1
 1         | 3      | 3       |2016-09-30 | A      | 1
 2         | 16     | 3       |2016-09-26 | A      | 1
 3         | 2      | 3       |2016-09-13 | A      | 1
 1         | 18     | 4       |2016-09-30 | E      | 0
 2         | 4      | 4       |2016-09-22 | A      | 1
 3         | 13     | 4       |2016-09-15 | B      | 1
 4         | 5      | 4       |2016-09-11 | C      | 0
 5         | 17     | 4       |2016-09-11 | C      | 0
 1         | 8      | 6       |2016-09-20 | E      | 0
 1         | 10     | 7       |2016-09-29 | E      | 0
 2         | 11     | 7       |2016-09-23 | A      | 1
 3         | 6      | 7       |2016-09-18 | D      | 0
 4         | 9      | 7       |2016-09-16 | A      | 1
 5         | 12     | 7       |2016-09-16 | B      | 1

What I was expecting is that in the avg column would get the total of the results for each user that match the condition (A or B value), to be able to calculate a ratio from the 5 results for each user_id. (0, 0.2, 0.4, 0.6, 0.8, 1). Something like this:

 RowNumber |test_Id | user_Id | date      | result | avg
 1         | 1      | 1       |2016-09-05 | A      | 1
 1         | 3      | 3       |2016-09-30 | A      | 3
 2         | 16     | 3       |2016-09-26 | A      | 3
 3         | 2      | 3       |2016-09-13 | A      | 3
 1         | 18     | 4       |2016-09-30 | E      | 2
 2         | 4      | 4       |2016-09-22 | A      | 2
 3         | 13     | 4       |2016-09-15 | B      | 2
 4         | 5      | 4       |2016-09-11 | C      | 2
 5         | 17     | 4       |2016-09-11 | C      | 2
 1         | 8      | 6       |2016-09-20 | E      | 0
 1         | 10     | 7       |2016-09-29 | E      | 3
 2         | 11     | 7       |2016-09-23 | A      | 3
 3         | 6      | 7       |2016-09-18 | D      | 3
 4         | 9      | 7       |2016-09-16 | A      | 3
 5         | 12     | 7       |2016-09-16 | B      | 3

Am I being restricted by the GROUP BY p.user_Id, p.test_Id clause when doing the SUM? I tried the query with only user_Id as GROUP BY clause and only test_Id too as GROUP BY clause, without getting the expected results.

Dez
  • 5,702
  • 8
  • 42
  • 51
  • your group by has to be wrong, see http://stackoverflow.com/a/39551434 – Drew Sep 19 '16 at 03:06
  • Using MySQL 5.5 version. Maybe you were pointing right about being 'result' a non 'group by' column. Anyway I liked @P.Salmon approach. – Dez Sep 25 '16 at 22:42
  • You had a group by 2 columns while have 4 non-agg columns. That leads to junkie data results – Drew Sep 25 '16 at 22:45
  • 1
    Also, that solution below is not guaranteed to work. The manual states it for safety precedence [Here](http://stackoverflow.com/a/22388942/1816093) and one often reads the Obligatory bible on it from Baron [Here](http://www.xaprb.com/blog/2006/12/15/advanced-mysql-user-variable-techniques/) before putting such a thing into production – Drew Sep 25 '16 at 22:51

1 Answers1

1

I think you need to calculate the avg and then join

select  a.rn,a.test_id,a.user_id,a.date,a.result,u.avg from
(
select      t1.*
        ,   if (t1.user_id <> @p, @rn:=1,@rn:=@rn+1) rn
        ,   @p:=t1.user_id p
from        (select @rn:=0, @p:='') rn,test t1
order   by t1.user_id, t1.date desc
) a 
join
(
select  s.user_id
            , sum(case when s.result  = 'A' or s.result = 'B' then 1 else 0 end) as avg
from
(
select      t1.*
        ,   if (t1.user_id <> @p, @rn:=1,@rn:=@rn+1) rn
        ,   @p:=t1.user_id p

from        (select @rn:=0, @p:='') rn,test t1
order   by t1.user_id, t1.date desc
) s
where s.rn <= 5
group   by s.user_id 
) u on u.user_id = a.user_id
where   a.rn <= 5
P.Salmon
  • 17,104
  • 2
  • 12
  • 19
  • This approach worked for my goal and quite good performance wise. I made some modifications for getting the ratio and order results depending on it to fulfill my needs. Thank you so much. – Dez Sep 20 '16 at 00:20