1

I'm tracking number of steps/day. I want to get the average steps/day using the 5 best days out of a 7 day period. My end goal is going to be to get an average for the best 5 out of 7 days for a total of 16 weeks.

Here's my sqlfiddle - http://sqlfiddle.com/#!9/5e69bdf/2

Here is the query I'm currently using but I've discovered the result is not correct. It's taking the average of 7 days instead of selecting the 5 days that had the most steps. It's outputting 14,122 as an average instead of 11,606 based on my data as posted in the sqlfiddle.

SELECT SUM(a.steps) as StepsTotal, AVG(a.steps) AS AVGSteps
FROM (SELECT * FROM activities
      JOIN Courses
      WHERE activities.encodedid=? AND activities.activitydate BETWEEN
          DATE_ADD(Courses.Startsemester, INTERVAL $y DAY) AND 
          DATE_ADD(Courses.Startsemester, INTERVAL $x DAY) 
      ORDER BY activities.steps DESC LIMIT 5
      ) a
GROUP BY a.encodedid

Here's the same query with the values filled in for testing:

SELECT SUM(a.steps) as StepsTotal, AVG(a.steps) AS AVGSteps
FROM (SELECT * FROM activities
      JOIN Courses
      WHERE activities.encodedid='42XPC3' AND activities.activitydate BETWEEN
          DATE_ADD(Courses.Startsemester, INTERVAL 0 DAY) AND 
          DATE_ADD(Courses.Startsemester, INTERVAL 6 DAY) 
      ORDER BY activities.steps DESC LIMIT 5
      ) a
GROUP BY a.encodedid
Dharman
  • 30,962
  • 25
  • 85
  • 135
Tim M
  • 306
  • 3
  • 18
  • When does the 7 day period start? – Strawberry Aug 16 '18 at 18:51
  • 2018-07-12 - Startsemester in the Courses table. – Tim M Aug 16 '18 at 18:53
  • In looking at your SQL Fiddle, Courses is empty. Not sure why. I don't see any errors in your schema, but ```SELECT * FROM Courses``` returns 0 rows. – Sloan Thrasher Aug 16 '18 at 19:08
  • @SloanThrasher I've got the Courses table and data showing in the SQL Fiddle. Thanks. – Tim M Aug 16 '18 at 19:25
  • Just tried the SQL Fiddle link in your question again, and added ```SELECT * FROM Courses;``` at the beginning, and it return zero rows. – Sloan Thrasher Aug 16 '18 at 19:56
  • @SloanThrasher I didn't update the SQL Fiddle url. It's updated now and tested as working. – Tim M Aug 16 '18 at 20:01
  • 2
    One reason you're getting incorrect results is that each activity row is being counted multiple times because they match multiple rows in the Courses table. You might consider returning only the DISTINCT dates from the Courses table. – Sloan Thrasher Aug 16 '18 at 20:11
  • Here's one idea. If I have time I'll write it up as a proper answer... http://sqlfiddle.com/#!9/ee46d7/11 – Strawberry Aug 16 '18 at 22:36

3 Answers3

2

As @SloanThrasher pointed out, the reason the query is not working is because you have multiple rows for the same course in the Courses database which end up being joined to the activities database. Thus the output for the subquery gives the top value (16058) 3 times plus the second highest value (11218) twice for a total of 70610 and an average of 14122. You can work around this by modifying the query as follows:

SELECT SUM(a.steps) as StepsTotal, AVG(a.steps) AS AVGSteps
    FROM (SELECT * FROM activities
          JOIN (SELECT DISTINCT Startsemester FROM Courses) c
          WHERE activities.encodedid='42XPC3' AND activities.activitydate BETWEEN
              DATE_ADD(c.Startsemester, INTERVAL 0 DAY) AND 
              DATE_ADD(c.Startsemester, INTERVAL 6 DAY) 
          ORDER BY CAST(activities.steps AS UNSIGNED) DESC LIMIT 5
          ) a
    GROUP BY a.encodedid

Now since there are actually only 3 days with activity (2018-07-16, 2018-07-17 and 2018-07-18) between the start of semester and 6 days later (2018-07-12 and 2018-07-18) this gives a total of 37533 (16058+11218+10277) and an average of 12517.7.

StepsTotal  AVGSteps
37553       12517.666666666666

Ideally, you probably also want to add a constraint on the Course chosen from Courses e.g. change

(SELECT DISTINCT Startsemester FROM Courses)

to

(SELECT DISTINCT Startsemester FROM Courses WHERE CourseNumber='PHED1164')
Nick
  • 138,499
  • 22
  • 57
  • 95
  • Thank you Nick. If I change query to calculate 7 - 13 days, the result of your query is 7273.2 average steps. But when I take the top five days and average them in Excel, I get an average of 10293 steps. Any ideas on that? My other question is I'd like to take your suggestion and expand on it by adding the CourseNumber, Semester and SectionNumber but when I add those to the query, it gives a blank result (SELECT DISTINCT Startsemester FROM Courses WHERE CourseNumber='PHED 1164' AND SectionNumber = '5075' AND Semester = 'Fall') . http://sqlfiddle.com/#!9/5e69bdf/29 – Tim M Sep 24 '18 at 01:50
  • Hi Tim, SQLFiddle isn't responding for me at the moment, I'll take a look at it when it comes back. – Nick Sep 24 '18 at 01:58
  • Hi Nick, I just checked and it seems to be back :) – Tim M Sep 25 '18 at 03:42
  • Hi @TimM seems to be down again! Any chance you could post on [rextester](http://rextester.com) instead? – Nick Sep 26 '18 at 07:11
  • Hi Nick, I posted it here - https://www.db-fiddle.com/f/vFjNnPgTj6qeXGebTGMCiS/0 Thanks! – Tim M Sep 26 '18 at 16:55
  • Hi Tim, really sorry about not getting back to looking at this. It seems the `ORDER BY` clause is not working properly, I'm taking a look at it. In the meantime you really shouldn't accept it as an answer though as it doesn't work. – Nick Oct 03 '18 at 07:30
  • 1
    Hi Tim, the problem is that your steps column is of type varchar, not int. So it sorts as strings, with '3402' > '10700'. The fix is to cast the steps column as `UNSIGNED`, so that it sorts properly. See my edit, and [this fiddle](http://sqlfiddle.com/#!9/5e69bdf/37) – Nick Oct 03 '18 at 07:44
1

Try this query:

SELECT @rn := 1, @weekAndYear := 0;

SELECT weekDayAndYear,
       SUM(steps),
       AVG(steps)
FROM (
  SELECT @weekAndYear weekAndYearLag,
         CASE WHEN @weekAndYear = YEAR(activitydate) * 100 + WEEK(activitydate)
           THEN @rn := @rn + 1 ELSE @rn := 1 END rn,
         @weekAndYear := YEAR(activitydate) * 100 + WEEK(activitydate) weekDayAndYear,
         steps,
         lightly_act_min,   
         fairly_act_min,
         sed_act_min,
         vact_min,
         encodedid,
         activitydate,
         username
  FROM activities
  ORDER BY YEAR(activitydate) * 100 + WEEK(activitydate), CAST(steps AS UNSIGNED) DESC
) a WHERE rn <= 5
GROUP BY weekDayAndYear

Demo

With additional variables, I imitate SQL Server ROW_NUMBER function, to number from 1 to 7 days partitioned by weeks. This way I can filter best 5 days and easily get a average grouping by column weekAndDate, which is in the same format as variable: yyyyww (i used integer to avoid casting to varchar).

Michał Turczyn
  • 32,028
  • 14
  • 47
  • 69
  • Thank you Michal! Couple of questions... 1. I would expect to see 5 weeks worth of data even though the 5th week is only 4 days worth of data because the end of the semester is today and the tracking of steps comes to an end. 2. The best way to convert the date back to YYYY-MM-DD would be in PHP or in the MYSQL query? Thank you so much!! – Tim M Aug 16 '18 at 19:32
  • @TimM I don't understand first question. Second question: do as you want :) it is opinion based and I would do this in MySql as I don't knie PHP. If my answer helped you you should accept it (green check mark on the left) and optionally upvote :) – Michał Turczyn Aug 16 '18 at 19:40
  • The output of your example has 4 weeks worth of data. There is a 5th week which is this current week. I would need it to display the average steps for the last partial week. Any suggestions on how to achieve that? – Tim M Aug 16 '18 at 19:42
  • Another question. The first 7 days have the following step counts: 16058 11218 10277 10324 10151 3402 1730 Taking the average of the top 5 days is 11605.6. In your demo, it comes out as 10238.3. I'm not sure why the difference? – Tim M Aug 16 '18 at 19:54
  • 1
    @TimM The issue was that `steps` is `varchar` - you have to choose appropriate datatypes for your column, in MySQL `USNIGNED` is integer datatype, you might want to consider it. Try updated answer :) – Michał Turczyn Aug 17 '18 at 09:03
0

Consider the following:

 DROP TABLE IF EXISTS my_table;

CREATE TABLE `my_table` 
(id SERIAL PRIMARY KEY
,steps INT NOT NULL
);

insert into my_table (steps) values
(9),(5),(7),(7),(7),(8),(4);
select prev
    , sum(steps) total
    from (
      select steps
    , case when @prev = grp 
           then @j:=@j+1 else @j:=1 end j
    , @prev:=grp prev
    from (SELECT steps 
               , case when mod(@i,3)=0 
                      then @grp := @grp+1 else @grp:=@grp end grp -- a 3 day week
               , @i:=@i+1 i
            from my_table
               , (select @i:=0,@grp:=0) vars
           order 
              by id) x
, (select @prev:= null, @j:=0) vars
order by grp,steps desc,i) a
where j <=2 -- top 2 (out of 3)
group by prev;

+------+-------+
| prev | total |
+------+-------+
| 1    |    16 |
| 2    |    15 |
| 3    |     4 |
+------+-------+

http://sqlfiddle.com/#!9/ee46d7/11

Strawberry
  • 33,750
  • 13
  • 40
  • 57