1

My dataset has as features: players IDs, team, weeks and points.

I want to calculate the mean of TEAM points for previous weeks, but not all past weeks, just to the last 5 or less (if the current week is smaller than 5).

Example: For team = A, week = 7, the result will be the average of POINTS for team = A and weeks 2, 3, 4, 5 and 6.

The dataset can be created using the following code:

# set the seed for reproducibility
set.seed(123)
player_id<-c(rep(1,15),rep(2,15),rep(3,15),rep(4,15))
week<-1:15
team<-c(rep("A",30),rep("B",30))
points<-round(runif(60,1,10),0) 
mydata<- data.frame(player_id=player_id,team=team,week=rep(week,4),points)

I would like to have a solution without a heavy looping, because the dataset is huge.

I have done related questions here that maybe will help, but I could not adapt to this case.

Question 1

Question 2

Thank you!

2 Answers2

1

We adapt the approach from my answer to one of your other questions if you want a dplyr solution:

library(dplyr)
library(zoo)
# set the seed for reproducibility
set.seed(123)
player_id<-c(rep(1,15),rep(2,15),rep(3,15),rep(4,15))
week<-1:15
team<-c(rep("A",30),rep("B",30))
points<-round(runif(60,1,10),0) 
mydata<- data.frame(player_id=player_id,team=team,week=rep(week,4),points)

roll_mean <- function(x, k) {
    result <- rollapplyr(x, k, mean, partial=TRUE, na.rm=TRUE)
    result[is.nan(result)] <- NA
    return( result )
}

It might first be easier to aggregate by team:

team_data <- mydata %>%
    select(-player_id) %>%
    group_by(team, week) %>%
    arrange(week) %>%
    summarise(team_points = sum(points)) %>%
    mutate(rolling_team_mean = roll_mean(lag(team_points), k=5)) %>%
    arrange(team)

team_data

# A tibble: 30 x 4
# Groups:   team [2]
     team  week team_points rolling_team_mean
   <fctr> <int>       <dbl>             <dbl>
 1      A     1          13                NA
 2      A     2          11             13.00
 3      A     3           6             12.00
 4      A     4          13             10.00
 5      A     5          19             10.75
 6      A     6          10             12.40
 7      A     7          13             11.80
 8      A     8          16             12.20
 9      A     9          16             14.20
10      A    10          12             14.80
# ... with 20 more rows

Then, if you like we can put everything back together:

mydata <- inner_join(mydata, team_data) %>%
    arrange(week, team, player_id)

mydata[1:12, ]

   player_id team week points team_points rolling_team_mean
1          1    A    1      4          13                NA
2          2    A    1      9          13                NA
3          3    B    1     10          12                NA
4          4    B    1      2          12                NA
5          1    A    2      8          11                13
6          2    A    2      3          11                13
7          3    B    2      9          12                12
8          4    B    2      3          12                12
9          1    A    3      5           6                12
10         2    A    3      1           6                12
11         3    B    3      7          12                12
12         4    B    3      5          12                12
duckmayr
  • 16,303
  • 3
  • 35
  • 53
1

Here's one way:

# compute points per team per week
pts <- with(mydata, tapply(points, list(team, week), sum, default = 0))
pts
#   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
#A 13 11  6 13 19 10 13 16 16 12 17 11 13 10  4
#B 12 12 12 11 10  6 13 11  6  9  5  7 13 13  6

# compute the 5-week averages
sapply(setNames(seq(2, ncol(pts)), seq(2, ncol(pts))),
       function(i) {
           apply(pts[, seq(max(1, i - 5), i - 1), drop = FALSE], 1, mean)
       })
#   2  3  4     5    6    7    8    9   10   11   12   13   14   15
#A 13 12 10 10.75 12.4 11.8 12.2 14.2 14.8 13.4 14.8 14.4 13.8 12.6
#B 12 12 12 11.75 11.4 10.2 10.4 10.2  9.2  9.0  8.8  7.6  8.0  9.4

This will give the wrong result if the week variable has gaps.

Ernest A
  • 7,526
  • 8
  • 34
  • 40