2

I am trying to create a new conditional column based on two conditionals. I want to find the mean of columns A, B, and C based on the name and weeks leading UP TO (but not including) the week in the corresponding row. Let us take Joe as an example. For row 1, we would have no data in the new columns. For row 2 would have the "mean" of week 1 data. for Row 3, we want the mean of week 1 and week 2 data. In practice, we can have more weeks than 3, but I wanted to simplify the example. I would like to avoid loops if possible.

This is fairly easy to do in Excel with the Averageifs() function but I want to automate/consolidate the process with R

my data looks something like this:

Name Week A B C
Joe  1    5 6 7
Joe  2    4 5 6
Joe  3    2 3 4
Tim  1    7 8 9
Tim  2    5 4 6 
Tim  4    3 5 4
Bob  1    9 8 7
Bob  3    8 5 2
Bob  4    4 5 3

The new data would look something like this:

    Name Week A B C    A_2 B_2 C_2
    Joe  1    5 6 7    NA  NA  NA
    Joe  2    4 5 6    5   6   7
    Joe  3    2 3 4    4.5 5.5 6.5
    Tim  1    7 8 9    NA  NA  NA
    Tim  2    5 4 6    7   8   9
    Tim  4    3 5 4    6   6   7.5
    Bob  1    9 8 7    NA  NA  NA
    Bob  3    8 5 2    9   8   7
    Bob  4    4 5 3    8.5 6.5 4.5

Thank you for any help you can provide! I am semi-new to R and have been struggling with this problem

Koala
  • 39
  • 4
  • [How to calculate the Cumulative Average for some numbers?](https://stackoverflow.com/questions/11074665/how-to-calculate-the-cumulative-average-for-some-numbers), e.g. `cumsum(x) / seq_along(x)` – Henrik Nov 14 '18 at 07:38

2 Answers2

0

Here's a way with dplyr package -

df %>%
  group_by(Name) %>% 
  arrange(Name, Week) %>% 
  mutate(
    A_2 = lag(cummean(A)),
    B_2 = lag(cummean(B)),
    C_2 = lag(cummean(C))
  ) %>% 
  ungroup()

# A tibble: 9 x 8
  Name   Week     A     B     C   A_2   B_2   C_2
  <fct> <int> <int> <int> <int> <dbl> <dbl> <dbl>
1 Bob       1     9     8     7 NA    NA    NA   
2 Bob       3     8     5     2  9.00  8.00  7.00
3 Bob       4     4     5     3  8.50  6.50  4.50
4 Joe       1     5     6     7 NA    NA    NA   
5 Joe       2     4     5     6  5.00  6.00  7.00
6 Joe       3     2     3     4  4.50  5.50  6.50
7 Tim       1     7     8     9 NA    NA    NA   
8 Tim       2     5     4     6  7.00  8.00  9.00
9 Tim       4     3     5     4  6.00  6.00  7.50

Data -

df <- structure(list(Name = structure(c(2L, 2L, 2L, 3L, 3L, 3L, 1L, 
1L, 1L), .Label = c("Bob", "Joe", "Tim"), class = "factor"), 
    Week = c(1L, 2L, 3L, 1L, 2L, 4L, 1L, 3L, 4L), A = c(5L, 4L, 
    2L, 7L, 5L, 3L, 9L, 8L, 4L), B = c(6L, 5L, 3L, 8L, 4L, 5L, 
    8L, 5L, 5L), C = c(7L, 6L, 4L, 9L, 6L, 4L, 7L, 2L, 3L)), .Names = c("Name", 
"Week", "A", "B", "C"), class = "data.frame", row.names = c(NA, 
-9L))
Shree
  • 10,835
  • 1
  • 14
  • 36
0

A data.table approach:

library(data.table)

setDT(df)[order(Name, Week),][, `:=` (
                  A_mean = shift(cummean(A)),
                  B_mean = shift(cummean(B)),
                  C_mean = shift(cummean(C))
                  ), by = Name][]

Note that [] at the end is just for printing the result.

Output:

   Name Week A B C A_mean B_mean C_mean
1:  Bob    1 9 8 7     NA     NA     NA
2:  Bob    3 8 5 2    9.0    8.0    7.0
3:  Bob    4 4 5 3    8.5    6.5    4.5
4:  Joe    1 5 6 7     NA     NA     NA
5:  Joe    2 4 5 6    5.0    6.0    7.0
6:  Joe    3 2 3 4    4.5    5.5    6.5
7:  Tim    1 7 8 9     NA     NA     NA
8:  Tim    2 5 4 6    7.0    8.0    9.0
9:  Tim    4 3 5 4    6.0    6.0    7.5
arg0naut91
  • 14,574
  • 2
  • 17
  • 38