Creating a NEW multi-conditional (function) column in R

Question

I am trying to create new columns based on two conditions. For each row, I want the mean of columns A, B, and C over all earlier weeks for the same name, that is, the weeks leading up to (but not including) the week in that row. Take Joe as an example. Row 1 would have no data in the new columns. Row 2 would have the mean of the week 1 data. Row 3 would have the mean of the week 1 and week 2 data. In practice there can be more than 3 weeks, but I wanted to keep the example simple. I would like to avoid loops if possible.

This is fairly easy to do in Excel with the AVERAGEIFS() function, but I want to automate and consolidate the process in R.

My data looks something like this:

Name Week A B C
Joe  1    5 6 7
Joe  2    4 5 6
Joe  3    2 3 4
Tim  1    7 8 9
Tim  2    5 4 6 
Tim  4    3 5 4
Bob  1    9 8 7
Bob  3    8 5 2
Bob  4    4 5 3

The new data would look something like this:

    Name Week A B C    A_2 B_2 C_2
    Joe  1    5 6 7    NA  NA  NA
    Joe  2    4 5 6    5   6   7
    Joe  3    2 3 4    4.5 5.5 6.5
    Tim  1    7 8 9    NA  NA  NA
    Tim  2    5 4 6    7   8   9
    Tim  4    3 5 4    6   6   7.5
    Bob  1    9 8 7    NA  NA  NA
    Bob  3    8 5 2    9   8   7
    Bob  4    4 5 3    8.5 6.5 4.5

Thank you for any help you can provide! I am semi-new to R and have been struggling with this problem.

[How to calculate the Cumulative Average for some numbers?](https://stackoverflow.com/questions/11074665/how-to-calculate-the-cumulative-average-for-some-numbers), e.g. `cumsum(x) / seq_along(x)` — Henrik, Nov 14 '18 at 07:38
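The `cumsum(x) / seq_along(x)` idea from this comment can be turned into a base R sketch of the requested columns. The helper name `lag_cummean` is made up for illustration, and the data frame is retyped from the question:

```r
# Example data retyped from the question
df <- data.frame(
  Name = rep(c("Joe", "Tim", "Bob"), each = 3),
  Week = c(1, 2, 3, 1, 2, 4, 1, 3, 4),
  A = c(5, 4, 2, 7, 5, 3, 9, 8, 4),
  B = c(6, 5, 3, 8, 4, 5, 8, 5, 5),
  C = c(7, 6, 4, 9, 6, 4, 7, 2, 3)
)

# Hypothetical helper: mean of all values before each position
lag_cummean <- function(x) {
  cm <- cumsum(x) / seq_along(x)  # running mean including the current row
  c(NA_real_, head(cm, -1))       # shift down one row so the current row is excluded
}

# Sort so the weeks are in order within each name
df <- df[order(df$Name, df$Week), ]

# ave() applies the helper separately within each Name group
df$A_2 <- ave(df$A, df$Name, FUN = lag_cummean)
df$B_2 <- ave(df$B, df$Name, FUN = lag_cummean)
df$C_2 <- ave(df$C, df$Name, FUN = lag_cummean)

print(df)
```

This needs no packages and no explicit loop. The rows come out sorted by Name and Week rather than in the original order.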

score 0 · Accepted Answer · answered Nov 13 '18 at 15:05

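Here's a way with the dplyr package:

library(dplyr)

df %>%
  group_by(Name) %>% 
  # sort by week within each name so the running mean follows time order
  arrange(Name, Week) %>% 
  mutate(
    # cummean() includes the current row; lag() shifts it down one row,
    # leaving the mean of the earlier weeks only
    A_2 = lag(cummean(A)),
    B_2 = lag(cummean(B)),
    C_2 = lag(cummean(C))
  ) %>% 
  ungroup()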

# A tibble: 9 x 8
  Name   Week     A     B     C   A_2   B_2   C_2
  <fct> <int> <int> <int> <int> <dbl> <dbl> <dbl>
1 Bob       1     9     8     7 NA    NA    NA   
2 Bob       3     8     5     2  9.00  8.00  7.00
3 Bob       4     4     5     3  8.50  6.50  4.50
4 Joe       1     5     6     7 NA    NA    NA   
5 Joe       2     4     5     6  5.00  6.00  7.00
6 Joe       3     2     3     4  4.50  5.50  6.50
7 Tim       1     7     8     9 NA    NA    NA   
8 Tim       2     5     4     6  7.00  8.00  9.00
9 Tim       4     3     5     4  6.00  6.00  7.50

Data -

df <- structure(list(Name = structure(c(2L, 2L, 2L, 3L, 3L, 3L, 1L, 
1L, 1L), .Label = c("Bob", "Joe", "Tim"), class = "factor"), 
    Week = c(1L, 2L, 3L, 1L, 2L, 4L, 1L, 3L, 4L), A = c(5L, 4L, 
    2L, 7L, 5L, 3L, 9L, 8L, 4L), B = c(6L, 5L, 3L, 8L, 4L, 5L, 
    8L, 5L, 5L), C = c(7L, 6L, 4L, 9L, 6L, 4L, 7L, 2L, 3L)), .Names = c("Name", 
"Week", "A", "B", "C"), class = "data.frame", row.names = c(NA, 
-9L))
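If there are many value columns, the three near-identical lines inside `mutate()` could be collapsed with `across()`. This is only a sketch, assuming dplyr 1.0.0 or later (where `across()` is available). The data frame is retyped from the question, and `result` is just an illustrative name:

```r
library(dplyr)

# Example data retyped from the question
df <- data.frame(
  Name = rep(c("Joe", "Tim", "Bob"), each = 3),
  Week = c(1, 2, 3, 1, 2, 4, 1, 3, 4),
  A = c(5, 4, 2, 7, 5, 3, 9, 8, 4),
  B = c(6, 5, 3, 8, 4, 5, 8, 5, 5),
  C = c(7, 6, 4, 9, 6, 4, 7, 2, 3)
)

result <- df %>%
  arrange(Name, Week) %>%
  group_by(Name) %>%
  # apply the same lagged cumulative mean to A, B and C;
  # .names appends "_2" to each original column name
  mutate(across(c(A, B, C), ~ lag(cummean(.x)), .names = "{.col}_2")) %>%
  ungroup()

print(result)
```

The column selection `c(A, B, C)` can be swapped for any tidyselect expression, for example `-c(Name, Week)`, if every other column should be averaged.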

Would you happen to know a way doing this with Standard deviation as well? Essentially, a "cumulative standard deviation" — Koala, Nov 13 '18 at 16:20
use this function - `cumsd <- function(x) { sapply(1:length(x), function(a) sd(x[1:a])) }` — Shree, Nov 13 '18 at 16:28
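A slightly more defensive variant of that helper, as a sketch: `seq_along()` handles an empty vector correctly (where `1:length(x)` would count `1, 0`), and `vapply()` guarantees a numeric result. The wrapper `lag_cumsd` is a made-up name that adds the same one-row shift used for the mean:

```r
# Cumulative standard deviation: sd of the first i values, for each i
cumsd <- function(x) {
  vapply(seq_along(x), function(i) sd(x[seq_len(i)]), numeric(1))
}

# Hypothetical wrapper: exclude the current row, as with lag(cummean(x))
lag_cumsd <- function(x) {
  c(NA_real_, head(cumsd(x), -1))
}

# Joe's column A from the question
out <- lag_cumsd(c(5, 4, 2))
print(out)
```

The first two entries are `NA`: row 1 has no earlier weeks, and row 2 has only one, for which `sd()` is undefined. Inside the grouped `mutate()` from the answer this would be used as `A_sd = lag_cumsd(A)`.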
One more question, what if I wanted the last 3 (or n) weeks? and not all weeks leading up to the current week? — Koala, Nov 17 '18 at 04:48
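One possible sketch for the "last n weeks" variant, in base R. The helper name `lag_rollmean` is made up, and it rests on two assumptions that may not match what is wanted: the window counts the previous n rows (observed weeks) rather than calendar weeks, and near the start of a group it averages however many earlier rows exist instead of returning `NA`:

```r
# Hypothetical helper: mean of up to n values immediately before each position
lag_rollmean <- function(x, n = 3) {
  stopifnot(n >= 1)
  vapply(seq_along(x), function(i) {
    if (i == 1) return(NA_real_)      # no earlier rows to average
    idx <- seq(max(1, i - n), i - 1)  # previous n rows, fewer near the start
    mean(x[idx])
  }, numeric(1))
}

x <- c(5, 4, 2, 8, 6)
out <- lag_rollmean(x, n = 3)
print(out)
```

Within the grouped `mutate()` from the answer this would be called as `A_2 = lag_rollmean(A, 3)`. If gaps in `Week` should shrink the window (for example Tim has no week 3), the index would have to be built from the `Week` values instead of row positions.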

arg0naut91 · Answer 2 · 2018-11-13T15:40:46.223

A data.table approach:

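library(data.table)

# cummean() comes from dplyr, not data.table, so it is called with its namespace
setDT(df)[order(Name, Week),][, `:=` (
                  A_mean = shift(dplyr::cummean(A)),
                  B_mean = shift(dplyr::cummean(B)),
                  C_mean = shift(dplyr::cummean(C))
                  ), by = Name][]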

Note that [] at the end is just for printing the result.

Output:

   Name Week A B C A_mean B_mean C_mean
1:  Bob    1 9 8 7     NA     NA     NA
2:  Bob    3 8 5 2    9.0    8.0    7.0
3:  Bob    4 4 5 3    8.5    6.5    4.5
4:  Joe    1 5 6 7     NA     NA     NA
5:  Joe    2 4 5 6    5.0    6.0    7.0
6:  Joe    3 2 3 4    4.5    5.5    6.5
7:  Tim    1 7 8 9     NA     NA     NA
8:  Tim    2 5 4 6    7.0    8.0    9.0
9:  Tim    4 3 5 4    6.0    6.0    7.5
