-2

Based on the data frame below, I would like to create a new column using rollmean subject to three conditions: the values in column b match each other, at least 3 values from column a are averaged, and only the current row and the rows below it are averaged. If there are 2 or fewer values to average, I would like to return a blank value.

I'm assuming that I will have to use an apply function to do this, but I'm not sure where to start.

a=c(1,2,3,4,1,2,3,4,1,2,3,4)
b=c("X","X","X","X","Y","Y","Y","Y","Z","Z","Z","Z")
df=as.data.frame(cbind(a,b))

I would like the final table to look like:

Name    Value   Output
X       1       2.5
X       2       3
X       3
X       4
Y       1       2.5
Y       2       3
Y       3
Y       4
Z       1       2.5
Z       2       3
Z       3
Z       4
Axeman
Dan L
  • Could you explain your condition using the example above? For instance, how did the value 2.5 arise in the 1st row of the data frame, and why is the 3rd row blank? – Ronak Shah Feb 20 '17 at 16:26
  • The output value in row 1 is the average of that row and all of the following rows with Name = X, which is the average of 1, 2, 3, 4. In row 2, the output is the average of 2, 3, 4, which is 3. In the third row, there are not enough values to average (fewer than 3), so the output is blank. – Dan L Feb 20 '17 at 16:33

2 Answers

3

A simple tidyverse solution. Within each group, if at least three items remain (counting the current row), take the mean from the current index (row_number()) up to the final index (n()); otherwise return NA.

library(tidyverse)

df %>%
  group_by(b) %>%
  # n() - . + 1 is the number of rows from the current one to the end of the group
  mutate(Output = map_dbl(row_number(), ~ifelse(n() - . + 1 < 3, NA, mean(a[.:n()]))))

Data

The way you create your data turns a into a non-numeric column (because cbind makes a character matrix when one of its inputs is character).

Use simply:

a <- c(1,2,3,4,1,2,3,4,1,2,3,4)
b <- c("X","X","X","X","Y","Y","Y","Y","Z","Z","Z","Z")
df <- data.frame(a, b)

Or

df <- data.frame(a = 1:4, b = rep(c('X', 'Y', 'Z'), each = 4))
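
To see the difference between the two constructions, a quick check along these lines may help. This is only a sketch; the names m, df_bad and df_good are made up here for illustration.

```r
a <- c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)
b <- c("X", "X", "X", "X", "Y", "Y", "Y", "Y", "Z", "Z", "Z", "Z")

# cbind() on a numeric and a character vector yields a character matrix,
# so column a is no longer numeric after as.data.frame().
m <- cbind(a, b)
df_bad <- as.data.frame(m)

# data.frame() keeps each column's own type.
df_good <- data.frame(a, b)

is.character(m)        # TRUE
is.numeric(df_bad$a)   # FALSE (character or factor, depending on the R version)
is.numeric(df_good$a)  # TRUE
```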
Axeman
  • If I wanted to apply this same idea to a larger data set, and take the rolling mean of 500 items with a new minimum of 150, then I know I can change the < 3 to <150, but how could I change the above equation to only take the mean of the following 500 items as opposed to all remaining items? Hope that makes sense. Thanks – Dan L Feb 20 '17 at 16:49
  • Either `~ifelse(n() - . < 150, NA, mean(a[.:(. + 500)], na.rm = TRUE))` or `~ifelse(n() - . < 150, NA, mean(a[.:min(. + 500, n())]))` should work. – Axeman Feb 20 '17 at 16:59
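
The windowed variant discussed in these comments could be sketched as a standalone function. This is not the code from the answer: forward_mean, width and min_n are hypothetical names, and the sketch assumes the window counts the current row, so width = 500 would average at most 500 values. Small stand-in numbers (a window of 3 with a minimum of 2) are used instead of 500 and 150 so the result is easy to check by hand.

```r
# Hypothetical helper: mean of x[i] and the values after it, using at most
# `width` values, or NA when fewer than `min_n` values are available.
forward_mean <- function(x, width, min_n) {
  n <- length(x)
  vapply(seq_along(x), function(i) {
    end <- min(i + width - 1, n)  # do not run past the end of the group
    if (end - i + 1 < min_n) NA_real_ else mean(x[i:end])
  }, numeric(1))
}

df <- data.frame(a = rep(c(1, 2, 3, 4), times = 3),
                 b = rep(c("X", "Y", "Z"), each = 4))

# Apply the helper within each level of b.
df$Output <- ave(df$a, df$b,
                 FUN = function(x) forward_mean(x, width = 3, min_n = 2))
df
```

Within each group this gives 2, 3, 3.5 and NA: the last row has only one value available, which is below the minimum of 2.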
0

Note that there is an error in the formation of df in the question so we have modified it below. We can use ave like this. No packages are used.

df <- data.frame(a, b)
fun <- function(x) if (length(x) <= 2) NA else rev(cumsum(rev(x)) / c(NA, NA, 3:length(x)))
transform(df, Output = ave(a, b, FUN = fun))

giving:

   a b Output
1  1 X    2.5
2  2 X    3.0
3  3 X     NA
4  4 X     NA
5  1 Y    2.5
6  2 Y    3.0
7  3 Y     NA
8  4 Y     NA
9  1 Z    2.5
10 2 Z    3.0
11 3 Z     NA
12 4 Z     NA
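
The reverse-cumsum step in fun may be easier to follow when unpacked for a single group. The sketch below uses illustrative variable names (tail_sums, tail_counts, out) that are not part of the answer.

```r
x <- c(1, 2, 3, 4)  # values of a within one group

# Sum from each position to the end of the group: 10 9 7 4
tail_sums <- rev(cumsum(rev(x)))

# Number of values from each position to the end: 4 3 2 1
tail_counts <- rev(seq_along(x))

# Blank out positions with fewer than 3 values remaining, then divide.
tail_counts[tail_counts < 3] <- NA
out <- tail_sums / tail_counts
out  # 2.5 3.0 NA NA
```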
G. Grothendieck