-1

I need to run a script in R that updates values in given rows regarding the history of previous rows. More specifically, I want to calculate averages on some given columns over time. I'll explain.

Say, I have the following table:

Key  A   B   C      Date
  X  1   0   2  1/1/2017
  Y  2   1   1  2/1/2017
  X  2   1   3  3/1/2017
  X  6   2   2  4/1/2017
  Y  3   2   1  5/1/2017

I need to update each line, in order that each of the values from columns A, B and C represent the average of the keys X and Y, up until the given date.

So in this case, we would have something like:

Key    A    B    C      Date
  X    1    0    2  1/1/2017
  Y    2    1    1  2/1/2017
  X  1.5  0.5  2.5  3/1/2017
  X    3    1  2.3  4/1/2017
  Y  2.5  1.5    1  5/1/2017

I'm fairly new to R, and I keep ending up with solutions that look too much like procedural code, and don't seem to fit well. Usually in R there is an known solution that I am unaware of, so if there is any lib of native method for doing this type of conditional updates, any help is welcome. You can also assume that the rows are already ordered by date, in case it helps.

Thanks in advance!

Edit: Fixed wrong example number on row 4, column C.

htaunay
  • 73
  • 1
  • 2
  • 9
  • 1
    What do you mean average of the keys? Do you just want a moving average of each column? Should it only be the average of rows with X in Key up until the given date? Do you want to use previous averages in your new average? or should it be calculated from the original table? – Bishops_Guest Mar 08 '17 at 23:39
  • 1
    Shouldn't the value 1.6 be 2.3333? - (2+3+2)/3 = 2.33 ? – thelatemail Mar 09 '17 at 00:05
  • @thelatemail You're absolutely right. Sorry for the bad example, I already fixed it. – htaunay Mar 09 '17 at 13:42
  • @Bishops_Guest I was interested in a moving average of each column, ordered by the Date column, grouped by a given Key. @thelatemail's answer solved it. Updated the question to include the `moving-average` label, wasn't aware of the term. – htaunay Mar 09 '17 at 17:19

2 Answers2

4

Doing some groupings on each variable using cumsum(x)/seq_along(x) to get a cumulative mean.

dat[c("A","B","C")] <- lapply(
  dat[c("A","B","C")], function(x) ave(x, dat$Key, FUN=function(v) cumsum(v)/seq_along(v) ) 
)
dat

#  Key   A   B        C     Date
#1   X 1.0 0.0 2.000000 1/1/2017
#2   Y 2.0 1.0 1.000000 2/1/2017
#3   X 1.5 0.5 2.500000 3/1/2017
#4   X 3.0 1.0 2.333333 4/1/2017
#5   Y 2.5 1.5 1.000000 5/1/2017
thelatemail
  • 91,185
  • 12
  • 128
  • 188
1

Here is an option with cummean from dplyr

library(dplyr)
dat %>% 
   group_by(Key) %>% 
   mutate_at(vars(A, B, C), cummean)
#   Key     A     B        C     Date
#  <chr> <dbl> <dbl>    <dbl>    <chr>
#1     X   1.0   0.0 2.000000 1/1/2017
#2     Y   2.0   1.0 1.000000 2/1/2017
#3     X   1.5   0.5 2.500000 3/1/2017
#4     X   3.0   1.0 2.333333 4/1/2017
#5     Y   2.5   1.5 1.000000 5/1/2017
akrun
  • 874,273
  • 37
  • 540
  • 662