
Consider the following data:

df <- data.frame(names = sample(letters[1:5], 20, replace = TRUE), numbers = 1:20)

For this data we can use the function cumsum, which computes, for each row within a group, the cumulative sum of all numbers up to and including that row:

library(dplyr)
df %>%
  group_by(names) %>%
  mutate(cumsum_numbers = cumsum(numbers))

I wish to apply a general function my_fn cumulatively, in the same manner as cumsum. my_fn has the general form:

my_fn <- function(vector) {
  # do stuff to vector; mean() is only a placeholder computation here
  x <- mean(vector)
  return(x) # x is a numeric scalar
}

That is, it takes the vector of values up to and including the current row and returns a scalar.

The following code does not work:

df %>%
  group_by(names) %>%
  # my_fn is applied once to each group's numbers, so every row
  # in a group receives the same value
  mutate(cumsum_numbers = my_fn(numbers))

So I guess I want something like:

df %>%
  group_by(names) %>%
  mutate(cumsum_numbers = cum_my_fn(numbers))

Note that an example function would be mean, for calculating the cumulative mean. Interestingly, dplyr has implemented cummean, but I don't know how it works internally, so I can't work out how to implement this behaviour for a general function.
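
For illustration only, the behaviour described above can be spelled out naively by calling the function on every prefix of the group's vector. This is a minimal sketch, not dplyr's implementation of cummean: the helper name cum_apply, the seed, and the output column name are all assumptions made for this example.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): applies `f` to every prefix
# x[1], x[1:2], ..., x[1:n] and collects the scalar results.
cum_apply <- function(x, f, ...) {
  vapply(seq_along(x), function(i) f(x[seq_len(i)], ...), numeric(1))
}

set.seed(1)  # assumed seed, only to make the sampled groups reproducible
df <- data.frame(names = sample(letters[1:5], 20, replace = TRUE), numbers = 1:20)

# Within each group, row i receives mean(numbers[1:i]) for that group.
df %>%
  group_by(names) %>%
  mutate(cummean_numbers = cum_apply(numbers, mean)) %>%
  ungroup()
```

Because the function is re-evaluated on each prefix, the cost grows quadratically with group size; that is the price of not needing a special incremental form for each function.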

Alex
  • See `?Reduce`, and some examples [here](http://stackoverflow.com/questions/7413819/does-r-have-something-equivalent-to-reduce-in-python), [here](http://stackoverflow.com/questions/31467173/how-to-efficiently-implement-function-iteration-in-r), [here](http://stackoverflow.com/questions/25899621/accumulating-values-for-loop-to-reduce) – alexis_laz Jul 06 '16 at 11:57
  • That link is interesting, but `Reduce` requires a function with two inputs, whereas my function only has one input... – Alex Jul 06 '16 at 12:01
  • You could wrap your function in a binary one. E.g. assuming you have `my_fn = function(x) sum(x)`, then `Reduce(function(x, y) my_fn(c(x, y)), 1:10, accumulate = TRUE)`. Depending on your actual case, you could also modify `my_fn`. – alexis_laz Jul 06 '16 at 12:06
  • I am not sure this is going to work with `mean` though? – Alex Jul 06 '16 at 12:08
  • `dplyr::cummean` is a thing, if that's what you're trying to do. – alistaire Jul 06 '16 at 12:17
  • @Alex: For a cumulative mean specifically, you can use `cumsum() / seq_along()`; similarly, many other functions can be made cumulative without needing `Reduce` or looping. To specifically use `mean` in a recursive fashion, you need to define a different function that utilizes the current mean (e.g. see the [article in Wikipedia](https://en.wikipedia.org/wiki/Moving_average#Cumulative_moving_average)). It all depends on the actual function you need to apply cumulatively. – alexis_laz Jul 06 '16 at 12:28
  • The point is that I want to apply a general function in this manner. I don't want to have to reconfigure each and every function I want to apply cumulatively in a special way. – Alex Jul 06 '16 at 20:07
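
The exchange above can be made concrete with a small base R sketch. It shows why the binary wrapper from the comments reproduces cumsum but not a cumulative mean, and one possible way to keep `Reduce` general by accumulating the prefixes themselves. The names `my_sum`, `sum_reduce`, `mean_reduce`, `mean_closed`, `prefixes` and `mean_general` are assumptions made for this illustration.

```r
# The wrapper from the comments: fine for sum, because
# sum(c(previous_sum, y)) really is the next cumulative sum.
my_sum <- function(x) sum(x)
sum_reduce <- Reduce(function(x, y) my_sum(c(x, y)), 1:10, accumulate = TRUE)

# The same wrapper with mean: the accumulator only holds the previous
# mean, so mean(c(previous_mean, y)) is not the cumulative mean.
x <- c(1, 2, 6)
mean_reduce <- Reduce(function(acc, y) mean(c(acc, y)), x, accumulate = TRUE)
# mean_reduce is 1, 1.5, 3.75

# Closed form suggested in the comments.
mean_closed <- cumsum(x) / seq_along(x)
# mean_closed is 1, 1.5, 3

# A general alternative: let Reduce accumulate the prefixes themselves,
# then apply any one-argument function to each prefix.
prefixes <- Reduce(c, x, accumulate = TRUE)
mean_general <- vapply(prefixes, mean, numeric(1))
```

The last variant works for any function of a single vector without rewriting it in an incremental form, at the cost of materialising every prefix.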

0 Answers