4

I can't get my head around the following problem.

Assuming the follwoing data:

library(tidyverse)
df <- tibble(source = c("A", "A", "B", "B", "B", "C"),
             value  = c(5, 10, NA, NA, NA, 20),
             add    = c(1, 1, 1, 2, 3, 4))

What I want to do is: for all cases where source == "B", I want to calculate the cumulative sum of the previous row's value and the current row's add. Of course, for the first "B" row, I need to provide a starting value for value. Note: in this case, it would be fine if we just take the value from the last "A" row.

So for row 3, the result would be 10 + 1 = 11.
For row 4, the result would be 11 + 2 = 13.
For row 5, the results would be 13 + 3 = 16.

I tried to use purrr::accumulate, but I failed in many different ways, e.g. I thought I can do:

df %>%
  mutate(test = accumulate(add, .init = 10, ~.x + .y))

But this leads to error:

Error: Problem with `mutate()` column `test`.
i `test = accumulate(add, .init = 10, ~.x + .y)`.
i `test` must be size 6 or 1, not 7.

Same if I use .init = value

And I also didn't manage to do the job only on group B (although this is probably no issue, I think I can probably performa on the full data frame and then just replace values for all non-B rows).

Expected output:

# A tibble: 6 x 4
  source value   add  test
  <chr>  <dbl> <dbl> <dbl>
1 A          5     1    NA
2 A         10     1    NA
3 B         NA     1    11
4 B         NA     2    13
5 B         NA     3    16
6 C         20     4    NA
Maël
  • 45,206
  • 3
  • 29
  • 67
deschen
  • 10,012
  • 3
  • 27
  • 50
  • 1
    Are there multiple runs of Bs? – tmfmnk Jan 14 '22 at 12:24
  • No, just the three consecutive ones like in the example. – deschen Jan 14 '22 at 12:29
  • 3
    Then even `df %>% mutate(test = lag(value)) %>% group_by(source) %>% mutate(test = first(test) + cumsum(add))` would work. – tmfmnk Jan 14 '22 at 12:32
  • oh my, this looks much easier. Happy to upvote if you provide an answer. Note that in the lag function, you'd need to add a default value for the first row, otherwise the result will be NA everywhere. – deschen Jan 14 '22 at 12:40

2 Answers2

3

You were essentially in the right direction. Since you provide an .init value to accumulate, the resulting vector is of size n+1, with the first value being .init. You have to remove the first value to get a vector that fit to your column size.

Then, if you want NAs on the remaining values, here's a way to do it. Also, since the "starting row" is the third, .init has to be set to 8.

df %>%
  mutate(test = 
           ifelse(source == "B", accumulate(add, .init = 8, ~.x + .y)[-1], NA))

# A tibble: 6 x 4
  source value   add  test
  <chr>  <dbl> <dbl> <dbl>
1 A          5     1    NA
2 A         10     1    NA
3 B         NA     1    11
4 B         NA     2    13
5 B         NA     3    16
6 C         20     4    NA
Maël
  • 45,206
  • 3
  • 29
  • 67
  • 1
    Great answer. Thanks. Need to contemplate a bit about the manual setting of the init value in my rel-life example, but that's gonna work. I'll probably first group by source, then set the first value per group to my desired value and then run `.init = first(value)`, – deschen Jan 14 '22 at 13:37
0

@tmfmnk provided an awesome answer and they deserve full credit (NOT ME)

Below is the same code from their comment (for more visibility, while also setting an initial value)

init_value = 10
df = df %>%     
  mutate(test = lag(value)) %>%  
  group_by(source) %>%    
  mutate(test = init_value + cumsum(add))
thehand0
  • 1,123
  • 4
  • 14