6

The function below groups values in a vector based on whether the cumulative sum has reached a certain max value and then starts over.

cs_group <- function(x, threshold) {
  cumsum <- 0
  group <- 1
  result <- numeric()
  for (i in 1:length(x)) {
    cumsum <- cumsum + x[i]
    if (cumsum > threshold) {
      group <- group + 1
      cumsum <- x[i]
    }
    result = c(result, group)
  }
  return (result)
}

Example

The max value in the example is 10. The first group only included 9; because summing it with the next value would result in a sum of 12. The next group includes 3, 2, 2 (+8 would result in a value higher then 10).

test <- c(9, 3, 2, 2, 8, 5, 4, 9, 1)
cs_group(test, 10)
[1] 1 2 2 2 3 4 4 5 5

However, I prefer to include in each group the value that results in the cumulative sum to be higher than the maximum value of 10.

Ideal result:

[1] 1 1 2 2 2 3 3 3 4

milan
  • 4,782
  • 2
  • 21
  • 39
  • `However, I prefer to include in each group the value that results in the cumulative sum to be higher than the maximum value of 10.` I don't understand this. Can you explain the ideal result some more. – user2974951 Feb 05 '20 at 11:43
  • shouldn't the last value be 4? – Ronak Shah Feb 05 '20 at 12:14
  • No, because of the restriction that the last group should be >= max. value and – milan Feb 05 '20 at 12:18
  • 1
    and what if the last value is 5 in this case i.e `> max.value*2` ? What would be expected output in this case? – Ronak Shah Feb 05 '20 at 12:21
  • You're right about that. That wouldn't work. So the value in test represents group 4. – milan Feb 05 '20 at 12:26

3 Answers3

5

You can write your own custom function or use the code written by others.

I had the exact same problem few days back and this has been included in the MESS package.

devtools::install_github("ekstroem/MESS")
MESS::cumsumbinning(test, 10, cutwhenpassed = TRUE)
#[1] 1 1 2 2 2 3 3 3 4
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • 2
    It's a really neat function! I'm just curious that based on what does it determine that the last value belong to the same group? – tmfmnk Feb 05 '20 at 11:55
  • @tmfmnk I don't have my system right now to check this but I think the last value should be 4. – Ronak Shah Feb 05 '20 at 12:13
  • Maybe my interpretation is wrong, but I would think that the groups go as following: 1 = 9, 3 (sum = 12), 2 = 2, 2, 8 (sum = 12), 3 = 5, 4, 9 (sum = 18) and 4 = 1 (sum = 1). – tmfmnk Feb 05 '20 at 12:16
  • Exactly and this is what `cumsumbinning` does when `cutwhenpassed = TRUE`. – Ronak Shah Feb 05 '20 at 12:19
  • Sorry, I had the impression that the last value in your output was in the same group as the others before it :) +1. – tmfmnk Feb 05 '20 at 12:21
4

One purrr approach could be:

cumsum(c(FALSE, diff(accumulate(test, ~ ifelse(.x >= 10, .y, .x + .y))) <= 0))

[1] 0 0 1 1 1 2 2 2 3
tmfmnk
  • 38,881
  • 4
  • 47
  • 67
  • This does not seem to work if the first value in test is 10 or more. Any ideas? – milan Feb 05 '20 at 13:34
  • Can you please specify what is the expected result for `test <- c(10, 9, 3, 2, 2, 8, 5, 4, 9, 1)`? To me, it seems to work. – tmfmnk Feb 05 '20 at 13:36
  • So that would be 1, 2, 2, 3, 3, 3, 4, 4, 4, 5. Thanks – milan Feb 05 '20 at 13:46
  • I got the same results, just starting with zero. Don't you too? Also, `identical(cumsum(c(FALSE, diff(accumulate(test, ~ ifelse(.x >= 10, .y, .x + .y))) < 0)) + 1, c(1, 2, 2, 3, 3, 3, 4, 4, 4, 5))` returns `TRUE`. – tmfmnk Feb 05 '20 at 13:57
  • I had to be more specific. What if the first two values are 10 or larger? Both then are place in the same group, which is not correct. – milan Feb 05 '20 at 14:34
2

For your purpose, your cs_group can be written like below (if I understand the logic behind in a correct way):

cs_group <- function(x, threshold) {
  group <- 1
  r <- c()
  repeat {
    if (length(x)==0) break
    cnt <- (idx <- max(which(cumsum(x) <= threshold)))+ifelse(idx==length(x),0,1)
    r <- c(r,rep(group, cnt))
    x <- x[-(1:cnt)]
    group <- group + 1
  }
  r
}

such that

test <- c(9, 3, 2, 2, 8, 5, 4, 9, 1)
> cs_group(test, 10)
[1] 1 1 2 2 2 3 3 3 4
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81