I have the following data frame:

id = 1:16
vals = c(0,1,1,1,0,0,0,0,1,1,1,0,0,0,1,0)
cumsum  = c(0, 1, 2, 3, 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 1, 0)
test = data.frame(id,vals, cumsum)

I would like to extract the maximum of test$cumsum for each consecutive sequence. For instance, I can slice the column cumsum of test so that I have the following consecutive sequences:

S1 = {0}
S2 = {1,2,3}
S3 = {0,0,0,0}
S4 = {1,2,3}
S5 = {0,0,0}
S6 = {1} 
S7 = {0}

As you can see, the zeros split my column into different sequences. What I want to return is the maximum of each non-zero sequence. So I would get:

returned_vector <- c(3,3,1)

Here the first entry of returned_vector is the maximum of S2 (the first non-zero sequence), the second entry is the maximum of S4 (the second non-zero sequence), and the third entry is the maximum of S6 (the final non-zero sequence).

I am not sure how I can do it. Basically I just want to return the maximum of all non-zero sequences in my column test$cumsum.

Any help appreciated!

Thanks a lot!

Lola1993
  • `rle` could work: `r = rle(vals)`; `r$lengths[r$values == 1]` – Henrik Aug 16 '21 at 19:36
  • If I understand you correctly, this is answered here: [How do I calculate the length of consecutive runs of events, e.g. wins, visits, in R](https://stackoverflow.com/questions/2968575/how-do-i-calculate-the-length-of-consecutive-runs-of-events-e-g-wins-visits); [How can I count runs in a sequence?](https://stackoverflow.com/questions/1502910/how-can-i-count-runs-in-a-sequence) – Henrik Aug 16 '21 at 19:40
  • Thanks Henrik! This also works!! – Lola1993 Aug 16 '21 at 19:47
  • @Lola1993: Please don't accept my answer. Its just *one* way to do it, and as you can see in Henrik's comment, there are much shorter, faster base R solutions. – TimTeaFan Aug 16 '21 at 19:50
  • @Henrik. This is fantastic to know. I really had a hard time to get it to work with `dplyr`. :-) – TarJae Aug 16 '21 at 20:18
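The `rle` suggestion from the comments can be expanded into a short base R sketch. It rebuilds the data frame from the question; the helper names `run_lengths`, `run_id` and `run_max` are made up for illustration. The first result relies on the assumption that the cumsum column is a running count of 1s, so each run's maximum equals its length; the second takes the maximum of the column per run and does not need that assumption.

```r
# Data from the question
test <- data.frame(
  id     = 1:16,
  vals   = c(0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0),
  cumsum = c(0, 1, 2, 3, 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 1, 0)
)

# Run-length encode vals: one entry per run of identical values
r <- rle(test$vals)

# Shortcut: the length of each run of 1s
run_lengths <- r$lengths[r$values == 1]

# More general: label every row with its run number,
# take the maximum of the cumsum column per run,
# then keep only the runs where vals is 1
run_id  <- rep(seq_along(r$lengths), r$lengths)
run_max <- as.vector(tapply(test$cumsum, run_id, max))[r$values == 1]

run_lengths
run_max
```

Both vectors should match the `returned_vector` asked for in the question.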

2 Answers

Here is one way to do it:

library(dplyr)

test %>% 
  group_by(id = data.table::rleid(vals)) %>% 
  summarise(max = ifelse(sum(vals) != 0,
                         list(max(cumsum, na.rm = TRUE)),
                         list(NULL))
            ) %>% 
  pull(max) %>%
  unlist

#> [1] 3 3 1

# the data
id = 1:16
vals = c(0,1,1,1,0,0,0,0,1,1,1,0,0,0,1,0)
cumsum  = c(0, 1, 2, 3, 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 1, 0)
test = data.frame(id,vals, cumsum)

Created on 2021-08-16 by the reprex package (v2.0.1)
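A possible variation on the same idea, sketched under the assumption that dplyr 1.1.0 or later is installed: `consecutive_id()` numbers the runs without needing data.table, and filtering out the zero runs before summarising avoids the `list(NULL)` step. The names `run` and `result` are only illustrative.

```r
library(dplyr)

# Data from the question
test <- data.frame(
  id     = 1:16,
  vals   = c(0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0),
  cumsum = c(0, 1, 2, 3, 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 1, 0)
)

result <- test %>%
  # one group per run of identical values in vals (assumes dplyr >= 1.1.0)
  group_by(run = consecutive_id(vals)) %>%
  # drop the zero runs; groups left empty are discarded
  filter(vals != 0) %>%
  # maximum of the cumsum column within each remaining run
  summarise(max = max(cumsum, na.rm = TRUE)) %>%
  pull(max)

result
```

On older dplyr versions, `data.table::rleid(vals)` as used above should serve the same purpose as `consecutive_id(vals)`.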

TimTeaFan

Here is a dplyr solution:

  1. Create a unique group_id for each run of identical values in vals
  2. Add a column my.sequence that numbers the rows within each group_id
  3. Filter to the rows where vals == 1 and summarise
  4. Then get the vector my_result

library(dplyr)

my_result <- test %>%
    mutate(
        # start a new group whenever vals differs from the previous row
        group_id = cumsum(vals != lag(vals, default = first(vals)))
    ) %>%
    group_by(group_id) %>%
    # number the rows within each run
    mutate(my.sequence = row_number()) %>%
    filter(vals == 1) %>%
    summarise(result = max(my.sequence)) %>%
    pull(result)

my_result

Output:

[1] 3 3 1
TarJae