Time Series / Tidyverse: Calculation Depending on All Previous Rows in a Given Group

Question

Objective

Given this data:

df <-
structure(list(id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L), time = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L,
5L, 1L, 2L, 3L, 4L, 5L), val = c(56, 72, 91, 2, 76, 48, 8, 86,
49, 85, 62, 24, 3, 51, 81)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -15L))

# A tibble: 15 x 3
      id  time   val
   <int> <int> <dbl>
 1     1     1    56
 2     1     2    72
 3     1     3    91
 4     1     4     2
 5     1     5    76
 6     2     1    48
 7     2     2     8
 8     2     3    86
 9     2     4    49
10     2     5    85
11     3     1    62
12     3     2    24
13     3     3     3
14     3     4    51
15     3     5    81

I want to create a new column that is TRUE if val has been above 60 at the current time or at any previous time within the same id.

So the expected result should be:

# A tibble: 15 x 4
      id  time   val   ever
   <int> <int> <dbl>  <lgl>
 1     1     1    56  FALSE
 2     1     2    72   TRUE
 3     1     3    91   TRUE
 4     1     4     2   TRUE
 5     1     5    76   TRUE
 6     2     1    48  FALSE
 7     2     2     8  FALSE
 8     2     3    86   TRUE
 9     2     4    49   TRUE
10     2     5    85   TRUE
11     3     1    62   TRUE
12     3     2    24   TRUE
13     3     3     3   TRUE
14     3     4    51   TRUE
15     3     5    81   TRUE

What I have tried:

Some variations around:

(
  df
  %>% mutate(high = val > 60)
  %>% group_by(id)
  %>% mutate(ever = F)
  %>% mutate(ever = high || lag(ever))
)

But lag() only shifts an existing column by one row; it is not meant to feed its own previous result into the computation of the next one...
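To make the problem concrete, here is a small sketch of what lag() returns for the first group. It assumes dplyr is installed; the vectors high and ever are only illustrative stand-ins for the columns in the attempt above.

```r
library(dplyr)

high <- c(FALSE, TRUE, TRUE, FALSE, TRUE)  # val > 60 for id 1
ever <- rep(FALSE, 5)                      # the placeholder column from the attempt

# lag() shifts the vector as it exists right now; it never sees values
# that are being computed for earlier rows in the same mutate() call.
dplyr::lag(ever)
#> [1]    NA FALSE FALSE FALSE FALSE

# Combined element-wise with |, this just reproduces `high`
# (NA where high is FALSE and the lagged value is missing).
high | dplyr::lag(ever)
#> [1]    NA  TRUE  TRUE FALSE  TRUE
```

Note also that || is meant for single values, whereas | is the element-wise operator that works on whole columns.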

score 1 · Accepted Answer · answered Apr 26 '20 at 01:38

You can use cumsum() applied to val > 60 like this:

library(tidyverse)

df <- tibble(
  id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L),
  time = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L),
  val = c(56, 72, 91, 2, 76, 48, 8, 86, 49, 85, 62, 24, 3, 51, 81)
)

df %>%
  group_by(id) %>%
  mutate(
    high = cumsum(val > 60) > 0
  ) %>%
  ungroup()
#> # A tibble: 15 x 4
#>       id  time   val high 
#>    <int> <int> <dbl> <lgl>
#>  1     1     1    56 FALSE
#>  2     1     2    72 TRUE 
#>  3     1     3    91 TRUE 
#>  4     1     4     2 TRUE 
#>  5     1     5    76 TRUE 
#>  6     2     1    48 FALSE
#>  7     2     2     8 FALSE
#>  8     2     3    86 TRUE 
#>  9     2     4    49 TRUE 
#> 10     2     5    85 TRUE 
#> 11     3     1    62 TRUE 
#> 12     3     2    24 TRUE 
#> 13     3     3     3 TRUE 
#> 14     3     4    51 TRUE 
#> 15     3     5    81 TRUE

Created on 2020-04-26 by the reprex package (v0.3.0)
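Why this works: cumsum() counts TRUE as 1 and FALSE as 0, so the running total becomes positive at the first value above 60 and never drops back to zero. The sketch below walks through the values of id 1 and also shows dplyr::cumany(), which expresses the same idea directly. The toy tibble and the arrange() call are assumptions added for illustration, for the case where rows are not already sorted by time.

```r
library(dplyr)

val <- c(56, 72, 91, 2, 76)   # id 1 from the question

val > 60
#> [1] FALSE  TRUE  TRUE FALSE  TRUE
cumsum(val > 60)
#> [1] 0 1 2 2 3
cumsum(val > 60) > 0
#> [1] FALSE  TRUE  TRUE  TRUE  TRUE

# Same result with dplyr's cumulative "any"
cumany(val > 60)
#> [1] FALSE  TRUE  TRUE  TRUE  TRUE

# Hypothetical unsorted data: sort by time within id before accumulating
toy <- tibble(
  id   = c(1L, 1L, 1L, 2L, 2L, 2L),
  time = c(2L, 1L, 3L, 1L, 2L, 3L),
  val  = c(72, 56, 2, 48, 8, 86)
)

res <- toy %>%
  arrange(id, time) %>%
  group_by(id) %>%
  mutate(ever = cumany(val > 60)) %>%
  ungroup()
```

Both forms depend on row order, so sorting first is the safer habit when the input order is not guaranteed.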

pietrodito · Answer 2 · 2020-04-25T18:14:32.437

0

I did it, but in a hard and ugly way. I am pretty sure there is a better one:

(
  df
  %>% group_split(id)
  %>% map_df(function(df) {
    df$ever <- NA
    for(i in seq_len(nrow(df))) {
      if( i == 1 ) df$ever[i] <- df$val[i] > 60
      else df$ever[i] <- df$ever[i-1] || df$val[i] > 60 
    }
    df})
)
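The loop above is a running "or": each element is the previous result combined with the current test. As a sketch of the same recurrence without explicit indexing, base R's Reduce() with accumulate = TRUE carries the result forward. The helper name ever_above and its threshold argument are hypothetical, not part of the original answer.

```r
# Running "or" over val > threshold, written as a fold
ever_above <- function(val, threshold = 60) {
  Reduce(`|`, val > threshold, accumulate = TRUE)
}

ever_above(c(56, 72, 91, 2, 76))   # id 1
#> [1] FALSE  TRUE  TRUE  TRUE  TRUE
ever_above(c(48, 8, 86, 49, 85))   # id 2
#> [1] FALSE FALSE  TRUE  TRUE  TRUE
```

Inside a grouped pipeline this could be called as mutate(ever = ever_above(val)) after group_by(id).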

Edit

I have found a better way (and my question was a duplicate) here

edited Apr 25 '20 at 18:14

answered Apr 25 '20 at 16:37

