
I have a data frame containing dates and for each date the number of events that took place. From this I add a field telling me if the number of events was above average or not.

Date   Events  Above Average
01/01  7       0
02/01  8       1
03/01  8       1
04/01  6       0
05/01  8       1
06/01  9       1
07/01  4       0
08/01  7       0

From this, if I perform an RLE on whether each day is above average, I get:

Count  Value
1      FALSE
2      TRUE
1      FALSE
2      TRUE
2      FALSE

How can I use this information to add an additional field, as below, to my original data frame?

Date   Events  Above Average  Run Above Av
01/01  7       0              0
02/01  8       1              2
03/01  8       1              2
04/01  6       0              0
05/01  8       1              2
06/01  9       1              2
07/01  4       0              0
08/01  7       0              0
user2974951
  • Can you show us the code which produced the rle table? And what exactly does "Run Above Av" represent? – user2974951 Jun 22 '22 at 08:06
  • Hi Andrew, just a question. Why do you create a criterion for a column that already contains that information? I mean, `Above average` has the same information as `Run Above Av`. – Stephan Jun 22 '22 at 08:10
  • @user2974951 - the code is simply rle(df$Events). Run Above Av: if the current row is above average, it gives the number of contiguous above-average rows in the run that the current row belongs to (hope that makes sense) – Andrew Akester Jun 22 '22 at 08:13
  • @Stephan Above average and Run Above Av will not be the same. If the current row is part of a contiguous range of rows that are all above average, Run Above Av will give the number of contiguous rows. I will be using this field to filter to just rows which are part of runs of 7 or more all above average in order to create an SPC Chart. – Andrew Akester Jun 22 '22 at 08:15
  • So there will be occasions where, e.g., event 8 in your example will have a 0 assigned to above average on, e.g., 9/1 or whatever? – Stephan Jun 22 '22 at 08:17
  • Something does not add up: that code does not produce your intermediate table. For one, the Value column has booleans when it should have the Events integers; second, it has 3 sequences of length 2, while your original table has only 1 such sequence? – user2974951 Jun 22 '22 at 08:17
  • @user2974951 You are correct, the code for the rle should have been rle(df$Events > Mean) The original table did indeed have three sequences of length 2, it's just that one of them was a sequence of below averages. – Andrew Akester Jun 23 '22 at 10:47
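The use case described in the comments, keeping only rows that belong to a run of 7 or more above-average days, could look roughly like this in base R. This is a sketch: `min_run` is a hypothetical parameter name, the mean is assumed to be taken over all rows, and because the sample data has no run of 7, a threshold of 2 is also shown so the output is not empty.

```r
df <- data.frame(
  Date   = c("01/01", "02/01", "03/01", "04/01", "05/01", "06/01", "07/01", "08/01"),
  Events = c(7L, 8L, 8L, 6L, 8L, 9L, 4L, 7L)
)
above <- df$Events > mean(df$Events)

# Length of the run each row belongs to, or 0 for below-average rows
runs <- rle(above)
run_len <- rep(runs$lengths, times = runs$lengths) * above

# Hypothetical threshold; the comment mentions 7 for the SPC chart rule
min_run <- 7
df[run_len >= min_run, ]   # no rows here: the sample has no run of 7

df[run_len >= 2, ]         # rows 2, 3, 5 and 6
```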

1 Answer


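You seem to be looking for the rle lengths, each repeated as many times as its own value (i.e. once for every row in that run), then multiplied by the sign of the Above Average column so that rows in below-average runs become 0.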
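To see what each piece of that expression does, here is the same computation in base R on the `Above Average` column from the question, broken into intermediate steps (the variable names are illustrative only):

```r
above <- c(0L, 1L, 1L, 0L, 1L, 1L, 0L, 0L)   # `Above Average` column

runs <- rle(above)
runs$lengths                                  # 1 2 1 2 2

# Repeat each run length once per row of that run
per_row <- rep(runs$lengths, times = runs$lengths)
per_row                                       # 1 2 2 1 2 2 2 2

# sign() is 0 for below-average rows and 1 for above-average rows
per_row * sign(above)                         # 0 2 2 0 2 2 0 0
```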

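library(dplyr)

df %>%
  # Repeat each run length once per row of its run, then multiply by
  # sign(`Above Average`) so rows in below-average runs become 0
  mutate(`Run Above Av` = rep(rle(`Above Average`)$lengths,
            times = rle(`Above Average`)$lengths) * sign(`Above Average`))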
#>    Date Events Above Average Run Above Av
#> 1 01/01      7             0            0
#> 2 02/01      8             1            2
#> 3 03/01      8             1            2
#> 4 04/01      6             0            0
#> 5 05/01      8             1            2
#> 6 06/01      9             1            2
#> 7 07/01      4             0            0
#> 8 08/01      7             0            0
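A base R alternative that calls `rle()` only once is to rewrite the run values and expand them with `inverse.rle()`. This is a sketch, not part of the original answer; it assumes `Above Average` is coded as 0/1 as in the question.

```r
df <- data.frame(
  Date   = c("01/01", "02/01", "03/01", "04/01", "05/01", "06/01", "07/01", "08/01"),
  Events = c(7L, 8L, 8L, 6L, 8L, 9L, 4L, 7L),
  `Above Average` = c(0L, 1L, 1L, 0L, 1L, 1L, 0L, 0L),
  check.names = FALSE
)

runs <- rle(df$`Above Average`)
# Replace each run's value with its length if the run is above average, else 0
runs$values <- ifelse(runs$values == 1L, runs$lengths, 0L)
# Expand back to one value per row
df$`Run Above Av` <- inverse.rle(runs)
df
```

Doing the replacement on the compressed runs means the per-row expansion happens only once, inside `inverse.rle()`.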

Data from the question in reproducible format:

df <- structure(list(Date = c("01/01", "02/01", "03/01", "04/01", "05/01", 
"06/01", "07/01", "08/01"), Events = c(7L, 8L, 8L, 6L, 8L, 9L, 
4L, 7L), `Above Average` = c(0L, 1L, 1L, 0L, 1L, 1L, 0L, 0L)), 
class = "data.frame", row.names = c(NA, -8L))

Created on 2022-06-22 by the reprex package (v2.0.1)

Allan Cameron