I just need to delete, by group, rows that contain the same value 3 times in a row and leading rows that are less than previous rows. I've tried dplyr
's lead
function, but have been unsuccessful. The example data are just a subset of thousands of rows where a conditional will work best.
Data Before
print(df)
sample measurement
1 a 0.1443
2 a 0.2220
3 a 0.3330
4 a 0.9435
5 a 0.8051 # Delete sample "a" as value is less than previous
6 b 0.1554
7 b 0.2775
8 b 0.3885
9 b 1.2210
10 b 1.8093
11 c 0.0000
12 c 0.0000
13 c 0.0000 # Delete sample "c" as there a 3 consecutive values in a row
14 c 0.0333
15 c 0.2997
Desired Output
sample measurement
1 b 0.1554
2 b 0.2775
3 b 0.3885
4 b 1.2210
5 b 1.8093
Failed Attempts
Here, I tried to just filter any leading measurement that was greater or equal to the previous row, but failed.
df %>%
group_by(sample) %>%
filter(!any(lead(measurement) <= measurement)) %>%
ungroup()
# A tibble: 0 x 2
# ... with 2 variables: sample <chr>, measurement <dbl>
If I tried to just collect the rows that fit the condition the above code does what is intended. I'm sure there's a better way to do this.
df %>%
group_by(sample) %>%
filter(any(lead(measurement) <= measurement)) %>%
ungroup()
sample measurement
<chr> <dbl>
1 a 0.144
2 a 0.222
3 a 0.333
4 a 0.944
5 a 0.805
6 c 0
7 c 0
8 c 0
9 c 0.0333
10 c 0.300
Reproducible Code
structure(list(sample = c("a", "a", "a", "a", "a", "b", "b",
"b", "b", "b", "c", "c", "c", "c", "c"), measurement = c(0.1443,
0.222, 0.333, 0.9435, 0.8051, 0.1554, 0.2775, 0.3885, 1.221,
1.8093, 0, 0, 0, 0.0333, 0.2997)), row.names = c(NA, -15L), na.action = structure(c(`1` = 1L,
`2` = 2L, `3` = 3L, `5` = 5L, `6` = 6L, `8` = 8L, `9` = 9L, `11` = 11L,
`12` = 12L, `15` = 15L, `16` = 16L, `17` = 17L, `18` = 18L, `20` = 20L,
`21` = 21L, `23` = 23L, `24` = 24L, `26` = 26L, `27` = 27L, `30` = 30L,
`31` = 31L, `32` = 32L, `33` = 33L, `35` = 35L, `36` = 36L, `38` = 38L,
`39` = 39L, `41` = 41L, `42` = 42L, `45` = 45L), class = "omit"), class = "data.frame")