3

I am looking to extract or filter rows that contain a specific value together with a number of rows that precedes the row with the specific value.

In the example below I wish to keep the row where Y == "HIT" and I also wish to keep the four rows that precedes it. To be specific, I wish to extract rows 6 to 10.

I have written some code that does the job, but I am convinced there must be a better solution. Can you help me find it? Base R or dplyr solutions are preferred.

Thanks in advance.

library(tidyverse)

# Data
df <- tibble(X = 1:11,
       Y = c(rep("MISS", 9), "HIT", "MISS"))

df
#> # A tibble: 11 x 2
#>        X Y    
#>    <int> <chr>
#>  1     1 MISS 
#>  2     2 MISS 
#>  3     3 MISS 
#>  4     4 MISS 
#>  5     5 MISS 
#>  6     6 MISS 
#>  7     7 MISS 
#>  8     8 MISS 
#>  9     9 MISS 
#> 10    10 HIT  
#> 11    11 MISS
# I want to keep the the row with the value "HIT" in column Y and the 4 rows that preecedes it.

#My code monster that kind of works
df_monster <- df %>% 
  mutate(
    Z = if_else(Y == "HIT", 5, 0),
    Z = case_when(
      Z > 0 ~ Z,
      lead(Z) > 0 ~ lead(Z)-1,
      TRUE ~ 0)
  ) %>% 
  mutate(
    Z = case_when(
      Z > 0 ~ Z,
      lead(Z) > 0 ~ lead(Z)-1,
      TRUE ~ 0)
  ) %>% 
  mutate(
    Z = case_when(
      Z > 0 ~ Z,
      lead(Z) > 0 ~ lead(Z)-1,
      TRUE ~ 0)
  ) %>% 
  mutate(
    Z = case_when(
      Z > 0 ~ Z,
      lead(Z) > 0 ~ lead(Z)-1,
      TRUE ~ 0)
  ) 

#Desired result
df_monster %>% 
  filter(Z > 0)
#> # A tibble: 5 x 3
#>       X Y         Z
#>   <int> <chr> <dbl>
#> 1     6 MISS      1
#> 2     7 MISS      2
#> 3     8 MISS      3
#> 4     9 MISS      4
#> 5    10 HIT       5

Created on 2021-08-17 by the reprex package (v2.0.0)

Steen Harsted
  • 1,802
  • 2
  • 21
  • 34
  • 1
    Related, with some other alternatives: [Returning above and below rows of specific rows in r dataframe](https://stackoverflow.com/questions/13155609/returning-above-and-below-rows-of-specific-rows-in-r-dataframe) – Henrik Aug 17 '21 at 14:51

6 Answers6

3

For these kind of complex row-based conditions I always find it easier to explicit the computation.

hits <- which(df$Y=="HIT")
df %>% 
  mutate(myfilter = sapply(1:nrow(df), function(x) any(hits-x >= 0 & hits-x <5))) %>% 
  filter(myfilter)

# A tibble: 5 x 3
      X Y     myfilter
  <int> <chr> <lgl>   
1     6 MISS  TRUE    
2     7 MISS  TRUE    
3     8 MISS  TRUE    
4     9 MISS  TRUE    
5    10 HIT   TRUE  
gaut
  • 5,771
  • 1
  • 14
  • 45
3

One option might be to use filter:

n <- 5
df[rev(tail(stats::filter(c(rep(FALSE, n), rev(df$Y == "HIT")), rep(1, n), side=1), -n) > 0),]
#      X Y    
#  <int> <chr>
#1     6 MISS 
#2     7 MISS 
#3     8 MISS 
#4     9 MISS 
#5    10 HIT

This will also work for overlapping hits:

df$Y[9] <- "HIT"
n <- 5
df[rev(tail(stats::filter(c(rep(FALSE, n), rev(df$Y == "HIT")), rep(1, n), side=1), -n) > 0),]
# A tibble: 6 × 2
#      X Y    
#  <int> <chr>
#1     5 MISS 
#2     6 MISS 
#3     7 MISS 
#4     8 MISS 
#5     9 HIT  
#6    10 HIT
GKi
  • 37,245
  • 2
  • 26
  • 48
2

A base R option :

n <- which(df$Y == 'HIT')
df[unique(c(sapply(n, `+`, -4:0))), ]

#      X Y    
#  <int> <chr>
#1     6 MISS 
#2     7 MISS 
#3     8 MISS 
#4     9 MISS 
#5    10 HIT  

In case 'HIT' is present in first 3 rows of the data the above will error out since it will generate negative numbers. In which case, you can use -

df[Filter(function(x) x > 0, unique(c(sapply(n, `+`, -4:0)))), ]
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
2

Here is a dplyr solution, but it's inferior to the base R solutions shown:

df %>% 
  arrange(desc(X)) %>% 
  group_by(grp = cumsum(Y == "HIT")) %>% 
  filter(grp == 1) %>% 
  slice_head(n = 5) %>% 
  arrange(X) %>%
  ungroup()

returns

# A tibble: 5 x 3
      X Y       grp
  <int> <chr> <int>
1     6 MISS      1
2     7 MISS      1
3     8 MISS      1
4     9 MISS      1
5    10 HIT       1
Martin Gal
  • 16,640
  • 5
  • 21
  • 39
2

Here's a dplyr solution:

First, define a function to retrieve the indices of the rows you want to select:

Sequence <- function(pattern, column) {
  inds <- which(grepl(pattern, column, perl = TRUE))
  sort(unique(c(inds -4, inds - 3, inds - 2, inds -1, inds))) 
}

Second, apply the function to your data using slice:

library(dplyr)   
df %>% 
  slice(Sequence(
    pattern = "HIT",
    column = Y))
# A tibble: 5 x 2
      X Y    
  <int> <chr>
1     6 MISS 
2     7 MISS 
3     8 MISS 
4     9 MISS 
5    10 HIT 
Chris Ruehlemann
  • 20,321
  • 4
  • 12
  • 34
1

I think this works? Might not be the most flexible approach

#base R
df[which(df$Y == "HIT") - (4:0), ]
# # A tibble: 5 x 2
#       X Y    
#   <int> <chr>
# 1     6 MISS 
# 2     7 MISS 
# 3     8 MISS 
# 4     9 MISS 
# 5    10 HIT 
user63230
  • 4,095
  • 21
  • 43