I fully anticipate getting slammed for a duplicate question, but I just couldn't find a similar question. Apologies in advance.

I am trying to clean some data that sometimes contains a summary row and sometimes does not. Here is a small reproducible example:

library(tidyverse)

yr <- c(2010, 2010, 2010,
        2011, 2011, 2011, 2011,
        2012, 2012, 2012)

a <- c("HAY", "APPLES", "PUMPKINS",
       "HAY", "HAY & HAYLAGE", "APPLES", "PUMPKINS",
       "HAY & HAYLAGE", "APPLES", "PUMPKINS")

b <- c(1:10)

dat <- as_tibble(list(yr = yr, a = a, b = b))

dat %>% 
  group_by(yr) %>% 
  filter(a != "HAY" if group contains a== "HAY & HAYLAGE")

Obviously, that last line of code is pseudocode. In the group for yr = 2011, which contains both "HAY" and "HAY & HAYLAGE", I want to filter out the row where a equals "HAY". My resulting tibble should have 9 rows.

jkgrain

1 Answer

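Here's one way to do it: you can use an if statement inside the filter condition. Because the data is grouped, the condition is evaluated once per yr group, so the if can check whether that group contains "HAY & HAYLAGE" and return either a row-wise test or a single TRUE that keeps every row: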

library(dplyr) 

# (data from OP) 
dat <- dplyr::tibble(
  yr = c(2010, 2010, 2010, 2011, 2011, 
         2011, 2011, 2012, 2012, 2012),
  a = c("HAY", "APPLES", "PUMPKINS", "HAY", "HAY & HAYLAGE", 
        "APPLES", "PUMPKINS", "HAY & HAYLAGE", "APPLES", "PUMPKINS"), 
  b = 1:10
)


dat %>% 
  group_by(yr) %>% 
  # if this year has 'HAY & HAYLAGE', drop its 'HAY' rows; otherwise keep all rows
  filter(if ('HAY & HAYLAGE' %in% a) a != 'HAY' else TRUE) %>% 
  ungroup()

## result will be: 
## 
## # A tibble: 9 x 3
##      yr a                 b
##   <dbl> <chr>         <int>
## 1  2010 HAY               1
## 2  2010 APPLES            2
## 3  2010 PUMPKINS          3
## 4  2011 HAY & HAYLAGE     5
## 5  2011 APPLES            6
## 6  2011 PUMPKINS          7
## 7  2012 HAY & HAYLAGE     8
## 8  2012 APPLES            9
## 9  2012 PUMPKINS         10
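
As a hedged alternative sketch, the same filter can be written as one logical expression without if/else. Here `any(a == "HAY & HAYLAGE")` yields a single TRUE/FALSE per group, which R recycles against the row-wise test `a == "HAY"`. The variable name `cleaned` is only illustrative and does not come from the question.

```r
library(dplyr)

# Same data as in the question
dat <- tibble(
  yr = c(2010, 2010, 2010, 2011, 2011,
         2011, 2011, 2012, 2012, 2012),
  a = c("HAY", "APPLES", "PUMPKINS", "HAY", "HAY & HAYLAGE",
        "APPLES", "PUMPKINS", "HAY & HAYLAGE", "APPLES", "PUMPKINS"),
  b = 1:10
)

# Drop a "HAY" row only when its year also has a "HAY & HAYLAGE" row.
# any(...) is one value per group and is recycled across that group's rows.
cleaned <- dat %>%
  group_by(yr) %>%
  filter(!(a == "HAY" & any(a == "HAY & HAYLAGE"))) %>%
  ungroup()

print(cleaned)
```

This form keeps the whole condition vectorized, which some people find easier to extend with further `&` or `|` clauses. The if/else version may read more naturally when the two branches are very different.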
lefft