I'm trying to create a dplyr pipeline that filters on a string pattern, with one exception.

Imagine a data frame jobs, where I want to filter out the most-senior positions from the title column:

title

Chief Executive Officer
Chief Financial Officer
Chief Technical Officer
Manager
Product Manager
Programmer
Scientist
Marketer
Lawyer
Secretary

R code for filtering them out (up to 'Manager') would be...

jobs %>%
  filter(!str_detect(title, 'Chief')) %>%
  filter(!str_detect(title, 'Manager')) ...

but I still want to keep "Product Manager" in the final result, producing a new data frame with all of the "lower level jobs" like

Product Manager
Programmer
Scientist
Marketer
Lawyer
Secretary

Is there a way to specify the str_detect() filter on a given value EXCEPT for one particular string?

Assume that the data frame's column has 1000s of roles, with various string combinations including "Manager," but there will always be a filter on a specific exception.

alxlvt
Why not simply use anchors around "Manager", like so: `... %>% filter(!stringr::str_detect(title, "^Chief|^Manager$"))`? The `^` anchor tells the regex to match strings starting with "Manager". The other anchor `$` ensures that the string must also end with "Manager". – JdeMello Jan 13 '19 at 01:34
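To make the comment's suggestion concrete, here is a minimal sketch of the anchored pattern. The `jobs` data frame is rebuilt from the titles listed in the question, and `lower_level` is a name chosen only for illustration.

```r
library(dplyr)
library(stringr)

# Sample data rebuilt from the titles listed in the question
jobs <- data.frame(
  title = c("Chief Executive Officer", "Chief Financial Officer",
            "Chief Technical Officer", "Manager", "Product Manager",
            "Programmer", "Scientist", "Marketer", "Lawyer", "Secretary"),
  stringsAsFactors = FALSE
)

# "^Chief" matches titles that start with "Chief";
# "^Manager$" matches only the title that is exactly "Manager"
lower_level <- jobs %>%
  filter(!str_detect(title, "^Chief|^Manager$"))

lower_level$title
```

Because `^Manager$` matches only the exact string "Manager", any other title that merely contains "Manager" is kept as well, not just "Product Manager". That may or may not be what is wanted when the column holds thousands of roles.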

1 Answer

Or you could add a separate condition that keeps "Product Manager" even though it matches "Manager":

library(tidyverse)

jobs %>%
  filter(!str_detect(title, "Chief|Manager") | str_detect(title, "Product Manager"))


#            title
#1 Product Manager
#2      Programmer
#3       Scientist
#4        Marketer
#5          Lawyer
#6       Secretary
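A possible alternative, not part of the original answer, is to fold the exception into the pattern itself with a negative lookbehind, which stringr's regex engine (ICU) supports. This is a sketch under the assumption that the exception is always a fixed prefix such as "Product "; the names `pattern` and `lower_level` are illustrative.

```r
library(dplyr)
library(stringr)

# Sample data rebuilt from the titles listed in the question
jobs <- data.frame(
  title = c("Chief Executive Officer", "Chief Financial Officer",
            "Chief Technical Officer", "Manager", "Product Manager",
            "Programmer", "Scientist", "Marketer", "Lawyer", "Secretary"),
  stringsAsFactors = FALSE
)

# (?<!Product ) is a negative lookbehind: "Manager" only counts as a match
# when it is not immediately preceded by "Product "
pattern <- "Chief|(?<!Product )Manager"

lower_level <- jobs %>%
  filter(!str_detect(title, pattern))

lower_level$title
```

The two approaches differ on titles that match both rules: a hypothetical "Chief Product Manager" is kept by the two-condition filter, because the "Product Manager" test is TRUE, but dropped by the lookbehind pattern, because "Chief" still matches.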

The two-condition filter above can also be written in base R using grep:

jobs[c(grep("Product Manager",jobs$title), 
       grep("Chief|Manager", jobs$title, invert = TRUE)),, drop = FALSE]
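The `grep` version builds the result from two index vectors, so the "Product Manager" rows always come first. If the original row order matters, a logical mask with `grepl` is one way to preserve it. This is a minimal sketch; `keep` and `lower_level` are illustrative names.

```r
# Sample data rebuilt from the titles listed in the question
jobs <- data.frame(
  title = c("Chief Executive Officer", "Chief Financial Officer",
            "Chief Technical Officer", "Manager", "Product Manager",
            "Programmer", "Scientist", "Marketer", "Lawyer", "Secretary"),
  stringsAsFactors = FALSE
)

# TRUE for rows that avoid both patterns, or that match the exception
keep <- !grepl("Chief|Manager", jobs$title) |
  grepl("Product Manager", jobs$title)

# Logical indexing keeps the rows in their original order
lower_level <- jobs[keep, , drop = FALSE]

lower_level$title
```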
Ronak Shah