
I have a simple data frame that looks like this:

x <- c("aa", "aa", "aa", "bb", "cc", "cc", "cc")
y <- c(101, 102, 113, 201, 202, 344, 407)
df = data.frame(x, y)    

    x   y
1   aa  101
2   aa  102
3   aa  113
4   bb  201
5   cc  202
6   cc  344
7   cc  407

I would like to use dplyr::filter() and a regex to filter out all the y observations that start with the number 1.

I'm imagining that the code will look something like this:

df %>%
  filter(y != grep("^1")) 

But I get this error: Error in grep("^1") : argument "x" is missing, with no default

emehex

2 Answers


You need to double-check the documentation for grepl and filter.

For grep/grepl you also have to supply the vector you want to search in (y in this case), and filter takes a logical vector (i.e. you need to use grepl). If you want to supply an index vector (from grep), you can use slice instead.

df %>% filter(!grepl("^1", y))

Or with an index derived from grep:

df %>% slice(grep("^1", y, invert = TRUE))
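To make the grep/grepl difference concrete, here is a small base R sketch on the y column alone (no dplyr needed). It shows that grep() returns positions, grepl() returns one TRUE/FALSE per element, and calling grep() without x reproduces the error from the question. The variable names idx, keep and res are just illustrative.

```r
# Same values as the y column in the question
y <- c(101, 102, 113, 201, 202, 344, 407)

# grep() returns the *positions* of matching elements;
# the numeric y is converted with as.character() before matching
idx <- grep("^1", y)
print(idx)                            # 1 2 3

# grepl() returns one TRUE/FALSE per element, which is what filter() needs
keep <- !grepl("^1", y)
print(keep)                           # FALSE FALSE FALSE TRUE TRUE TRUE TRUE

# invert = TRUE makes grep() return the positions that do NOT match
print(grep("^1", y, invert = TRUE))   # 4 5 6 7

# Omitting x reproduces the error from the question
res <- tryCatch(grep("^1"), error = function(e) e)
print(conditionMessage(res))
```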

But you can also just use substr because you are only interested in the first character:

df %>% filter(substr(y, 1, 1) != 1)
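As a quick sanity check, assuming dplyr is installed, the three approaches above can be run side by side on the question's data to confirm they keep the same rows. Note that substr() converts the numeric y to character, and the comparison `!= 1` then coerces 1 to "1".

```r
library(dplyr)

x <- c("aa", "aa", "aa", "bb", "cc", "cc", "cc")
y <- c(101, 102, 113, 201, 202, 344, 407)
df <- data.frame(x, y)

# 1. negated logical vector from grepl()
a <- df %>% filter(!grepl("^1", y))

# 2. row positions from grep(invert = TRUE), passed to slice()
b <- df %>% slice(grep("^1", y, invert = TRUE))

# 3. first character via substr(); y becomes character, 1 becomes "1"
d <- df %>% filter(substr(y, 1, 1) != 1)

# All three keep the rows with y = 201, 202, 344, 407
stopifnot(identical(a$y, b$y), identical(a$y, d$y))
print(a)
```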
talat
    Thanks for the clarification! I erroneously assumed that regex would recognize which vector I wanted from the left side of the ==. – emehex Mar 04 '15 at 17:07

With a combination of dplyr and stringr (to stay within the tidyverse), you could do:

df %>% filter(!str_detect(y, "^1"))

This works because str_detect returns a logical vector.
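A minimal sketch, assuming dplyr and stringr are installed, showing the logical vector that str_detect() produces for the numeric y column (it is converted to character first) and how negating it inside filter() drops the rows that start with 1. The names starts_with_1 and out are illustrative only.

```r
library(dplyr)
library(stringr)

x <- c("aa", "aa", "aa", "bb", "cc", "cc", "cc")
y <- c(101, 102, 113, 201, 202, 344, 407)
df <- data.frame(x, y)

# One TRUE/FALSE per element of y
starts_with_1 <- str_detect(y, "^1")
print(starts_with_1)   # TRUE TRUE TRUE FALSE FALSE FALSE FALSE

# Negate it inside filter() to keep only rows whose y does not start with 1
out <- df %>% filter(!str_detect(y, "^1"))
print(out)
```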

Omar
    `str_detect` also has a `negate` argument, so you could use `str_detect(y, "^1", negate = TRUE)` instead of `!str_detect(y, "^1")` – filups21 Apr 03 '20 at 21:05