Regular expressions (RegEx) and dplyr::filter()

Question

I have a simple data frame that looks like this:

x <- c("aa", "aa", "aa", "bb", "cc", "cc", "cc")
y <- c(101, 102, 113, 201, 202, 344, 407)
df <- data.frame(x, y)

    x   y
1   aa  101
2   aa  102
3   aa  113
4   bb  201
5   cc  202
6   cc  344
7   cc  407

I would like to use dplyr::filter() and a RegEx to filter out all the y observations that start with the number 1.

I'm imagining that the code will look something like this:

df %>%
  filter(y != grep("^1"))

But I am getting this error: Error in grep("^1") : argument "x" is missing, with no default

talat · Accepted Answer · score 57 · 2015-03-04T16:55:58.740

You need to double-check the documentation for grepl and filter.

grep and grepl both need the vector to search in as a second argument (y in this case). filter expects a logical vector, so you need grepl, which returns one TRUE/FALSE per element. grep returns integer indices instead; if you want to work with those, use slice rather than filter.

df %>% filter(!grepl("^1", y))

Or with an index derived from grep:

df %>% slice(grep("^1", y, invert = TRUE))

But you can also just use substr because you are only interested in the first character:

df %>% filter(substr(y, 1, 1) != "1")
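
As a rough check, the three variants above can be run side by side on the data frame from the question. This is only a sketch: the names keep, by_grepl, by_grep and by_substr are made up for illustration, and it relies on grepl, grep and substr coercing the numeric y to character.

```r
library(dplyr)

# Data frame from the question
x <- c("aa", "aa", "aa", "bb", "cc", "cc", "cc")
y <- c(101, 102, 113, 201, 202, 344, 407)
df <- data.frame(x, y)

# grepl() returns one TRUE/FALSE per element, which is what filter() expects
keep <- !grepl("^1", y)

# Logical vector from grepl() goes into filter()
by_grepl <- df %>% filter(!grepl("^1", y))

# Integer positions from grep() go into slice(); invert = TRUE returns the non-matches
by_grep <- df %>% slice(grep("^1", y, invert = TRUE))

# No regex at all: compare the first character of y with "1"
by_substr <- df %>% filter(substr(y, 1, 1) != "1")
```

All three should keep the same rows, the ones where y is 201, 202, 344 and 407.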

edited Mar 04 '15 at 16:55

answered Mar 04 '15 at 16:50



Thanks for the clarification! I erroneously assumed that regex would recognize which vector I wanted from the left side of the ==. – emehex Mar 04 '15 at 17:07

score 34 · Answer 2 · answered Feb 28 '18 at 22:03


With a combination of dplyr and stringr (to stay within the tidyverse), you could do:

df %>% filter(!str_detect(y, "^1"))

This works because str_detect returns a logical vector.
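
A self-contained sketch of the same idea, in case it helps to see the logical vector itself. The names starts_with_1, kept and kept_negate are illustrative only, and the explicit as.character() is added here just to make the numeric-to-character conversion visible. The negate argument assumes stringr 1.4.0 or later.

```r
library(dplyr)
library(stringr)

# Data frame from the question
df <- data.frame(
  x = c("aa", "aa", "aa", "bb", "cc", "cc", "cc"),
  y = c(101, 102, 113, 201, 202, 344, 407)
)

# One TRUE/FALSE per row: does y start with "1"?
starts_with_1 <- str_detect(as.character(df$y), "^1")

# Keep the rows where the pattern does NOT match
kept <- df %>% filter(!str_detect(as.character(y), "^1"))

# Same filter with negate = TRUE instead of a leading "!" (assumes stringr >= 1.4.0)
kept_negate <- df %>% filter(str_detect(as.character(y), "^1", negate = TRUE))
```

Both kept and kept_negate should contain the four rows with y equal to 201, 202, 344 and 407.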


Omar



`str_detect` also has a `negate` argument, so you could use `str_detect(y, "^1", negate = TRUE)` instead of `!str_detect(y, "^1")` – filups21 Apr 03 '20 at 21:05
