
I have a text like this:

text = 'I love apple, pear, grape and peach'

If I want to know whether the text contains either apple or pear, I can do the following, which works fine:

str_detect(text,"apple|pear")
[1] TRUE

My question is: what if I want to use a boolean expression like (apple OR pear) AND (grape)? Is there any way to put that in str_detect()? The following does NOT work:

str_detect(text,"(apple|pear) & (grape)" )
[1] FALSE

The reason I ask is that I want to write a program that converts a 'boolean query' and feeds it into grep or str_detect, something like:

str_detect(text, '(word1|word2) AND (word2|word3|word4) AND (word5|word6) AND .....')

The number of ANDs varies.

No solutions with multiple str_detect calls, please.

zesla

1 Answer

You can pass all the patterns to str_detect as a vector and check that they're all TRUE with all.

library(stringr)
patterns <- c('apple|pear', 'grape')  # one element per AND-ed group
all(str_detect(text, patterns))

Or with base R

all(sapply(patterns, grepl, x = text))
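Both versions above assume a single string, because all collapses everything to one value. As a minimal base R sketch for testing several strings at once, the per-pattern results can be combined element-wise with & instead. The texts vector below is made up for illustration and is not from the question.

```r
# Hypothetical example inputs (only the first string comes from the question)
texts <- c("I love apple, pear, grape and peach",
           "I love pear and peach",
           "grape juice only")
patterns <- c("apple|pear", "grape")

# One logical vector per pattern, each as long as texts
per_pattern <- lapply(patterns, grepl, x = texts)

# Element-wise AND across patterns: one result per string
matches_all <- Reduce("&", per_pattern)
matches_all
# [1]  TRUE FALSE FALSE
```

Keeping per_pattern around before reducing also shows which string matched which pattern.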

Or you could put the patterns in a list and use map (from purrr), which gives more detailed output for the ORs (or anything else you may want to put in a list element):

library(purrr)
patterns <- list(c('apple', 'pear'), 'peach')
patterns %>%
  map(str_detect, string = text)

# [[1]]
# [1] TRUE TRUE
# 
# [[2]]
# [1] TRUE

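It's also possible to write it as a single regular expression using lookaheads, but I see no reason to do this. Each pattern has to be wrapped in its own group, otherwise the | splits the whole lookahead: (?=.*apple|pear) means "either .*apple or pear", so a pear that appears after grape would not be found.

patterns <- c('apple|pear', 'grape')
patt_combined <- paste(paste0('(?=.*(?:', patterns, '))'), collapse = '')
str_detect(text, patt_combined)

patt_combined is

# [1] "(?=.*(?:apple|pear))(?=.*(?:grape))"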
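For the question's goal of converting a boolean query string into one pattern, here is a rough sketch. query_to_regex is a hypothetical helper, not part of stringr or base R. It assumes groups are separated by the literal word AND and contain no nested parentheses. Each group goes into a non-capturing group inside its own lookahead so the | stays local to that group, and the result is anchored with ^ so the lookaheads are only tried from the start of the string.

```r
# Hypothetical helper: "(a|b) AND (c)" -> "^(?=.*(?:a|b))(?=.*(?:c))"
# Assumes groups are joined by the literal word AND, with no nested parentheses.
query_to_regex <- function(query) {
  groups <- strsplit(query, "\\s+AND\\s+")[[1]]
  groups <- trimws(groups)
  # Strip one pair of outer parentheses, if present
  groups <- sub("^\\((.*)\\)$", "\\1", groups)
  # One lookahead per AND-ed group, concatenated into a single pattern
  paste0("^", paste0("(?=.*(?:", groups, "))", collapse = ""))
}

text <- "I love apple, pear, grape and peach"
patt <- query_to_regex("(apple|pear) AND (grape)")
patt
# [1] "^(?=.*(?:apple|pear))(?=.*(?:grape))"

# Lookaheads need perl = TRUE in base R
grepl(patt, text, perl = TRUE)
# [1] TRUE
grepl(query_to_regex("(apple|pear) AND (mango)"), text, perl = TRUE)
# [1] FALSE
```

The same pattern should also work in str_detect(text, patt), since stringr's ICU engine supports lookaheads. The words are matched as plain substrings, so "pear" would also match inside "pearl" unless word boundaries are added.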
IceCreamToucan
  • One thing that's really nice about the `map`-based method is that before calling `reduce`, you can see exactly which strings matched which patterns. This is useful when there are more strings to test – camille Sep 17 '19 at 19:20
  • I actually wrote ^^that comment right before you changed from `map` to `map_lgl`. `map` + `reduce` lets you test on a vector of strings, which the current `map_lgl` + `all` version doesn't – camille Sep 17 '19 at 19:23
  • Not sure what you mean, I don't think there's any information lost in the output of `map_lgl` vs the output of `map` in this case, right? Neither one is named, it's just a list of logical values vs a logical vector, you can still see which patterns matched. – IceCreamToucan Sep 17 '19 at 19:24
  • I made a vector of 3 strings to test on. If I call `map_lgl(patt, ~str_detect(text, .))` I get an error because `map_lgl` needs to return a single value, but is instead trying to return 3. That could be solved by putting `all` *inside* the `map_lgl` call. I think both ways are fine (and both worth keeping in the answer IMO) – camille Sep 17 '19 at 19:29
  • I see, I didn't even realize the str_detect function was vectorized in the `pattern` argument, I was thinking it was like `grepl`. Put the `map` option back in, thanks – IceCreamToucan Sep 17 '19 at 19:54
  • I used patt_combined to look for a vector of words in a data frame, line by line, within a for loop on the first vector! Thanks for it – Dario Lacan Mar 02 '22 at 15:39