1

I am looking for a way to apply two different logical conditions (an inclusion and an exclusion) to a character vector and obtain a logical vector as output.

I was able to do it with the following code:

library(purrr)
library(stringr)

fruits<-c('apple', 'banana', NA, 'orange and apple')

conditions<-list(detect=function(x)str_detect(x,'apple'),
                 exclude=function(x)str_detect(x,'orange', negate=TRUE))

Solution 1:

map_lgl(fruits, ~c(conditions[[1]](.) & conditions[[2]](.)))
>[1]  TRUE FALSE    NA FALSE

Solution 2:

Reduce("&", map(conditions, ~.(fruits)))
>[1]  TRUE FALSE    NA FALSE

This is obviously quite verbose, because I had to define and call these two functions, then use two loops (map() and Reduce()).

I wonder if:
-There is a simpler way to call these two functions to create the final vector using some sort of purrr-like syntax, in a single call.

I tried to use `fruits %>% str_detect(., 'apple') & str_detect(., 'orange', negate=TRUE)`

But it failed with an "object '.' not found" error.

-There is a simpler regex/stringr solution that would avoid calling str_detect twice.

Suggestions?

GuedesBF

2 Answers

2

You could do this using grepl along with a single regular expression:

fruits<-c('apple', 'banana', NA, 'orange and apple')
grepl("^(?!.*\\borange\\b).*\\bapple\\b.*$", fruits, perl=TRUE)

[1]  TRUE FALSE FALSE FALSE

However, I would probably just combine two separate calls to grepl with & here:

grepl("\\bapple\\b", fruits) & !grepl("\\borange\\b", fruits)
[1]  TRUE FALSE FALSE FALSE
Tim Biegeleisen
  • Hi Tim, thanks. Please see my comment on Ronak Shah's answer. – GuedesBF Apr 24 '21 at 21:41
  • I liked the regex solution. I find exclusion/negation inside a regex confusing more often than not, yet sometimes it is nice to have this option. Note that your answer turned the NAs into FALSE; is there a way to fix it? – GuedesBF Apr 24 '21 at 21:43
  • You could use: `ifelse(is.na(fruits), NA, grepl("\\bapple\\b", fruits) & !grepl("\\borange\\b", fruits))` – Tim Biegeleisen Apr 24 '21 at 23:43
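As a possible alternative for the NA issue raised in the comments above, the same lookahead pattern can be passed to `str_detect`, which returns NA for missing input instead of FALSE. This is only a minimal sketch, assuming the stringr package is installed; the variable names `pattern` and `result` are illustrative and not part of the original answer.

```r
library(stringr)

fruits <- c('apple', 'banana', NA, 'orange and apple')

# Negative lookahead rejects any string containing the word "orange";
# the rest of the pattern requires the word "apple".
pattern <- "^(?!.*\\borange\\b).*\\bapple\\b"

# str_detect() propagates NA, unlike grepl(), which returns FALSE for NA.
result <- str_detect(fruits, pattern)
result
#[1]  TRUE FALSE    NA FALSE
```

This keeps everything in a single call, at the cost of a harder-to-read regex.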
1

The way you are storing the conditions makes the loop (map or Reduce) necessary. Why are you storing them in a list? These are vectorized functions and can be applied in a vectorized way.

library(stringr)
str_detect(fruits, 'apple') & str_detect(fruits, 'orange', negate = TRUE)
#[1]  TRUE FALSE    NA FALSE
Ronak Shah
  • I wanted a general approach, because in-line calls could get nasty with multiple conditions. I could call all the functions in-line, but I used the list approach so that I understand the structure of the code a little better. I may also use this structure in many places in the same script, so I am looking for consistency. I was also trying to start with the data and a pipe, something like fruits %>%..., which is the reason for the purrr suggestion. Thanks for the patience – GuedesBF Apr 24 '21 at 21:40
  • You can't really pipe `str_detect` output like this: `fruits %>% str_detect('apple') %>% str_detect('orange', negate = TRUE)`, since the output of the first step (`str_detect('apple')`) is a `TRUE`/`FALSE` vector, whereas the second step needs the actual values, i.e. `fruits`. Using `str_subset` works, but I am not sure that is what you want: `fruits %>% str_subset('apple') %>% str_subset('orange', negate = TRUE)` – Ronak Shah Apr 25 '21 at 01:45
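For the pipe-first style asked about in the question, two options may be worth trying. The attempt in the question most likely fails because `%>%` binds more tightly than `&`, so the second `.` is evaluated outside the pipe; wrapping the whole expression in braces keeps both dots inside it. For a reusable list of conditions, a small helper can apply each function and combine the results. This is a sketch under assumptions: `all_conditions` is a hypothetical helper name, not a purrr or stringr function, and `result_braces` and `result_list` are illustrative names.

```r
library(purrr)
library(stringr)

fruits <- c('apple', 'banana', NA, 'orange and apple')

# Option 1: braces make magrittr treat the whole expression as one step,
# so both dots refer to `fruits`.
result_braces <- fruits %>%
  {str_detect(., 'apple') & str_detect(., 'orange', negate = TRUE)}

# Option 2: keep the conditions in a list, as in the question.
conditions <- list(
  detect  = function(x) str_detect(x, 'apple'),
  exclude = function(x) str_detect(x, 'orange', negate = TRUE)
)

# Hypothetical helper: apply every condition to x, then AND the results.
all_conditions <- function(x, conds) {
  reduce(map(conds, function(f) f(x)), `&`)
}

result_list <- fruits %>% all_conditions(conditions)

result_braces
#[1]  TRUE FALSE    NA FALSE
result_list
#[1]  TRUE FALSE    NA FALSE
```

The helper still loops over the conditions internally, but the call site stays a single piped step regardless of how many conditions the list holds.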