
I would like to filter out certain fields if they do not match some criteria. The problem is the order in which the words appear. I tried the following patterns:

(EXCLUDING)(?!\(MONDAY)(.*MONDAY).*

and

(EXCLUDING)(?!\()(.*MONDAY).*

What I want to achieve is a filter that catches EXCLUDING * MONDAY, but not if there is a bracket between these words. That is, I want to catch:

EXCLUDING MONDAY
EXCLUDING WEDNESDAY AND MONDAY
EXCLUDING MONDAY AND WEDNESDAY
EXCLUDING MONDAY (WEDNESDAY IS OK)

but not

EXCLUDING WEDNESDAY (MONDAY IS OK)

The expressions above, of course, catch all of them. This is to be run in R.

David C.
Slav

2 Answers


How's this?

mystrings <- c("EXCLUDING MONDAY",
"EXCLUDING WEDNESDAY AND MONDAY",
"EXCLUDING MONDAY AND WEDNESDAY",
"EXCLUDING MONDAY (WEDNESDAY IS OK)",
"EXCLUDING WEDNESDAY (MONDAY IS OK)")

grepl("EXCLUDING[^\\(]+MONDAY", mystrings)

# [1]  TRUE  TRUE  TRUE  TRUE FALSE
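
Since the goal in the question is to filter the fields rather than just test them, here is a minimal sketch of how the same pattern could be used to subset the vector. The variable names (pattern, hits, kept, kept_grep) are only illustrative, and the bracket is written unescaped inside the character class, which should behave the same way for these strings.

```r
mystrings <- c("EXCLUDING MONDAY",
               "EXCLUDING WEDNESDAY AND MONDAY",
               "EXCLUDING MONDAY AND WEDNESDAY",
               "EXCLUDING MONDAY (WEDNESDAY IS OK)",
               "EXCLUDING WEDNESDAY (MONDAY IS OK)")

# [^(]+ matches one or more characters that are not "(",
# so the match cannot cross a bracket on its way to MONDAY.
pattern <- "EXCLUDING[^(]+MONDAY"

# Logical vector: TRUE where the pattern matches.
hits <- grepl(pattern, mystrings)

# Keep only the matching strings by logical subsetting.
kept <- mystrings[hits]

# grep() with value = TRUE returns the matching strings in one call.
kept_grep <- grep(pattern, mystrings, value = TRUE)
```

Both kept and kept_grep should then hold the first four strings and drop "EXCLUDING WEDNESDAY (MONDAY IS OK)".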
thc

If you just want to match a pattern where ( does not occur immediately before MONDAY, you can use a negative lookbehind assertion. Your regex used a negative lookahead placed right after EXCLUDING, so it only checked the character that follows EXCLUDING (a space in your examples) and therefore never rejected (MONDAY.

strs <- c("EXCLUDING MONDAY",
          "EXCLUDING WEDNESDAY AND MONDAY",
          "EXCLUDING MONDAY AND WEDNESDAY",
          "EXCLUDING MONDAY (WEDNESDAY IS OK)",
          "EXCLUDING WEDNESDAY (MONDAY IS OK)")

grepl("EXCLUDING.*(?<!\\()MONDAY", strs, perl=TRUE)
# [1]  TRUE  TRUE  TRUE  TRUE FALSE
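
To make the difference concrete, here is a small sketch comparing the lookahead from the question with the lookbehind. The third string is a hypothetical extra case, not one from the question, added to show that the lookbehind only inspects the single character directly before MONDAY. A negated character class is included for comparison, since it rules out a bracket anywhere between the two words.

```r
cases <- c("EXCLUDING MONDAY",
           "EXCLUDING WEDNESDAY (MONDAY IS OK)",
           "EXCLUDING WEDNESDAY (NOT MONDAY)")  # hypothetical extra case

# Lookahead from the question: it is tested right after EXCLUDING,
# where the next character is a space, so it rejects nothing here.
lookahead_hits <- grepl("(EXCLUDING)(?!\\()(.*MONDAY).*", cases, perl = TRUE)

# Lookbehind: it is tested right before MONDAY, so "(MONDAY" is rejected,
# but "(NOT MONDAY" still matches because a space precedes MONDAY.
lookbehind_hits <- grepl("EXCLUDING.*(?<!\\()MONDAY", cases, perl = TRUE)

# Negated character class: no "(" is allowed anywhere between the words.
class_hits <- grepl("EXCLUDING[^(]+MONDAY", cases, perl = TRUE)
```

Under these assumptions, lookahead_hits is TRUE for all three strings, lookbehind_hits is TRUE only for the first and third, and class_hits is TRUE only for the first. Which behaviour is right depends on whether a bracket anywhere before MONDAY should disqualify the field, or only one directly in front of it.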
Sandipan Dey
    Thank you. I have to admit I had not tried the negative lookbehind (the form with <) in R: when I entered it into the online regex editor at http://regexr.com, it claimed Java would not recognize it for some reason, as you can see for yourself. I just assumed that if Java did not accept it, R would be even more likely to reject it. – Slav Feb 09 '17 at 09:53