I need a regex that matches incorrectly used AND / OR operators in a logic statement, but not when the statement is in quotes.

No matches should be found in:

MAR AND SATURN
MAR OR SATURN
"MAR AND SATURN"

There is no match when AND or OR has at least one whitespace character plus one non-whitespace character on both sides, and the neighbouring terms are not themselves AND or OR. For example, ..R AND S.. should not match, but both (OR) OR (OR) and (AND) AND (AND) should.

Matches (the brackets are not literal, they only mark what should match):

  MARS AND SATURN [AND]
  MARS [OR]
  MARS [ OR ]
  [AND] AND [AND]
  [OR] [AND]
  [OR] [AND]
  [AND] [OR]
  [ AND ] [ OR ]

Some examples contain whitespace before, after, or on both sides of the AND or OR operator; these also need to match.

I'm using the .NET framework and this is what I came up with which works. However, it seems too complicated! There has to be a way to simplify it.

(?!.*\"")(?<!(?:\bAND\b\s|\bOR\b\s))(?:\b(?:AND|OR)\b)(?=\s\b(?:AND|OR)\b)|(?<=\bAND\b\s|\bOR\b\s)(?:\b(?:AND|OR)\b)(?!\s\b(?:AND|OR)\b)|^\b(?:AND|OR)\b|(?:AND\s?|OR\s?)$|(?<=\()\s?(?:\bAND\b|\bOR\b)|(?<=\()(?:\bOR|\bAND)(?=\))|(?:\bOR|\bAND)(?=\))(?!.*\"")
Yu Hao
Imran Azad
  • Could you please format the input text appropriately and specify where/what should be matched? I reformatted the beginning. – Wiktor Stribiżew May 02 '16 at 09:43
  • Your first example isn't an example of the sentence it follows :-/ – Aaron May 02 '16 at 09:46
  • So, you want to match all cases where `AND` or `OR` appear as the first or last term? What if there is no `AND` or `OR` at all? – tobias_k May 02 '16 at 09:50
  • @Aaron Thanks I've explained in a bit more detail what I mean, when I wrote 1 non-whitespace character on either side I meant including the white space that precedes it. – Imran Azad May 02 '16 at 10:03
  • @tobias_k Not necessarily. You could have MARS AND SATURN OR OR PLUTO AND VENUS; in this instance the first OR should match because it is mismatched: the left-hand side proposition is valid but the right-hand side is not. – Imran Azad May 02 '16 at 10:11
  • @WiktorStribiżew So close! However it also should match on (AND) AND (AND) or (OR) OR (OR) - the brackets are not literal just an indicator of a match, please see my updated question. – Imran Azad May 02 '16 at 10:16
  • @WiktorStribiżew I noticed your solution uses a negative lookahead that specifies a word boundary not following a non-whitespace character and a space. However, it also needs to match on MARS [ OR ] – Imran Azad May 02 '16 at 10:19

1 Answer

I think this will do:

^ *'[^']*' *$|^ *"[^"]*" *$|(\b(AND|OR)\b) +(?1)|(?1)\s*$|^\s*(?1)

Demo: https://regex101.com/r/nD9yR3/2

Explanation:

Note that this regex matches the invalid strings, not the valid ones.

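  1. (?1) is a subroutine call (PCRE syntax): it re-runs the pattern of group 1, here \b(AND|OR)\b. The .NET regex engine does not support it, so there the group has to be written out again.
  2. ^ *'[^']*' *$|^ *"[^"]*" *$| consumes an input that is entirely inside quotes, so the later alternatives cannot match within it. A result only counts as an error if group 1 captured something; a match with only group 0 set is a quoted string and should be ignored.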
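For engines without (?1), the same idea can be sketched with the operator pattern written out by hand. The following is a minimal Python illustration, not the exact pattern from the demo: the names OP, PATTERN, is_malformed and the named group "bad" are assumptions introduced here, \s is used in place of the literal spaces, and all three error cases are wrapped in one named group so that a single check distinguishes an error from a quoted input.

```python
import re

# One operator as a whole word; this is what group 1 holds in the pattern above.
OP = r"\b(?:AND|OR)\b"

# Hypothetical rewrite: (?1) expanded by hand, error cases wrapped in "bad".
PATTERN = re.compile(
    rf"""
      ^\s*'[^']*'\s*$        # whole input in single quotes: consumed, "bad" stays unset
    | ^\s*"[^"]*"\s*$        # whole input in double quotes: same
    | (?P<bad>
          {OP}\s+{OP}        # two operators in a row
        | {OP}\s*$           # operator at the end of the input
        | ^\s*{OP}           # operator at the start of the input
      )
    """,
    re.VERBOSE,
)


def is_malformed(expression: str) -> bool:
    """Return True if an AND/OR operator is misplaced outside a fully quoted input."""
    m = PATTERN.search(expression)
    # A match that only consumed a quoted input leaves "bad" unset and is ignored.
    return m is not None and m.group("bad") is not None


if __name__ == "__main__":
    for text in ["MAR AND SATURN", '"MAR AND SATURN"', "MARS OR ", "AND AND AND"]:
        print(repr(text), is_malformed(text))
```

In .NET the named group would be written (?<bad>...) and checked through the match's Groups collection; the rest of the pattern should carry over, though that has not been verified here.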
Aminah Nuraini