
When using alternation in a regex, we should order the alternatives so that we are not affected by the eagerness of the engine.

So, given a list such as co,co.,co-op,association,assoc, we should order the items to get the most precise match, changing it to association,assoc,co-op,co.,co.
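To illustrate the ordering point, here is a minimal sketch using Python's `re` module (the question doesn't name a language, so Python is an assumption). The engine tries alternatives left to right and takes the first one that lets the overall match succeed, so a shorter prefix listed first can shadow a longer word. Sorting the escaped words longest-first is one simple way to build the ordered alternation; `words`, `naive` and `ordered` are illustrative names.

```python
import re

words = ["co", "co.", "co-op", "association", "assoc"]

# Original order: "co" is tried first and wins, even though "co-op" would also match.
naive = "|".join(map(re.escape, words))
print(re.match(naive, "co-op school").group())  # co

# Longest first, so no word is shadowed by one of its own prefixes.
# re.escape also makes the "." in "co." a literal period.
ordered = "|".join(map(re.escape, sorted(words, key=len, reverse=True)))
print(re.match(ordered, "co-op school").group())  # co-op
print(re.match(ordered, "association hall").group())  # association
```

Words of equal length can't be prefixes of one another, so their relative order after sorting doesn't matter.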

I have a basic regex pattern that splits a string in two if it contains a hyphen or a slash, so that I get just the part before the hyphen or slash:

(.*(?<!\w)(CO-OP|CO|CO.)(?!\w).*)[-/](\s*\w+.*)

However, this regex splits incorrectly when given ABC CO-OP ELEMENTARY SCHOOL: the string becomes just ABC CO. If I remove CO from the alternation, the string is returned in its original form, ABC CO-OP ELEMENTARY SCHOOL, which is correct. In addition, the string ARMSTRONG CO-OP ELEMENTARY SCHOOL / ECOLE PRIMAIRE ARMSTRONG COOPERATIVE should be split to become ARMSTRONG CO-OP ELEMENTARY SCHOOL, without the part after the slash.
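A small reproduction, again assuming Python's `re` and a hypothetical `split_original` helper that keeps group 1 and falls back to the input when nothing matches (the question doesn't show its surrounding code). `CO-OP` is actually tried first, but the mandatory `[-/]` then finds no hyphen or slash after it, so the engine backtracks into the alternation, tries `CO`, and lets `[-/]` consume the hyphen inside `CO-OP`:

```python
import re

# The question's pattern, unchanged (including the unescaped "." in CO.).
ORIGINAL = re.compile(r"(.*(?<!\w)(CO-OP|CO|CO.)(?!\w).*)[-/](\s*\w+.*)")

# Same pattern with the bare CO alternative removed.
WITHOUT_CO = re.compile(r"(.*(?<!\w)(CO-OP|CO.)(?!\w).*)[-/](\s*\w+.*)")


def split_original(text):
    """Hypothetical helper: return group 1 on a match, else the input unchanged."""
    m = ORIGINAL.match(text)
    return m.group(1) if m else text


# CO-OP matches first, but no [-/] follows it, so the engine backtracks to CO
# and the hyphen inside CO-OP becomes the split point.
print(split_original("ABC CO-OP ELEMENTARY SCHOOL"))  # ABC CO

# Without CO there is no way to satisfy [-/], so nothing matches at all.
print(WITHOUT_CO.match("ABC CO-OP ELEMENTARY SCHOOL"))  # None
```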

Why is CO matched in the alternation and used to split the string?

John Barton
`CO.` should be `CO[.]` or `CO\.`, but that's not your current issue. You could use control verbs to skip the `co,co.,co-op` matches. – user3783243 May 26 '20 at 02:08
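A quick check of the escaping point, sketched with Python's `re`: an unescaped `.` matches any character, so `CO.` also accepts strings like `COX`. The control verbs mentioned in the comment, such as `(*SKIP)(*FAIL)`, are a PCRE feature that Python's built-in `re` doesn't support, so they aren't shown here.

```python
import re

# Unescaped ".": any character may follow "CO".
print(bool(re.fullmatch(r"CO.", "COX")))    # True
# Escaped, or inside a character class: only a literal period is accepted.
print(bool(re.fullmatch(r"CO\.", "COX")))   # False
print(bool(re.fullmatch(r"CO[.]", "CO.")))  # True
```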

1 Answer


Your issue is that your regex requires a - or a / in the string, so it forces ABC CO-OP ELEMENTARY SCHOOL to split on the - in CO-OP. If you:

  1. make the second part of the regex optional;
  2. change the .* at the end of the first group to be lazy (.*?); and
  3. add start and end-of-string anchors

you will get the results you want:

^(.*(?<!\w)(?:CO-OP|CO|CO\.)(?!\w).*?)(?:[-/](\s*\w+.*))?$

Demo on regex101
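The pattern can also be checked outside regex101 with a short Python sketch (Python's `re` is an assumption, as is `split_fixed`, a hypothetical helper that returns group 1, or the input unchanged if the pattern doesn't match):

```python
import re

# The answer's pattern: optional split part, lazy .*? in group 1, ^...$ anchors.
FIXED = re.compile(r"^(.*(?<!\w)(?:CO-OP|CO|CO\.)(?!\w).*?)(?:[-/](\s*\w+.*))?$")


def split_fixed(text):
    """Hypothetical helper: return group 1 on a match, else the input unchanged."""
    m = FIXED.match(text)
    return m.group(1) if m else text


print(repr(split_fixed("ABC CO-OP ELEMENTARY SCHOOL")))
# 'ABC CO-OP ELEMENTARY SCHOOL'
print(repr(split_fixed(
    "ARMSTRONG CO-OP ELEMENTARY SCHOOL / ECOLE PRIMAIRE ARMSTRONG COOPERATIVE")))
# 'ARMSTRONG CO-OP ELEMENTARY SCHOOL '
```

Group 1 keeps the space before the slash, so you may want to call `.strip()` on the result.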

Note also that the . in CO. should be escaped.

Nick