I am having a hard time coming up with a regex to match a specific case:

This can be matched:

any-dashed-strings
this-can-be-matched-even-though-its-big

This cannot be matched (strings starting with elem- or asdf-, or a single -):

elem-this-cannot-be-matched
asdf-this-cannot-be-matched
-

So far what I came up with is:

/\b(?!elem-|asdf-)([\w\-]+)\b/

But I keep matching a single - and the whole -this-cannot-be-matched suffix. I cannot figure out how to conditionally ignore a character that is part of the matching character class, and also how to match nothing at all when one of the excluded prefixes is found.
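To make the suffix problem reproducible, here is a minimal Ruby sketch using the pattern above unchanged. The constant `ORIGINAL` and the helper `original_matches` are names made up for this illustration. As far as I can tell, the cause is that `\b` also matches between the `m` and the `-` of `elem-`, so after the lookahead rejects position 0 the engine retries at the hyphen, where the lookahead no longer sees `elem-`.

```ruby
# The pattern from the question, unchanged.
ORIGINAL = /\b(?!elem-|asdf-)([\w\-]+)\b/

# String#scan returns one array per match when the pattern has a capture
# group, so flatten turns the result into a plain list of strings.
# (original_matches is a hypothetical helper name for this sketch.)
def original_matches(text)
  text.scan(ORIGINAL).flatten
end

p original_matches("any-dashed-strings")
# prints ["any-dashed-strings"]

p original_matches("elem-this-cannot-be-matched")
# prints ["-this-cannot-be-matched"]
```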

I am currently working with the Oniguruma engine (Ruby 1.9+/PHP multi-byte string module).

If possible, please elaborate on the solution. Thanks a lot!

ghaschel

1 Answer

If lookbehind is supported, you can assert a whitespace boundary to the left, and make the elem/asdf alternation inside the lookahead optional, so that a word starting with a bare hyphen is rejected as well.

(?<!\S)(?!(?:elem|asdf)?-)[\w-]+\b

Explanation

  • (?<!\S) Assert a whitespace boundary to the left
  • (?! Negative lookahead, assert that what is directly to the right is not
    • (?:elem|asdf)?- Optionally match elem or asdf followed by -
  • ) Close the lookahead
  • [\w-]+ Match 1+ word chars or -
  • \b A word boundary
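The breakdown above can be tried out with a short Ruby sketch. The names `LOOKBEHIND`, `SAMPLE` and `allowed_words` are invented for this example, and the sample string is simply the question's five test strings joined with spaces, which is an assumption about how the input is laid out.

```ruby
# The lookbehind pattern from this answer.
LOOKBEHIND = /(?<!\S)(?!(?:elem|asdf)?-)[\w-]+\b/

# Assumed input layout: the question's examples separated by single spaces.
SAMPLE = "any-dashed-strings this-can-be-matched-even-though-its-big " \
         "elem-this-cannot-be-matched asdf-this-cannot-be-matched -"

# No capture group here, so String#scan returns the matched strings directly.
# (allowed_words is a hypothetical helper name for this sketch.)
def allowed_words(text)
  text.scan(LOOKBEHIND)
end

p allowed_words(SAMPLE)
# prints ["any-dashed-strings", "this-can-be-matched-even-though-its-big"]
```

Because `(?<!\S)` only allows a match to start at the beginning of the string or right after whitespace, the engine cannot restart in the middle of `elem-this-cannot-be-matched` and pick up the suffix.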

See a regex demo.

Or a version with a capture group and without a lookbehind:

(?:\s|^)(?!(?:elem|asdf)?-)([\w-]+)\b

See another regex demo.
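A similar sketch for the capture group version, again in Ruby and with made-up names (`CAPTURE`, `INPUT`, `captured_words`). Here the leading whitespace is consumed by `(?:\s|^)`, so the wanted text is taken from group 1 rather than from the whole match.

```ruby
# The capture-group pattern from this answer (no lookbehind needed).
CAPTURE = /(?:\s|^)(?!(?:elem|asdf)?-)([\w-]+)\b/

# Assumed input layout: the question's examples separated by single spaces.
INPUT = "any-dashed-strings this-can-be-matched-even-though-its-big " \
        "elem-this-cannot-be-matched asdf-this-cannot-be-matched -"

# With a capture group, String#scan yields [[group1], [group1], ...];
# flatten reduces that to the list of captured words.
# (captured_words is a hypothetical helper name for this sketch.)
def captured_words(text)
  text.scan(CAPTURE).flatten
end

p captured_words(INPUT)
# prints ["any-dashed-strings", "this-can-be-matched-even-though-its-big"]
```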

The fourth bird