I would like to know if there is a simpler way of matching a sequence of words where at least one is mandatory.

To simplify things, imagine that my words are A, B and C. I want to match A, B, C, AB, BC, AC and ABC, but CA should give C and A as separate matches, because the order is important. Hence the first idea I came up with:

A?B?C?

The only problem I found is that it also matches the empty string, so I get invalid matches. What I want is a simpler way of writing:

(AB?C?|A?BC?|A?B?C)

In ABCASDBC that matches ABC, A and BC. My real problem can have more words, so that expression can grow and become expensive to evaluate. That's my biggest concern (a different solution that doesn't use regular expressions is welcome as well).
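As a rough illustration of the two patterns above, here is a minimal Python sketch (Python is an assumption made for the example; the question itself targets C#, where the same pattern syntax should apply). It shows the empty matches produced by the all-optional pattern and the matches produced by the alternation on the sample text.

```python
import re

text = "ABCASDBC"

# All-optional pattern: every word may be absent, so the empty string matches too.
naive = re.compile(r"A?B?C?")
naive_matches = [m.group(0) for m in naive.finditer(text)]

# Dropping the zero-length matches afterwards leaves only the real ones.
non_empty = [s for s in naive_matches if s]

# Alternation: each branch makes exactly one of the words mandatory.
alternation = re.compile(r"AB?C?|A?BC?|A?B?C")
alt_matches = [m.group(0) for m in alternation.finditer(text)]

print(naive_matches)  # contains empty strings between the real matches
print(non_empty)
print(alt_matches)
```

With n words the alternation needs n branches of n terms each, which is why the pattern grows quickly as words are added.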

OriolBG
  • Regex is platform dependent. What's yours? – Amit Jun 29 '15 at 10:31
  • I'm using C# in a Windows-based environment. But the question is more about the algorithm I would use for that; if your solution is in another language, or there is some tool or library that searches fast, I welcome the answer ^^ – OriolBG Jun 29 '15 at 10:35
  • Other than empty strings, were there any problems with the "simple" approach? – Amit Jun 29 '15 at 10:35
  • No. It's just that the number of matches is misleading and might give the wrong idea that there is a match when there is really nothing. – OriolBG Jun 29 '15 at 10:37
  • So do a preliminary test for empty strings. It's not worth complicating your regex for. – Amit Jun 29 '15 at 10:38
  • Thanks, I'm already doing that. I just thought someone might have come up with a similar problem, and I couldn't find the question here. – OriolBG Jun 29 '15 at 10:43
  • In PCRE you can skip zero-length matches if you set PCRE_NOTEMPTY – D. Cichowski Jun 29 '15 at 11:47

1 Answer

(?=[xyz])x?y?z?

Or, if x, y and z are not single characters:

(?=x|y|z)(?:x)?(?:y)?(?:z)?

The idea is to first assert, using a positive lookahead, that at least one of the three is present at the current position, and then to match the optional sequence, which enforces the order.
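The lookahead approach can be sketched in Python as follows (a minimal sketch, not the answerer's code; the helper name `at_least_one_in_order` is hypothetical, and Python is used only for illustration, since lookaheads and non-capturing groups are also available in .NET). It builds the pattern from a list of words and runs it on the sample text from the question.

```python
import re

def at_least_one_in_order(words):
    """Build a pattern matching the words in the given order, each optional,
    but with at least one present (hypothetical helper for illustration)."""
    escaped = [re.escape(w) for w in words]
    # Lookahead: some word must start at the current position.
    lookahead = "(?=" + "|".join(escaped) + ")"
    # Ordered sequence: each word is optional and greedy.
    optional = "".join("(?:" + w + ")?" for w in escaped)
    return re.compile(lookahead + optional)

pattern = at_least_one_in_order(["A", "B", "C"])

matches = [m.group(0) for m in pattern.finditer("ABCASDBC")]
out_of_order = [m.group(0) for m in pattern.finditer("CA")]

print(pattern.pattern)
print(matches)
print(out_of_order)
```

Each word appears twice in the generated pattern, so its length grows linearly with the number of words, and the lookahead rules out zero-length matches without any post-filtering.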

ndnenkov