
Code:

using System;
using System.Text.RegularExpressions;

Match match = Regex.Match("abc", "(?(x)bx)");
Console.WriteLine("Success: {0}", match.Success);
Console.WriteLine("Value: \"{0}\"", match.Value);
Console.WriteLine("Index: {0}", match.Index);

Output:

Success: True
Value: ""
Index: 1

It seems that a conditional group without an "else" expression instead creates a lookahead from the first character of the "if" expression and uses that as the "else". In this case it runs as if the regex were `(?(x)bx|(?=b))`.
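A small sketch for comparing the question's pattern with explicit spellings of the two candidate behaviors. It assumes a .NET runtime with C# top-level statements (C# 9 or later); `Show` is a hypothetical helper written only for this example. Since the pattern defines no group named `x`, .NET treats `(?(x)...)` as the lookahead `(?=x)`, so `(?(?=x)bx|(?=b))` spells out the behavior described above, while `(?(?=x)bx|)` is what one would expect if a missing "else" simply matched the empty string. The result for the original pattern depends on whether your runtime still has the bug, so it is only printed.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: runs one pattern against one input and prints the Match properties.
Match Show(string pattern, string input)
{
    Match m = Regex.Match(input, pattern);
    Console.WriteLine("{0,-20} Success: {1}, Value: \"{2}\", Index: {3}",
        pattern, m.Success, m.Value, m.Index);
    return m;
}

// Pattern from the question (no "else"); the result depends on the runtime version.
Match original = Show("(?(x)bx)", "abc");

// Explicit lookahead condition with the implicit "(?=b)" else described above.
Match asObserved = Show("(?(?=x)bx|(?=b))", "abc");

// Explicit lookahead condition with an empty else branch.
Match emptyElse = Show("(?(?=x)bx|)", "abc");
```

With an empty "else", the condition fails at index 0 and the empty branch succeeds immediately, which is why the bug shows up as the index moving to 1.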

What the **** is going on here? Is this intentional? It doesn't seem to be documented.

Edit: An issue has been created in the corefx repository: https://github.com/dotnet/corefx/issues/26787

Kendall Frey
  • It's a bug. `(?=x)bx` will never match, so as a condition it could never be true. –  Feb 02 '18 at 02:39
  • According to the docs, there has to be a `no` clause, and the evaluated expression has to be part of the `yes` clause. The code here is non-conformant, so you cannot expect anything solid. I tried various conditions and could not find a consistent pattern in how this non-conformant code behaves. This is the same as what Damien mentions in the answer. – Ghasan غسان Feb 02 '18 at 08:26
  • But then the question is also why the library doesn't throw an exception for this non-conformant expression, as it does when you miss a bracket. – Ghasan غسان Feb 02 '18 at 08:27
  • Just an added note here: I would _not_ recommend letting the .NET engine decide whether it is an _expression_ condition or a capture condition. Always be explicit: `(?()` or `(?(?!expression))`. –  Feb 02 '18 at 17:09
  • Also, this is not just a problem with expression conditionals. The same behavior occurs with _named/numbered_ group conditionals: [(?(1)ab)(.)](http://regexstorm.net/tester?p=%28%3f%281%29ab%29%28.%29&i=ab) as opposed to the correct (expected) behavior of [(?(1)b|)(.)](http://regexstorm.net/tester?p=%28%3f%281%29b%7c%29%28.%29&i=xy) –  Feb 02 '18 at 17:17
  • @sln .NET does not support `(?()...)` syntax. Also, implicit and explicit lookaheads have subtle differences, so there may be good situations for one over the other. – Kendall Frey Feb 02 '18 at 19:07
  • Yeah, I don't know why .NET would have ambiguous syntax like that. Just have to wait and see if they do anything from that ticket. –  Feb 03 '18 at 00:09
  • @KendallFrey instead of `(?()..)` you could use something like `(?(?<=\k)...)` (not that readable, though) – Wolfgang Kluge Feb 07 '18 at 16:49
  • @WolfgangKluge That means something entirely different. It needs to actually match the capture, rather than just seeing if there was a capture. – Kendall Frey Feb 07 '18 at 21:00
  • @KendallFrey you're right, sorry – Wolfgang Kluge Feb 08 '18 at 11:33
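The numbered-group case from the comments can be checked with a short C# sketch, assuming C# top-level statements and reusing the inputs `ab` and `xy` from the regexstorm links. In `(?(1)b|)(.)`, group 1 has not captured anything when the conditional is evaluated, so the empty "no" branch is taken and `(.)` consumes the first character. The bare `(?(1)ab)(.)` form is only printed, because its result is exactly the behavior in question.

```csharp
using System;
using System.Text.RegularExpressions;

// Numbered-group conditional without a "no" branch (the form reported as misbehaving).
Match bare = Regex.Match("ab", "(?(1)ab)(.)");
Console.WriteLine("(?(1)ab)(.) on \"ab\": Success: {0}, Value: \"{1}\", Index: {2}",
    bare.Success, bare.Value, bare.Index);

// Same kind of conditional with an explicit, empty "no" branch.
Match explicitNo = Regex.Match("xy", "(?(1)b|)(.)");
Console.WriteLine("(?(1)b|)(.) on \"xy\": Success: {0}, Value: \"{1}\", Index: {2}",
    explicitNo.Success, explicitNo.Value, explicitNo.Index);
```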

1 Answer


I think it may be a mis-optimization. As the "Alternation Constructs in Regular Expressions" documentation points out:

Because the regular expression engine interprets expression as an anchor (a zero-width assertion), expression must either be a zero-width assertion (for more information, see Anchors) or a subexpression that is also contained in yes.
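For contrast, here is a conditional that satisfies that constraint: the expression `\d{2}-` is evaluated as a zero-width test and is also the start of the `yes` branch. The pattern is modeled on the employer-ID/social-security-number example in the same documentation page; treat it as an illustration rather than a verbatim quote. It assumes C# top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// The condition \d{2}- is tested without consuming input and is also the prefix of "yes".
const string pattern = @"\b(?(\d{2}-)\d{2}-\d{7}|\d{3}-\d{2}-\d{4})\b";

foreach (string input in new[] { "22-2222222", "333-22-4444", "hello" })
{
    // "yes" handles the NN-NNNNNNN shape, "no" handles NNN-NN-NNNN.
    Console.WriteLine("{0,-12} {1}", input, Regex.IsMatch(input, pattern));
}
```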

Your expression, `x`, satisfies neither of these constraints. I suspect some form of optimization where, since the expression isn't zero-width, the input is advanced until the yes branch can potentially be satisfied (since that's the only pattern you've given the regex engine to work with).

As pointed out in the comments, since your expression isn't also contained in yes, the yes branch can never match, so it's unlikely that much concern will be raised about the mis-optimization.

Damien_The_Unbeliever
  • Exactly my thoughts. (+1) – Ghasan غسان Feb 02 '18 at 08:28
  • This bug has absolutely nothing to do with anything you mention here. You've highlighted a sentence, and all it says is that if a non-numeric, named capture group is undefined _anywhere_, it is treated as a lookahead assertion, which can be partially matched by _yes_, `(?(xxx).)`, or not at all. Also, the _same_ behavior is exhibited with a named capture `(?(t)(?!)[^rt])(?.)`. Also, advancing the match position has nothing to do with this. Finally, there has never been a requirement for the alternation, never. And what in the world does this `mis-optimization` mean? –  Feb 02 '18 at 17:00
  • @sln - by mis-optimization, I mean that they've applied some form of optimization based on assuming that they'll only ever be working with *satisfiable* conditions. I agree that it's a bug, but due to the non-satisfiability of the situation, one that's unlikely to be a high priority. – Damien_The_Unbeliever Feb 02 '18 at 17:50
  • `but due to the non-satisfiability of the situation, one that's unlikely to be a high priority` I very much _doubt_ that this muddled (buggy) handling of conditional alternations won't be a high priority, given how heavily conditionals are used for stack checking, e.g. `(?(x)(?!)typo)`. It's more likely that MS won't change it because it's just a company that sucks. Might as well be honest. After 15 years of MFC I can tell you flat out they suck!! –  Feb 02 '18 at 17:58
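The name-versus-expression rule debated above can be seen in a short C# sketch, assuming C# top-level statements. When the pattern defines a group named `x`, `(?(x)...)` tests whether that group captured; when it does not, the same syntax is read as the lookahead `(?=x)`. Every conditional here has an explicit "no" branch, so none of them rely on the disputed behavior.

```csharp
using System;
using System.Text.RegularExpressions;

// "x" is a named group in this pattern, so (?(x)...) asks whether group x has captured.
Match named1 = Regex.Match("ab", "(?<x>a)?(?(x)b|c)");
Match named2 = Regex.Match("c", "(?<x>a)?(?(x)b|c)");

// No group named "x" exists here, so (?(x)...) is interpreted as the lookahead (?=x).
Match lookahead1 = Regex.Match("xc", "(?(x)xc|c)");
Match lookahead2 = Regex.Match("c", "(?(x)xc|c)");

foreach (Match m in new[] { named1, named2, lookahead1, lookahead2 })
{
    Console.WriteLine("Success: {0}, Value: \"{1}\", Index: {2}", m.Success, m.Value, m.Index);
}
```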