I have a regular expression with named capture groups, where the last group is optional. I can't figure out how to iterate over the groups and properly handle the optional group when it's empty; accessing it raises an ERegularExpressionError ('Index out of bounds') exception.

The regular expression parses a file, generated by an external system and received by email, that contains information about checks issued to vendors. The file is pipe-delimited; a sample is in the code below.

program Project1;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions, System.RegularExpressionsCore;
{
  File format (pipe-delimited): 
   Check #|Batch|CheckDate|System|Vendor#|VendorName|CheckAmount|Cancelled (if voided - optional)
}
const 
  CheckFile = '201|3001|12/01/2015|1|001|JOHN SMITH|123.45|'#13 +
              '202|3001|12/01/2015|1|002|FRED JONES|234.56|'#13 +
              '103|2099|11/15/2015|2|001|JOHN SMITH|97.95|C'#13 ;

var
  RegEx: TRegEx;
  MatchResult: TMatch;
begin
  try
    RegEx := TRegEx.Create(
      '^(?<Check>\d+)\|'#10 +
      '  (?<Batch>\d{3,4})\|'#10 +
      '  (?<ChkDate>\d{2}\/\d{2}\/\d{4})\|'#10 +
      '  (?<System>[1-3])\|'#10 +
      '  (?<PayID>[0-9X]+)\|'#10 +
      '  (?<Payee>[^|]+)\|'#10 +
      '  (?<Amount>\d+\.\d+)\|'#10 +
      '(?<Cancelled>C)?$',
      [roIgnorePatternSpace, roMultiLine]);
    MatchResult := RegEx.Match(CheckFile);
    while MatchResult.Success do
    begin
      WriteLn('Check: ', MatchResult.Groups['Check'].Value);
      WriteLn('Dated: ', MatchResult.Groups['ChkDate'].Value);
      WriteLn('Amount: ', MatchResult.Groups['Amount'].Value);
      WriteLn('Payee: ', MatchResult.Groups['Payee'].Value);
      // Problem is here, where Cancelled is optional and doesn't 
      // exist (first two lines of sample CheckFile.)
      // Raises ERegularExpressionError 
      // with message 'Index out of bounds (8)' exception.
      WriteLn('Cancelled: ', MatchResult.Groups['Cancelled'].Value);
      WriteLn('');
      MatchResult := MatchResult.NextMatch;
    end;
    ReadLn;
  except
    // Regular expression syntax error.
    on E: ERegularExpressionError do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.

I've tried checking whether MatchResult.Groups['Cancelled'].Index is less than MatchResult.Groups.Count, whether MatchResult.Groups['Cancelled'].Length > 0, and whether MatchResult.Groups['Cancelled'].Value <> '', all with no success.

How do I correctly deal with the optional capture group Cancelled when there is no match for that group?

Ken White
  • Wow. Can whoever downvoted explain what I missed here? I thought the question was pretty clear, and the code is a full MCVE that compiles, runs, and reproduces the issue. – Ken White Dec 23 '15 at 01:12

2 Answers


If the requested named group does not exist in the result, an ERegularExpressionError exception is raised. This is by design (though the wording of the exception message is misleading). Your code does not wait for user input when an exception is raised, so the console closes before you can read the message. If you move your ReadLn() after the try/except block, you will see the exception message in the console window before the process exits.

Since your other groups are not optional, you can simply test whether MatchResult.Groups.Count is large enough to include the Cancelled group. The group at index 0 holds the text of the whole match, so it is included in the Count, and Cancelled, when present, is at index 8:

if MatchResult.Groups.Count > 8 then
  WriteLn('Cancelled: ', MatchResult.Groups['Cancelled'].Value)
else
  WriteLn('Cancelled: ');

Or:

Write('Cancelled: ');
if MatchResult.Groups.Count > 8 then
  Write(MatchResult.Groups['Cancelled'].Value);
WriteLn('');
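
If hard-coding the index 8 feels brittle, one alternative is to catch the ERegularExpressionError mentioned above inside a small helper. The following is only a sketch: GroupValueOrEmpty is a hypothetical name, not part of the RTL, and the shortened pattern and input are made up for illustration.

```pascal
program OptionalGroupSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions, System.RegularExpressionsCore;

// Hypothetical helper: returns the named group's text, or '' when the
// lookup raises because the group is not in the match result.
function GroupValueOrEmpty(const AMatch: TMatch; const AName: string): string;
begin
  try
    Result := AMatch.Groups[AName].Value;
  except
    on ERegularExpressionError do
      Result := '';
  end;
end;

var
  M: TMatch;
begin
  // Shortened input and pattern; the optional Cancelled group is absent here.
  M := TRegEx.Match('201|123.45|',
    '^(?<Check>\d+)\|(?<Amount>\d+\.\d+)\|(?<Cancelled>C)?$');
  if M.Success then
    WriteLn('Cancelled: ', GroupValueOrEmpty(M, 'Cancelled'));
end.
```

The trade-off is that a misspelled group name also comes back as an empty string instead of failing loudly, so the Count check may still be preferable when the group layout is fixed.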

BTW, your loop is also missing a call to NextMatch(), so your code is getting stuck in an endless loop.

while MatchResult.Success do
begin
  ...
  MatchResult := MatchResult.NextMatch; // <-- add this
end;
Remy Lebeau
  • Oops! When I distilled it down to a MCVE to post here, I must have missed the `NextMatch` line. I'll edit it in so it's not misleading. Will test this code as soon as I get back to my desk. Thanks, Remy. – Ken White Dec 22 '15 at 23:31
  • Perfect. The check for `MatchResult.Groups.Count > 8` works. I wasn't aware that the test string was in index 0. Thanks again, Remy. – Ken White Dec 23 '15 at 00:00
  • Neither was I when I first tried it, and it is not documented, either. I saw it in the debugger, though. When the `Cancelled` group is present, it is at index 8, not 7. – Remy Lebeau Dec 23 '15 at 00:18

You could also avoid using an optional group and make the cancelled-group obligatory, including either C or nothing. Just change the last line of the regex to

'(?<Cancelled>C|)$'

For your test application, this wouldn't change the output. If you need to work further with cancelled you can simply check if it contains C or an empty string.

if MatchResult.Groups['Cancelled'].Value = 'C' then
  DoSomething;
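
As a rough illustration of that check, the sketch below turns the group into a Boolean flag. It makes some assumptions for brevity: the pattern is abbreviated (the middle fields are skipped with `.*`), the lines are matched one at a time instead of with roMultiLine, and the names Lines and IsCancelled are invented for this example.

```pascal
program CancelledFlagSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

const
  // Two of the sample lines from the question, matched one at a time.
  Lines: array[0..1] of string = (
    '201|3001|12/01/2015|1|001|JOHN SMITH|123.45|',
    '103|2099|11/15/2015|2|001|JOHN SMITH|97.95|C');

var
  RegEx: TRegEx;
  M: TMatch;
  Line: string;
  IsCancelled: Boolean;
begin
  // Abbreviated pattern: the Cancelled group is mandatory but may
  // match either 'C' or the empty string.
  RegEx := TRegEx.Create('^(?<Check>\d+)\|.*\|(?<Cancelled>C|)$');
  for Line in Lines do
  begin
    M := RegEx.Match(Line);
    if not M.Success then
      Continue;
    // The group always takes part in the match, so the lookup is safe.
    IsCancelled := M.Groups['Cancelled'].Value = 'C';
    WriteLn('Check ', M.Groups['Check'].Value, ' cancelled: ', IsCancelled);
  end;
end.
```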
Sebastian Proske
  • Making it non-optional means it doesn't match the first two lines, unfortunately. I need to match all lines, because I'm processing the entire file; I need the *Cancelled* flag if it exists, but in either case I need the rest of each line as well. – Ken White Dec 23 '15 at 13:25
  • Have you changed the regex as I wrote? I only have Delphi XE3, but your test application outputs all three lines. – Sebastian Proske Dec 23 '15 at 13:39
  • I stand corrected. I missed the extra alternation operator (|) you added after the C, which allows for nothing to also match. My initial read only saw the removal of the optional (?). Your solution works also. I'm leaving Remy's answer as the accepted one, because his post directly answers the question I asked about handling the non-existent value directly in code. While yours also solves the problem, it requires a modification to the regular expression as well in order to work. I *did* upvote your answer. :-) – Ken White Dec 23 '15 at 16:50