6

I'm using a regular expression in Excel VBA to parse the results of a swim meet. The code reads a row of text that was copied from a PDF and outputs the important data into individual cells. Since the format of the string varies throughout the source PDF, the regular expression is quite complicated. Still, I'm able to parse 95% of the data at this point.

Some of the rows that are not being parsed are confusing me, though. VBA is clearly not able to find a match with the regular expression, but when I copy the exact same regex and string into this website, JavaScript is able to find a match without a problem. Is there something different in the way VBA and JavaScript handle regular expressions that might account for this?

Here's the string that VBA refuses to match:

12. NUNEZ CHENG, Walter 74 Club Tennis Las Terr 3:44.57 123

Here's the function I'm using in Excel (mostly successfully):

Function singleLineResults(SourceString As String) As Variant
    Dim cSubmatches As Variant
    Dim collectionArray(11) As String
    Dim cnt As Integer
    Dim oMatches As MatchCollection

    With New RegExp
        .MultiLine = MultiLine
        .IgnoreCase = IgnoreCase
        .Global = False

        '1. JAROSOVA, Lenka 27 Swimmpower Prague 2:26.65 605 34.45 37.70 37.79 36.71
        .Pattern = "(\d*)\.?\s?([^,]+),\s([^\d]+)\s?(\d+)\s((?:[A-Z]{3})?)\s?((?:(?!\d\:\d).)*)\s?((?:\d+:)?\d+\.\d+)(?:\s(\d+))?(?:\s((?:\d+:)?\d+.\d+))?(?:\s((?:\d+:)?\d+.\d+))?(?:\s((?:\d+:)?\d+.\d+))?(?:\s((?:\d+:)?\d+.\d+))?(?:Splash Meet Manager 11, Build \d{5} Registered to [\w\s]+ 2014-08-\d+ \d+:\d+ - Page \d+)?$"

        Set oMatches = .Execute(SourceString)
        If oMatches.Count > 0 Then
            For Each submatch In oMatches(0).SubMatches
                collectionArray(cnt) = submatch '.Value
                cnt = cnt + 1
            Next
        Else
            singleLineResults = Null
        End If
    End With

    singleLineResults = collectionArray()
End Function
Community
  • 1
  • 1
carpiediem
  • 1,918
  • 22
  • 41
  • I also tried you whooper of a regex and got several matches ( from all the capture groups ) [here](http://regex101.com/r/lV7cN9/1). While it does match the string you have shown here does the real data perhaps have some whitespace characters in the beginning or end of string that you cannot see? – Matt Aug 22 '14 at 03:15
  • I checked for whitespace issues early on and didn't find anything. The specific string that I posted above definitely did not include any leading or trailing whitespace. – carpiediem Oct 26 '14 at 11:41
  • Are there any accented characters in the non-matching lines? Some character sets have more than one type of space. The non-breaking-space (often encoded in URLs as " ") is one. I would examine the character codes of everything in a line that does not match. – AdrianHHH Nov 06 '14 at 11:49
  • "12. NUNEZ CHENG, Walter 74 Club Tennis Las Terr 3:44.57 123" is not matching and it doesn't have any special characters or  . Although, coincidentally, it would have a special characters if Núñez had been spelled correctly. – carpiediem Nov 10 '14 at 15:24

1 Answers1

1

Could you add more examples to what actually matches? E.g. the surrounding lines that matches, and better yet, examples that are not supposed to match if any?

I've tried "cleaning" up a bit in the regex, removing groups that are not used to match that particular line, to make the error more obvious, and changed how one of the groups works, which might actually fix the issue:

(\d*)
\.?\s?
([^,]+)
,\s
([^\d]+)
\s?
(\d+)
\s
(
  (?:[A-Z]{3})?
)
\s?
(
# OLD SOLUTION
#  (?:
#    (?!\d\:\d)
#    .
#  )*

# NEW SOLUTION
  .*?
)
\s?
(
  (?:\d+:)?
  \d+\.\d+
)
(?:
  \s
  (\d+)
)?
$

See example on regex101.

The group that puzzles me the most, however, is this one:

(?:[A-Z]{3})?

Why the 3 character limit, when it only matches the first 3 letters in the street name?

Johny Skovdal
  • 2,038
  • 1
  • 20
  • 36