3

I would like to split a string into an array using a regular expression, similar to what preg_split does in PHP, or like the VBScript Split function but with a regex in place of the delimiter.

Using the VBScript RegExp object, I can execute a regex, but it returns the matches (so I get a collection of my splitters, which is not what I want).

Is there a way to do this?

Thank you

MaxiWheat

4 Answers

5

If you can reserve a special delimiter string, i.e. a string of your choosing that will never be part of the real input (perhaps something like "#@#"), then you can use regex replacement to replace all matches of your pattern with "#@#", and then split on "#@#".
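The replace-then-split idea could be sketched roughly as follows. The function name `SplitViaSentinel` and the sample input are made up for illustration, and the sketch assumes the sentinel "#@#" never occurs in the input; `WScript.Echo` assumes the script runs under Windows Script Host.

```vbscript
' Sketch: split on a regex by first replacing every match with a sentinel.
' Assumption: SENTINEL never appears in the real input.
Function SplitViaSentinel(sInput, sPattern)
    Const SENTINEL = "#@#"
    Dim re
    Set re = New RegExp
    re.Pattern = sPattern
    re.Global = True    ' replace every match, not just the first
    SplitViaSentinel = Split(re.Replace(sInput, SENTINEL), SENTINEL)
End Function

Dim parts
parts = SplitViaSentinel("bob12joe345tony", "\d+")
WScript.Echo Join(parts, vbCrLf)    ' bob, joe, tony on separate lines
```

If the input can contain arbitrary text, a sentinel built from a control character such as `Chr(1)` may be a safer choice than printable characters.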

Another possibility is to use a capturing group. If your delimiter regex is, say, \d+, then you search for (.*?)\d+, and then extract what the group captured in each match (see before and after on rubular.com).

polygenelubricants
  • Nice. What would be the Regexp for the sample string 'james007bond123hello42world' to be split at each change from letters (\D+) to numbers (\d+)? The envisioned result: $1='james' $2='007' $3='bond' $4='123' $5='hello' $6='42' $7='world' Is that possible? – snahl May 23 '22 at 01:41
  • Here is a very promising approach, but it leaves out the special characters like '_' or '.': https://rubular.com/r/Q92iW17MALZPD7 – snahl May 23 '22 at 02:11
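
For the sample string in the first comment, one possibility (a sketch, not part of the original answer) is to skip splitting altogether and collect every run of digits or non-digits with Execute. Because `\D` matches anything that is not a digit, characters such as '_' or '.' stay inside the non-digit runs. The variable names are illustrative only.

```vbscript
' Sketch: collect alternating runs of digits and non-digits.
Dim re, matches, runs(), i
Set re = New RegExp
re.Pattern = "\d+|\D+"    ' a run of digits OR a run of anything else
re.Global = True

Set matches = re.Execute("james007bond123hello42world")
ReDim runs(matches.Count - 1)
For i = 0 To matches.Count - 1
    runs(i) = matches(i).Value
Next
WScript.Echo Join(runs, ",")    ' james,007,bond,123,hello,42,world
```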
0

You can always use the returned collection of matches to drive the split. Split the original string at the first match: the part before it is the first piece. Then split the remainder of the string (minus the first piece and the first match) at the next match, and continue until done.
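
One way this could look in practice is to walk the match collection and cut the input at each match's position instead of calling Split repeatedly. This is only a sketch: the name `SplitOnMatches` is hypothetical, and it relies on the `FirstIndex` (0-based) and `Length` properties of the VBScript Match object.

```vbscript
' Sketch: cut the input at each match reported by Execute.
Function SplitOnMatches(sInput, sPattern)
    Dim re, matches, m, parts(), i, pos
    Set re = New RegExp
    re.Pattern = sPattern
    re.Global = True
    Set matches = re.Execute(sInput)

    ReDim parts(matches.Count)    ' n matches give n + 1 pieces
    pos = 1                       ' Mid is 1-based
    i = 0
    For Each m In matches
        ' FirstIndex is 0-based, so the piece before this match
        ' has FirstIndex + 1 - pos characters.
        parts(i) = Mid(sInput, pos, m.FirstIndex + 1 - pos)
        pos = m.FirstIndex + m.Length + 1
        i = i + 1
    Next
    parts(i) = Mid(sInput, pos)   ' whatever follows the last match
    SplitOnMatches = parts
End Function

Dim lines
lines = SplitOnMatches("one" & vbCrLf & "two" & vbLf & "three", "\r?\n")
WScript.Echo Join(lines, " | ")   ' one | two | three
```

Since the regex engine does the matching, a pattern such as `\r?\n` matches real line breaks rather than the literal characters backslash and n.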

Oded
  • If I want to split a multiple-line String into separate variables, using an array of matching line breaks `"\n"` probably won't work. Seems like it would just look for the string `\n` instead of looking for line breaks, correct? – Michael Innes Feb 06 '13 at 22:34
0

I wrote this for my use. Might be what you're looking for.

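Function RegSplit(szPattern, szStr)
    Dim oAl, oRe, oMatches
    Set oRe = New RegExp
    ' Greedy (.*) makes group 2 the LAST delimiter in the string
    oRe.Pattern = "^(.*)(" & szPattern & ")(.*)$"
    oRe.IgnoreCase = True
    oRe.Global = True
    ' ArrayList is a .NET class exposed through COM
    Set oAl = CreateObject("System.Collections.ArrayList")

    ' Peel pieces off the end of the string, last piece first
    Do
        Set oMatches = oRe.Execute(szStr)
        If oMatches.Count > 0 Then
            oAl.Add oMatches(0).SubMatches(2)   ' text after the last delimiter
            szStr = oMatches(0).SubMatches(0)   ' keep splitting what precedes it
        Else
            oAl.Add szStr                       ' no delimiter left
            Exit Do
        End If
    Loop
    oAl.Reverse                                 ' restore left-to-right order
    RegSplit = oAl.ToArray
End Function
'**************************************************************
Dim A
' Inside a character class "|" is a literal, so it is left out here
A = RegSplit("[,;#]", "bob,;joe;tony#bill")
WScript.Echo Join(A, vbCrLf)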

Returns:
bob

joe
tony
bill
  • This doesn't seem to work properly with a pattern like `\s+` with a string with multiple matches. – NetMage Sep 09 '21 at 19:56
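
A likely cause of the behaviour described in the comment is the greedy `(.*)` prefix: for a delimiter such as `\s+` it consumes all but the last character of a whitespace run, and the leftover whitespace then yields empty entries on later iterations. A possible variant (a sketch under assumptions, not the answer author's code) uses a lazy prefix and walks forward instead. The name `RegSplitLazy` is hypothetical, the pattern is assumed not to match the empty string, and `System.Collections.ArrayList` requires the .NET Framework to be installed.

```vbscript
' Sketch: same idea as RegSplit, but splitting at the FIRST delimiter each time.
Function RegSplitLazy(szPattern, szStr)
    Dim oAl, oRe, oMatches, oSub, szRest
    Set oRe = New RegExp
    ' Lazy prefix lets the delimiter match in full; [\s\S] also spans line breaks.
    oRe.Pattern = "^([\s\S]*?)(?:" & szPattern & ")([\s\S]*)$"
    Set oAl = CreateObject("System.Collections.ArrayList")

    Do
        Set oMatches = oRe.Execute(szStr)
        If oMatches.Count = 0 Then Exit Do
        Set oSub = oMatches(0).SubMatches
        ' The trailing group is always last, even if szPattern has its own groups.
        szRest = oSub(oSub.Count - 1)
        ' Guard: an empty delimiter at the start would never shrink the string.
        If Len(szRest) = Len(szStr) Then Exit Do
        oAl.Add oSub(0)
        szStr = szRest
    Loop
    oAl.Add szStr       ' the piece after the final delimiter
    RegSplitLazy = oAl.ToArray
End Function

Dim B
B = RegSplitLazy("\s+", "a   b  c")
WScript.Echo Join(B, "|")    ' a|b|c
```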
0

I think you can achieve this by using Execute to match on the required splitter string, but capturing all the preceding characters (after the previous match) as a group. Here is some code that could do what you want.

'// Function splits a string on matches
'// against a given regex pattern
Function SplitText(strInput,sFind)
    Dim ArrOut()


    '// Don't do anything if no string to be found
    If len(sFind) = 0 then
        redim ArrOut(0)
        ArrOut(0) = strInput
        SplitText = ArrOut
        Exit Function
    end If

    '// Define regexp
    Dim re
    Set re = New RegExp 

    '// Pattern to be found - i.e. the given
    '// match or the end of the string, preceded
    '// by any number of characters
    re.Pattern="(.*?)(?:" & sFind & "|$)" 
    re.IgnoreCase = True 
    re.Global = True

    '// find all the matches >> match collection
    Dim oMatches: Set oMatches = re.Execute( strInput )

    '// Prepare to process
    Dim oMatch
    Dim ix
    Dim iMax

    '// Initialize the output array
    iMax = oMatches.Count - 1
    redim arrOut( iMax)

    '// Process each match 
    For ix = 0 to iMax

        '// get the match
        Set oMatch = oMatches(ix)


        '// Get the captured string that precedes the match
        arrOut( ix ) = oMatch.SubMatches(0)

    Next

    Set re = nothing

    '// Check if the last entry was empty - this
    '// removes one entry if the string ended on a match
    if arrOut(iMax) = "" then Redim Preserve ArrOut(iMax-1)

    '// Return the processed output
    SplitText = arrOut

End Function
JohnRC
  • I see this is actually a duplicate of the second suggestion in @polygenelubricants answer which has already been accepted. – JohnRC Aug 30 '19 at 17:58