3

I am trying to extract some data from XML using regular expression in VBA, by matching start opening and closing tag of an element, but I am not getting anything.

I can use <foo>.+?<\/foo> in Notepad++, but its not working in VBA with Microsoft Regular Expression 5.5

<foo>
variable data here 
-
-
</foo>
farooq
  • 170
  • 1
  • 1
  • 8

2 Answers2

1

This is because of the . does not include newline in VBA. You can use (.|\n)* in your pattern definition to include the newline \n For your example <foo>(.|\n)*<\/foo> You could also use <foo>[^<]*<\/foo> if between the foo block you don't expect a <

Thierry Dalon
  • 779
  • 5
  • 21
0

This is an example to list all the <td> contents:

Sub MatchXMLtags()
  Dim xml As String
  xml = "<td>a</td><td>b" & vbCrLf & "</td><td>c</td>" & vbCrLf & "<td>d</td>"

  Dim match As Object
  With CreateObject("VBScript.RegExp")
    .pattern = "<td>\s*([\S\s]+?)\s*</td>"
    .Global = True
    .IgnoreCase = True
    .MultiLine = False

    ' display the content of each td tag
    For Each match In .Execute(xml)
      Debug.Print match.SubMatches(0)
    Next
  End With
End Sub
Florent B.
  • 41,537
  • 7
  • 86
  • 101
  • This works quite good for me in MS Word (the only regexp in many dozens I tested), but not exactly how I need: as a starting delimiter I have "##", and that's fine; but my end delimiter is: **§[text, text, text, text, ,,,]-(text, text, text,...)**, and I can't figure out how to find it; I can't find even the right regexp for just the end delimiter, my output is *all* the delimiters in the document! I tried with **§ \[(.*)\] - \((.*)\)** and many other variants with no success. – jumpjack Oct 14 '20 at 14:12