0

I need to get the content of the first p tag in a string (but without the actual tags).

Example:

<h1>I don't want the title</h1>
<p>This is the text I want</p>
<p>I don't want this</p>
<p>I also don't want this</p>

I guess I need to find everything else and replace it with nothing? But how do I create the regex?

EmFi
Peter Schrøder

3 Answers

1

Try something like this:

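' Read the whole HTML file into an in-memory HTML document
Set fso  = CreateObject("Scripting.FileSystemObject")
Set html = CreateObject("HTMLFile")
html.write fso.OpenTextFile("C:\path\to\your.html").ReadAll
' Collect every <p> element; index 0 is the first one
Set p = html.getElementsByTagName("p")
' innerText returns the element's content without the tags
WScript.Echo p(0).innerText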
Ansgar Wiechers
1

Use this pattern to capture the text you want (it ends up in capturing group 1):

^[\s\S]*?<p>([^<>]*?)<\/p>  

^               # Start of string/line
[\s\S]          # Character Class [\s\S]
*?              # (zero or more)(lazy)
<p>             # "<p>"
(               # Capturing Group (1)
  [^<>]         # Character not in [^<>]
  *?            # (zero or more)(lazy)
)               # End of Capturing Group (1)
<\/p>           # "<\/p>"
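
As a minimal sketch of how the capture pattern above could be applied in VBScript, the snippet below uses the built-in RegExp object and reads capturing group 1 through SubMatches(0). The variable names (html, re, matches, firstParagraph) and the inline sample string are illustrative assumptions, not part of the original answer. It also assumes the first paragraph is a plain <p> with no attributes and no nested tags, since the pattern does not handle those.

```vbscript
Option Explicit

Dim html, re, matches, firstParagraph

' Sample input taken from the question (assumed to be held in a string)
html = "<h1>I don't want the title</h1>" & vbCrLf & _
       "<p>This is the text I want</p>" & vbCrLf & _
       "<p>I don't want this</p>" & vbCrLf & _
       "<p>I also don't want this</p>"

Set re = New RegExp
re.Pattern = "^[\s\S]*?<p>([^<>]*?)</p>"
re.IgnoreCase = True   ' also accept <P>...</P>
re.Global = False      ' only the first match is needed

Set matches = re.Execute(html)
If matches.Count > 0 Then
    ' SubMatches(0) holds capturing group 1
    firstParagraph = matches(0).SubMatches(0)
    WScript.Echo firstParagraph
Else
    WScript.Echo "No <p> element found"
End If
```

Run with cscript, this should echo the text of the first paragraph only. The escaped slash (\/) from the pattern above is not required in a VBScript string, so it is written as a plain / here.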

Or use this pattern to match everything else and replace it with nothing:

^[\s\S]*?<p>|<\/p>[\s\S]*$

^               # Start of string/line
[\s\S]          # Character Class [\s\S]
*?              # (zero or more)(lazy)
<p>             # "<p>"
|               # OR
<               # "<"
\/              # "/"
p>              # "p>"
[\s\S]          # Character Class [\s\S]
*               # (zero or more)(greedy)
$               # End of string/line
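
The replace variant could be sketched in VBScript roughly as follows, again with the built-in RegExp object. The names html, re and result and the sample string are assumptions made for illustration. Global must be True here, because the pattern has to match twice: once for everything up to the first <p>, and once for everything from the first </p> to the end.

```vbscript
Option Explicit

Dim html, re, result

' Sample input taken from the question (assumed to be held in a string)
html = "<h1>I don't want the title</h1>" & vbCrLf & _
       "<p>This is the text I want</p>" & vbCrLf & _
       "<p>I don't want this</p>" & vbCrLf & _
       "<p>I also don't want this</p>"

Set re = New RegExp
re.Pattern = "^[\s\S]*?<p>|</p>[\s\S]*$"
re.IgnoreCase = True   ' also accept <P> and </P>
re.Global = True       ' replace both the leading and the trailing part

' Strip everything before the first <p> and everything from its </p> onward
result = re.Replace(html, "")
WScript.Echo result
```

With Global left at its default of False, only the leading part would be removed and the rest of the document would stay in the result.
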
alpha bravo
0

You can do it properly with an XPath expression:

//p[1]/text()

Adapted from "Navigating XML nodes in VBScript, for a Dummy":

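' MSXML 6.0 uses XPath as its default selection language
Set objDoc = CreateObject("MSXML2.DOMDocument.6.0")
objDoc.async = False
' Load only succeeds if the markup is well-formed XML
objDoc.Load "C:\Temp\Test.xml"

' Find a particular element using XPath:

Set objNode = objDoc.selectSingleNode("//p[1]/text()")
' The result is a text node, so read nodeValue rather than an attribute
MsgBox objNode.nodeValue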
Community
Gilles Quénot