Groovy XmlSlurper get value out of NodeChildren

Question

I'm parsing HTML and trying to get the full, unparsed content (markup included) of one particular node.

HTML example:

<html>
    <body>
        <div>Hello <br> World <br> !</div>
        <div><object width="420" height="315"></object></div>
    </body>
</html>

Code:

def tagsoupParser = new org.ccil.cowan.tagsoup.Parser()
def slurper = new XmlSlurper(tagsoupParser)
def htmlParsed = slurper.parseText(stringToParse)

println htmlParsed.body.div[0]

However, this prints only the text for the first node and an empty string for the second. How can I retrieve the content of the first node so that I get:

Hello <br> World <br> !

Nick Grealy · Accepted Answer · 2015-04-08T07:04:36.990

This is what I used to get the content from the first div tag (omitting xml declaration and namespaces).

Groovy

@Grab('org.ccil.cowan.tagsoup:tagsoup:1.2.1')
import org.ccil.cowan.tagsoup.Parser
import groovy.xml.*

def html = """<html>
    <body>
        <div>Hello <br> World <br> !</div>
        <div><object width="420" height="315"></object></div>
    </body>
</html>"""

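def parser = new Parser()
// Turn off namespace processing so the output has no namespace prefixes
parser.setFeature('http://xml.org/sax/features/namespaces', false)
def root = new XmlSlurper(parser).parseText(html)
// Serialize the first div (element, text and child markup) back to a string
println new StreamingMarkupBuilder().bindNode(root.body.div[0]).toString()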

Gives

<div>Hello <br clear='none'></br> World <br clear='none'></br> !</div>

N.B. Unless I'm mistaken, TagSoup is adding the closing tags. If you literally want Hello <br> World <br> !, you might have to use a different library or post-process the output (maybe with a regex?).
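
If the literal <br> form matters, one option is to post-process the serialized string. This is only a minimal sketch: it assumes the output has exactly the shape shown above (an opening br tag followed immediately by </br>), the variable names are illustrative, and a regex like this is fragile on arbitrary HTML.

```groovy
// Serialized output as printed above (hard-coded here for illustration)
def serialized = "<div>Hello <br clear='none'></br> World <br clear='none'></br> !</div>"

// Collapse each "<br ...></br>" pair back into a bare "<br>"
def collapsed = serialized.replaceAll(/<br[^>]*><\/br>/, '<br>')

println collapsed
```

The pattern drops any attributes on the br tag, so it would also discard attributes that were present in the original HTML.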

I know it's including the div element in the output... is this a problem?

Yeah, I would like not to include the 'div'. If you can find a way, that would be great!!! — MeIr, Apr 08 '15 at 12:16
All I've got at the moment is `.toString().replaceAll(/^<div>|<\/div>$/, "")`. Not sure if there's a way using the StreamingMarkupBuilder. — Nick Grealy, Apr 08 '15 at 23:29
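
As a rough sketch of that string-based workaround, assuming the serialized output starts with a plain <div> (no attributes) and ends with </div>; the variable names are illustrative:

```groovy
// Serialized first div, as produced by StreamingMarkupBuilder in the answer
// (hard-coded here so the snippet runs without TagSoup)
def serialized = "<div>Hello <br clear='none'></br> World <br clear='none'></br> !</div>"

// Remove a leading "<div>" and a trailing "</div>", keeping the inner markup
def inner = serialized.replaceAll(/^<div>|<\/div>$/, '')

println inner
```

Because the pattern is anchored at both ends, only the outermost wrapper is removed; a div with attributes would not match the leading part and would be left in place.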
