I'm trying to scrape a page whose markup sometimes looks like this:
<span>text</span>
and sometimes looks like this:
<span><p>text</p></span>
However, I can't figure out how to get the text in the second scenario. I've tried several approaches; here is the one I thought should work:
def html = slurper.parse(reader)
Collection<NodeChild> nodes = html.'**'.findAll { it.name() == 'span' && it.@class == 'style2' }
...
def descriptionNode = html.'**'.find { it.name() == 'span' && it.@class == 'style20' }
def innerNode = descriptionNode.'**'.find { it.name() == 'p' }
def description
if (innerNode?.size() > 0) {
    description = innerNode.text()
} else {
    description = descriptionNode.text()
}
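To make this reproducible, here is a minimal sketch of the same lookup against a small stand-in document. The markup, the class name 'style21', and the spanText helper are hypothetical and only there for illustration; the sketch also assumes well-formed input parsed with parseText, whereas my real input comes from a reader. As far as I understand, text() concatenates the text of all descendant elements, so on input like this the nested case should not need special handling.

```groovy
import groovy.xml.XmlSlurper  // Groovy 3+; on Groovy 2.x the class is groovy.util.XmlSlurper and needs no import

// Hypothetical, well-formed stand-in for the real page (assumption, not the actual markup)
def markup = '''
<html><body>
  <span class="style20">text</span>
  <span class="style21"><p>text</p></span>
</body></html>
'''

def html = new XmlSlurper().parseText(markup)

// Hypothetical helper: find the first span with the given class and return its text.
// text() includes the text of nested elements, so the inner <p> is covered as well.
def spanText = { String cssClass ->
    def span = html.'**'.find { it.name() == 'span' && it.@class == cssClass }
    span?.text()?.trim()
}

def flat = spanText('style20')    // span with direct text
def nested = spanText('style21')  // span wrapping a <p>

println "flat=${flat} nested=${nested}"
```

If the real page behaves differently from this sketch, the difference may lie in how the actual HTML is parsed rather than in the lookup itself, but that is only a guess on my part.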
Any idea how I should use XmlSlurper to get the text in both cases?