What's the most efficient/nicest way to extract a text value from an HTML tag using Symfony DOM Crawler?

Question

Given the following HTML code snippet:

<div class="item">
  large
  <span class="some-class">size</span>
</div>

I'm looking for the best way to extract the string "large" using Symfony's Crawler.

$crawler = new Crawler($html);

I could call $crawler->html() and then apply a regex search. Is there a better solution, and how exactly would you do it?

haxpanel · Accepted Answer · 2015-11-18T19:08:06.330

4

I've just found a solution that looks the cleanest to me:

$crawler = new Crawler($html);
$result = $crawler->filterXPath('//text()')->text();

edited Nov 18 '15 at 19:08

answered Nov 18 '15 at 15:23

haxpanel


1

`$result = $crawler->filterXPath('//div[@class="item"]/text()')->text();` would be better. – COil Nov 18 '15 at 15:38
I think we don't actually need this extra selector: the div.item node has already been selected because that's the root node. – haxpanel Nov 18 '15 at 19:10
But you will never have to handle just this HTML snippet; I suppose it would be used when retrieving a full HTML source. – COil Nov 19 '15 at 08:21
I'm using CSS selectors, so I'm forced to use XPath only at the end, something like this: $crawler->filter('div.item')->filterXPath('//text()')->text(); – haxpanel Nov 19 '15 at 08:45
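
To make the scoping point from the comments concrete, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath rather than the Crawler itself, so it runs without Symfony installed. The surrounding page (the `<p>intro</p>` element) is a hypothetical addition, not part of the question. The Crawler evaluates filterXPath() relative to its current nodes, so its results may differ in detail from a document-wide query like this one.

```php
<?php
// Hypothetical full page: the question's snippet plus an unrelated paragraph.
$html = '<html><body><p>intro</p><div class="item">
  large
  <span class="some-class">size</span>
</div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Unscoped: the first text node in document order belongs to the <p>.
$unscoped = trim($xpath->query('//text()')->item(0)->nodeValue);

// Scoped: only text nodes that are direct children of div.item.
$scoped = trim($xpath->query('//div[@class="item"]/text()')->item(0)->nodeValue);

var_dump($unscoped); // string(5) "intro"
var_dump($scoped);   // string(5) "large"
```

On a full page the unscoped expression picks up whatever text comes first, which is why anchoring the expression to div.item is the safer choice. The trim() call strips the line breaks and indentation that surround "large" in the source.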

COil · Answer 2 · 2015-11-18T15:39:05.910

0

$crawler = new Crawler($html);
$node = $crawler->filterXPath('//div[@class="item"]');
$domElement = $node->getNode(0);
// Remove every child element so that only the div's own text remains.
foreach ($node->children() as $child) {
    $domElement->removeChild($child);
}
dump($node->text()); die();

Afterwards, you have to trim the whitespace.

edited Nov 18 '15 at 15:39

answered Nov 18 '15 at 15:21

COil


score 0 · Answer 3 · answered Nov 18 '15 at 15:27

This is a bit tricky, as the text you're trying to get is a text node, which the DomCrawler component doesn't (as far as I know) let you extract directly. Thankfully, DomCrawler is just a layer on top of PHP's DOM classes, which means you could probably do something like:

$crawler = new Crawler($html);
$crawler = $crawler->filterXPath('//div[@class="item"]');
$domNode = $crawler->getNode(0);
$text = null;

foreach ($domNode->childNodes as $domChild) {
    if ($domChild instanceof \DOMText) {
        $text = $domChild->wholeText;
        break;
    }
}

This wouldn't help with HTML like:

<div>
    text
    <span>hello</span>
    other text
</div>

So you would only get "text", not "text other text" in this instance. Take a look at the DOMText documentation for more details.
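
If the combined "text other text" is what you need, one possible approach is to keep looping instead of stopping at the first text node, and join the pieces. The following is a sketch under assumptions: it uses PHP's built-in DOMDocument directly so it runs on its own, and `$parts` and `$piece` are illustrative names. With the Crawler, the node returned by getNode(0) could be walked the same way.

```php
<?php
// Markup mirroring the example above.
$html = '<div>
    text
    <span>hello</span>
    other text
</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$div = $doc->getElementsByTagName('div')->item(0);

$parts = [];
foreach ($div->childNodes as $child) {
    // Keep only direct text children; elements such as <span> are skipped.
    if ($child instanceof DOMText) {
        $piece = trim($child->nodeValue);
        if ($piece !== '') {
            $parts[] = $piece;
        }
    }
}

$text = implode(' ', $parts);
echo $text, PHP_EOL; // text other text
```

Trimming each piece before joining drops the indentation and line breaks around the text, and skipping empty pieces avoids stray separators from whitespace-only nodes.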
