I'm using Symfony DomCrawler to get all text in a document.

$this->crawler->filter('p')->each(function (Crawler $node, $i) {
    // process text
});

I'm trying to gather all text within the <body> that is outside of any child elements.

<body>
    This is an example
    <p>
        blablabla
    </p>
    another example
    <p>
        <span>Yo!</span>
        again, another piece of text <br/>
        with an annoy BR in the middle
    </p>
</body>

I'm using PHP Symfony and can use XPath (preferred) or RegEx.

kjhughes
Richard Fernandez
  • I don't know about Symfony DomCrawler, but the XPath for getting text nodes that are directly within `<body>` would be `//body/text()` (some XPath processors don't support returning text nodes, though) – har07 Jun 01 '16 at 13:23
  • Does this take into account text in nested elements? – Richard Fernandez Jun 01 '16 at 13:31
  • No. If you want them as well, just add another `/`: `//body//text()` – har07 Jun 01 '16 at 13:33

1 Answer

The string value of the entire document can be obtained with this simple XPath:

string(/)

All text nodes in the document would be:

//text()

The immediate text node children of body (in an HTML document, where html is the root element) would be:

/html/body/text()

Note that the XPaths that select text nodes would typically be converted to concatenated string values, depending upon context.
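
To make these expressions concrete, here is a minimal sketch that runs them against the sample markup from the question. It uses PHP's built-in DOMDocument and DOMXPath rather than DomCrawler, so it runs without extra packages. The variable names ($direct, $all, $whole), the wrapping <html> element, and the [normalize-space()] predicate that skips whitespace-only nodes are assumptions added for illustration. DomCrawler's filterXPath() takes XPath expressions as well, but that route is not tested here.

```php
<?php
// Sample markup from the question, wrapped in <html> (an assumption) so the
// parser does not have to guess the document structure.
$html = <<<'HTML'
<html><body>
    This is an example
    <p>
        blablabla
    </p>
    another example
    <p>
        <span>Yo!</span>
        again, another piece of text <br/>
        with an annoy BR in the middle
    </p>
</body></html>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Text nodes that are direct children of <body>, skipping whitespace-only nodes.
$direct = [];
foreach ($xpath->query('//body/text()[normalize-space()]') as $node) {
    $direct[] = trim($node->nodeValue);
}

// Text nodes at any depth below <body>, including those inside <p> and <span>.
$all = [];
foreach ($xpath->query('//body//text()[normalize-space()]') as $node) {
    $all[] = trim($node->nodeValue);
}

// String value of the whole document, returned as one string.
$whole = $xpath->evaluate('string(/)');

print_r($direct);
print_r($all);
```

Here $direct should hold only the text that sits directly inside <body>, while $all also picks up the text nested inside the <p> and <span> elements.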

kjhughes