Can only get one result Dom Crawler

Question

I'm trying to get the content of every h2 (the article titles) inside the div with id=firehoselist, but the following code only returns the first result. Any ideas, please?

    $crawler = new Crawler($content);

    $crawler->filterXPath('//div[@id="firehoselist"]//*')->each(function (Crawler $node) use (&$results) {
        $results[] = trim($node->filter('h2')->text());
    });

The content I'm trying to scrape is too messy to post here, but it is from the slashdot.org website.

Azuloo · Accepted Answer · 2017-11-29T11:55:28.913


`//div[@id="firehoselist"]` matches the div that has the ID firehoselist. Calling `$node->filter('h2')->text()` on it returns only the first result, because `text()` reads the first node of the matched list and ignores the rest.

What you need is to select every `#firehoselist h2` in the parsed HTML, so that each h2 is visited on its own:

    $results = [];

    $crawler->filterXPath('//div[@id="firehoselist"]//h2')->each(function (Crawler $node) use (&$results) {
        // Each $node is a single h2 here, so text() returns that heading's own text
        $results[] = trim($node->text());
    });
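To make the difference concrete, here is a minimal, self-contained sketch. It assumes `symfony/dom-crawler` is installed through Composer (the `vendor/autoload.php` path is an assumption), and the HTML string is made-up sample markup, not the real Slashdot page. It relies on `each()` returning an array of the closure's return values, which removes the need for a by-reference `$results`.

```php
<?php
// Assumption: symfony/dom-crawler was installed with Composer.
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical sample markup standing in for the real page.
$content = '<html><body><div id="firehoselist">'
    . '<div><h2> First title </h2></div>'
    . '<div><h2>Second title</h2></div>'
    . '</div></body></html>';

$crawler = new Crawler($content);

// Select every h2 below the container.
$headings = $crawler->filterXPath('//div[@id="firehoselist"]//h2');

// text() on a list with several nodes reads only the first one.
$first = trim($headings->text());

// each() visits every node and returns an array of the closure's return values.
$titles = $headings->each(function (Crawler $node) {
    return trim($node->text());
});

var_dump($first, $titles);
```

Here `$first` holds only one heading, while `$titles` holds one entry per h2.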

edited Nov 29 '17 at 11:55

answered Nov 29 '17 at 11:29

Azuloo


Coolio, thanks. It also just needs an extra / to work, like `$crawler->filterXPath('//div[@id="firehoselist"]//h2')->each(function (Crawler $node) use (&$results) {` – GAV Nov 29 '17 at 11:54
It just occurred to me that I actually wanted to get other elements in the loop at the same time, not just the H2. I'm trying different combinations but can't work it out. – GAV Nov 29 '17 at 12:57
I suppose `//div[@id="firehoselist"]//*` will get you all the elements of the container with this id. Did you try it? – Azuloo Nov 29 '17 at 13:23
That does get the content, but now I can't figure out how to get to the H2 value. Why does `$node->filter('h2')->text();` return the error 'current node list is empty'? – GAV Nov 29 '17 at 13:54
This error means you've got no `h2` element. Just print the insides to see what you've got. – Azuloo Nov 29 '17 at 14:05
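
One possible way to handle the follow-up from the comments (reading several elements per item without hitting 'current node list is empty') is to loop over the item containers and check `count()` before calling `text()`. This is only a sketch under assumptions: the markup, the `item` and `summary` class names, and the `vendor/autoload.php` path are hypothetical, and `filter()` additionally needs the `symfony/css-selector` package.

```php
<?php
// Assumption: symfony/dom-crawler and symfony/css-selector were installed with Composer.
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical sample markup; the real page structure will differ.
$content = <<<'HTML'
<html><body>
<div id="firehoselist">
  <div class="item"><h2>First title</h2><p class="summary">First summary</p></div>
  <div class="item"><p class="summary">No heading here</p></div>
  <div class="item"><h2>Second title</h2><p class="summary">Second summary</p></div>
</div>
</body></html>
HTML;

$crawler = new Crawler($content);
$results = [];

// Visit each direct child div of the container, one per item.
$crawler->filterXPath('//div[@id="firehoselist"]/div')->each(
    function (Crawler $node) use (&$results) {
        $h2 = $node->filter('h2');
        $summary = $node->filter('p.summary');

        $results[] = [
            // text() throws on an empty node list, so check count() first.
            'title'   => $h2->count() > 0 ? trim($h2->text()) : null,
            'summary' => $summary->count() > 0 ? trim($summary->text()) : null,
        ];
    }
);

print_r($results);
```

With `//*` every descendant is visited, including elements that contain no h2 at all, which is where the empty node list comes from. Looping over the item containers and guarding with `count()` avoids that case.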
