I'm trying to get the text of every h2 (the article titles) inside the div with id=firehoselist, but the following code only returns the first result. Any ideas, please?

    $crawler = new Crawler($content);

    $crawler->filterXPath('//div[@id="firehoselist"]//*')->each(function (Crawler $node) use (&$results) {
        $results[] = trim($node->filter('h2')->text());
    });

The content I'm trying to scrape is too messy to post here, but it comes from the slashdot.org website.

GAV

1 Answer

`//div[@id="firehoselist"]` selects the div whose ID is firehoselist, so the callback runs once for that container, and `$node->filter('h2')->text()` returns the text of only the first h2 inside it.

What you need is to select every `#firehoselist h2` in the parsed HTML:

    $crawler->filterXPath('//div[@id="firehoselist"]//h2')->each(function (Crawler $node) use (&$results) {
        $results[] = trim($node->text());
    });

Azuloo
  • Coolio, thanks. It also just needs an extra / to work, like `$crawler->filterXPath('//div[@id="firehoselist"]//h2')->each(function (Crawler $node) use (&$results) {` – GAV Nov 29 '17 at 11:54
  • It just occurred to me that I actually wanted to get other elements in the loop at the same time, not just the H2. I'm trying different combinations but can't work it out – GAV Nov 29 '17 at 12:57
  • I suppose `'//div[@id="firehoselist"]//*'` will get you all the elements of the container with this id. Did you try it? – Azuloo Nov 29 '17 at 13:23
  • That does get the content, but now I can't figure out how to get to the H2 value. Why does `$node->filter('h2')->text();` return the error 'current node list is empty'? – GAV Nov 29 '17 at 13:54
  • This error means you've got no `h2` element. Just print the insides to see what you've got. – Azuloo Nov 29 '17 at 14:05
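
A possible way to handle the case raised in the comments (reading the h2 together with other elements in the same loop) is to iterate over one container per article and check `count()` before calling `text()`, since `text()` throws when the node list is empty. The sketch below is only illustrative: the sample markup, the `story` and `summary` class names, and the `textOrNull` helper are hypothetical and not taken from the real page, and it assumes that each direct child of `#firehoselist` is one article and that both symfony/dom-crawler and symfony/css-selector are installed.

```php
<?php
// Assumes symfony/dom-crawler and symfony/css-selector are installed with Composer.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup standing in for the real page; the actual structure may differ.
$content = <<<'HTML'
<div id="firehoselist">
  <div class="story"><h2>First title</h2><p class="summary">First summary</p></div>
  <div class="story"><p class="summary">No heading here</p></div>
  <div class="story"><h2>Second title</h2><p class="summary">Second summary</p></div>
</div>
HTML;

// Hypothetical helper: return the trimmed text of the first match, or null when
// nothing matches, so text() is never called on an empty node list.
function textOrNull(Crawler $node, string $selector): ?string
{
    $match = $node->filter($selector);

    return $match->count() > 0 ? trim($match->text()) : null;
}

$crawler = new Crawler($content);
$results = [];

// Loop over the direct children of the container (assumed to be one per article).
$crawler->filterXPath('//div[@id="firehoselist"]/*')->each(function (Crawler $node) use (&$results) {
    $results[] = [
        'title'   => textOrNull($node, 'h2'),
        'summary' => textOrNull($node, '.summary'),
    ];
});

print_r($results);
```

Looping over `//*` instead would visit every descendant, including elements that contain no h2 at all, which is one way to end up with the 'current node list is empty' error.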