I am using simplehtmldom to grab the HTML from a site. I then find all the divs on the page and display the innertext of those whose word count is greater than 300. To do this I iterate with foreach.

$findDivs = $html->find('div');

foreach ($findDivs as $findDiv) {
  $wordCount = explode(' ', $findDiv->outertext);
  $wordCount = count($wordCount);
  if ($wordCount <= 300) {
    $findDiv->outertext = '';
  }
  else {
    echo $findDiv->outertext . '<br />';
  }
}

The problem that I have is that the results are duplicated 6 times. I can only assume that it is because all the divs are looped over for each iteration. However, I am not certain what technique I could use to ensure that each div is only assessed once.

user1882752

2 Answers

You want the innertext, but your code uses outertext. I think that's the cause of the duplication.

foreach ($html->find('div') as $findDiv) {
  $wordCount = explode(' ', $findDiv->innertext);
  $wordCount = count($wordCount);
  if ($wordCount > 300) {
    echo $findDiv->outertext . '<br />';
  }
}
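
Splitting markup on single spaces can skew the count, because tag names, attributes and runs of whitespace all end up as "words". Here is a minimal sketch of counting words in the text only, using PHP's built-in strip_tags and preg_split. The helper name countWords and the sample markup are hypothetical; with simplehtmldom, the element's plaintext could presumably be passed in instead.

```php
<?php
// Hypothetical helper: count whitespace-separated words in an HTML fragment.
function countWords(string $html): int
{
    // Drop tags so that attribute names and values are not counted as words.
    $text = trim(strip_tags($html));
    if ($text === '') {
        return 0;
    }
    // Split on any run of whitespace (spaces, tabs, newlines).
    return count(preg_split('/\s+/', $text));
}

// Illustrative input only: three words of text inside some markup.
echo countWords('<div class="a b">one  two <b>three</b></div>'), "\n";
```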
David Müller
  • Hi David, thank you but I am afraid that I tried innertext, outertext and plaintext, and I have the same results in terms of duplication each time. – user1882752 Dec 10 '12 at 10:50

I am not sure why, but this has solved my problem.

I added 1 as the second argument: $html->find('div', 1);

So the working code looks like:

$findDivs = $html->find('div', 1);  // the second argument is a zero-based index, so find() returns only the div at index 1 instead of an array of every div

// $findDivs is now a single element rather than an array of divs
foreach ($findDivs as $findDiv) {
  $wordCount = explode(' ', $findDiv->outertext);
  $wordCount = count($wordCount);
  if ($wordCount <= 300) {
    $findDiv->outertext = '';
  }
  else {
    echo $findDiv->outertext . '<br />';
  }
}
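
One plausible reason for the repeated output is nesting: find('div') returns every div, including divs inside other divs, so a long block of text is matched once for each ancestor div that contains it. The following is a minimal sketch of one way to keep only the innermost match. It uses PHP's built-in DOMDocument and DOMXPath rather than simplehtmldom, so that it runs without the library. The function names, the sample markup and the threshold of 3 are illustrative assumptions.

```php
<?php
// Sketch with PHP's built-in DOM extension (not simplehtmldom).
// Function names, $minWords and the sample markup are illustrative assumptions.
function wordCount(string $text): int
{
    $text = trim($text);
    return $text === '' ? 0 : count(preg_split('/\s+/', $text));
}

// Return the text of each qualifying div that contains no qualifying div.
function innermostLongDivs(string $html, int $minWords): array
{
    $doc = new DOMDocument();
    // Suppress warnings about imperfect real-world markup.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $results = [];
    foreach ($xpath->query('//div') as $div) {
        if (wordCount($div->textContent) <= $minWords) {
            continue;
        }
        // Skip this div if a descendant div already meets the threshold:
        // that descendant is reported on its own.
        $nestedMatch = false;
        foreach ($xpath->query('.//div', $div) as $inner) {
            if (wordCount($inner->textContent) > $minWords) {
                $nestedMatch = true;
                break;
            }
        }
        if (!$nestedMatch) {
            $results[] = trim($div->textContent);
        }
    }
    return $results;
}

// Three nested divs wrap the same text; only the innermost one is returned.
$sample = '<div><div><div>one two three four</div></div><div>short</div></div>';
print_r(innermostLongDivs($sample, 3));
```

In the sample, a plain loop over every div would report the same four words three times, once per level of nesting. The same idea (skip a div when one of its descendant divs already qualifies) should carry over to simplehtmldom by calling find('div') on each element.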
user1882752