
I'm crawling a website whose text contains an unencoded < or > sign. This breaks the element's content, which then comes back empty.

Example

$html = '<div id="test-div">< 50%</div>';
$crawler = new \Symfony\Component\DomCrawler\Crawler($html);
echo $crawler->filter('#test-div')->first()->text(); // Empty string

Is there a way I can still get the content of #test-div (which I expect to be < 50%)?

Skysplit
  • Then that's one of the rare cases where a DOM wrapper is perhaps less suited to HTML scraping. You might want to look into HTML Purifier to salvage the markup and still keep your code simple with the DomCrawler method. – mario Feb 15 '17 at 13:45
  • @mario that's a really good idea. HTML Purifier does a bit more than I want, but PHP Tidy is just fine. I guess that's the solution to my problem, so could you please post your comment as an answer? Thanks! – Skysplit Feb 16 '17 at 07:38
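
For reference, the Tidy approach from the comments could look roughly like the sketch below. It assumes the tidy extension is enabled and that symfony/dom-crawler and symfony/css-selector are installed through Composer. The helper name repairMarkup and the chosen Tidy options are illustrative and not taken from the thread.

```php
<?php
// Assumption: ext-tidy is enabled, and symfony/dom-crawler plus
// symfony/css-selector are available through Composer's autoloader.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

/**
 * Hypothetical helper: lets Tidy repair the markup (including escaping
 * a stray "<" in text) before the HTML reaches the DOM parser.
 */
function repairMarkup(string $html): string
{
    $repaired = tidy_repair_string($html, [
        'show-body-only' => true, // return only the body content
        'wrap'           => 0,    // do not re-wrap long lines
    ], 'utf8');

    // Fall back to the untouched input if Tidy could not process it.
    return $repaired === false ? $html : $repaired;
}

$html = '<div id="test-div">< 50%</div>';

$crawler = new Crawler(repairMarkup($html));
echo $crawler->filter('#test-div')->first()->text();
```

Tidy normalises more than stray angle brackets, so it is worth checking the repaired markup against a few real pages before relying on it.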

0 Answers