
This is a really weird situation that I can't explain. I'm using Simple HTML DOM and trying to get the full source of this page:

http://ronilocks.com/

The problem is that I only get part of what's actually on the page. For example, the page source shown in the browser contains quite a few script tags loaded from the plugins folder. When I check the string I get back from Simple HTML DOM, none of them are there; only the wp-rocket one is.

(I also tried a plain file_get_html() and file_get_contents() and got the same result.)

Any thoughts? Thanks!

Edit: Is it possible that wp-rocket (installed on the site being scraped) detects that the page is being scraped and serves something different?

Avi
  • Please show what code produces this. – revo Mar 12 '18 at 11:40
  • There's not much to show :) Something like this is one option: dd(file_get_contents("http://ronilocks.com/")); (dd() is Laravel's dump-and-die helper, similar to var_dump) – Avi Mar 12 '18 at 11:56
  • In many cases, some content is loaded by JavaScript because it is 'dynamic', so that content may not be present in the raw HTML. – Acepcs Mar 12 '18 at 18:54
  • Try using cURL and setting the User-Agent header to a Mozilla one. – pguardiario Mar 12 '18 at 23:51
  • @Acepcs The content I don't see is not generated with JS; it exists in the page source. – Avi Mar 13 '18 at 08:12
  • @pguardiario using cURL with a Mozilla user-agent doesn't work either. – Avi Mar 13 '18 at 08:13
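For reference, the cURL approach suggested in the comments might look like the minimal sketch below. The function name `fetch_with_user_agent` and whatever User-Agent string you pass in are illustrative assumptions, not code from the question. Calling it once with a browser-like string and once with a different one, then comparing the two bodies, is one way to test whether the server varies its response by User-Agent.

```php
<?php
// Minimal sketch (hypothetical helper): fetch a URL with cURL while sending
// a caller-supplied User-Agent header, as suggested in the comments.
function fetch_with_user_agent(string $url, string $userAgent): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects (e.g. http -> https)
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_ENCODING       => '',     // accept any encoding cURL supports (gzip, deflate, ...)
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        // Surface network/HTTP-level failures instead of returning an empty string.
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    // The handle is released automatically when $ch goes out of scope.
    return $body;
}
```

Example call (assumed UA string): `$html = fetch_with_user_agent('http://ronilocks.com/', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)');`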

1 Answer

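<?php
include 'simple_html_dom.php';
// Fetch the page and parse it into a Simple HTML DOM object
$html = file_get_html('http://ronilocks.com/');
// Count all <a> elements in the parsed document
echo count($html->find('a'));
// Output at the time of answering: 425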

I get 425. This looks right to me.

pguardiario
  • Look for "script" or "link" instead of "a", then loop through them and print them out. Now go to the actual site's source code and look for scripts with the word "plugins" in them, or look for the sentence "" in the response, since it exists on the actual site. – Avi Mar 13 '18 at 12:27
  • Ok, it sounds like you have a separate question so you should edit or post a new question. Like "why can't simple html dom find this script/link?" – pguardiario Mar 13 '18 at 23:59
  • I edited the question a bit. I hope it's more clear now. Thanks – Avi Mar 14 '18 at 07:47
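The check suggested in the first comment on this answer (collect the `script` and `link` URLs and look for "plugins") could be sketched as below. This swaps in PHP's built-in `DOMDocument` for Simple HTML DOM so it runs without extra files; the function name `find_asset_urls` is a hypothetical helper, not part of either answer.

```php
<?php
// Minimal sketch (hypothetical helper): return the src of every <script> and the
// href of every <link> whose URL contains $needle (e.g. "plugins").
// Uses PHP's built-in DOMDocument instead of Simple HTML DOM.
function find_asset_urls(string $html, string $needle): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; suppress libxml warnings while parsing.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    // getAttribute() returns '' when the attribute is missing (e.g. inline scripts).
    foreach ($doc->getElementsByTagName('script') as $script) {
        $src = $script->getAttribute('src');
        if ($src !== '' && strpos($src, $needle) !== false) {
            $urls[] = $src;
        }
    }
    foreach ($doc->getElementsByTagName('link') as $link) {
        $href = $link->getAttribute('href');
        if ($href !== '' && strpos($href, $needle) !== false) {
            $urls[] = $href;
        }
    }
    return $urls;
}
```

For example, `print_r(find_asset_urls(file_get_contents('http://ronilocks.com/'), 'plugins'));` lists what the fetched string actually contains, which can then be compared line by line with the browser's view-source.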