file_get_html & str_get_html with cURL are getting part of a page

Question

This is a really weird situation that I can't explain. I'm using Simple HTML DOM and trying to get the full source code of this page:

http://ronilocks.com/

The problem is that I'm getting only part of what's actually on the page. For instance, look at the page source and note all the script tags that point into the plugins folder. There are quite a few. When I check the string I get back from Simple HTML DOM, none of them are there except wp-rocket.

(I also tried a plain file_get_html() and a plain file_get_contents() and got the same result.)

Any thoughts? Thanks!

Edit: Is it possible that wp-rocket (installed on the page being scraped) knows that the page is being scraped and shows something different?

There's not much to show :) Something like this is one option: dd(file_get_contents("http://ronilocks.com/")); (the dd is like var_dump in Laravel) — Avi, Mar 12 '18 at 11:56
In many cases, some content is loaded by JS because it is 'dynamic', so that 'dynamic' content may not be present in the HTML source. — Acepcs, Mar 12 '18 at 18:54
@Acepcs The content I don't see is not generated with JS, it exists in the source code — Avi, Mar 13 '18 at 08:12
@pguardiario using cURL with a Mozilla user-agent doesn't work either. — Avi, Mar 13 '18 at 08:13
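
For reference, the cURL attempt mentioned in the last comment might look roughly like the sketch below. It is not the asker's actual code: the function name fetch_with_user_agent, the timeout, and the other cURL options are assumptions chosen for illustration, and the User-Agent string is supplied by the caller.

```php
<?php
// Hypothetical helper (name and options are illustrative, not from the thread):
// fetch a URL with cURL while sending a caller-supplied User-Agent header.
function fetch_with_user_agent(string $url, string $userAgent): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_ENCODING       => '',    // accept and decode any encoding cURL supports
        CURLOPT_TIMEOUT        => 30,    // assumed limit, in seconds
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }

    return $body;
}
```

The returned string can be passed to str_get_html() from Simple HTML DOM. Comparing strlen() of this body with the result of a plain file_get_contents() call, or of a request with a different User-Agent, is one way to check whether the server returns different markup to different clients.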

score 0 · Answer 1 · answered Mar 13 '18 at 10:14

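<?php
// Load the Simple HTML DOM library (assumed to sit next to this script).
include 'simple_html_dom.php';

// Fetch and parse the page, then count the anchor elements it contains.
$html = file_get_html('http://ronilocks.com/');
echo count($html->find('a'));
// 425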

I get 425. This looks right to me.

pguardiario


Look for "script" or "link" instead of "a", then loop through the results and print them out. Now go to the actual site's source code and look for scripts with the word "plugins" in them, or look for the sentence "" in the response as it appears on the actual site. – Avi Mar 13 '18 at 12:27
Ok, it sounds like you have a separate question so you should edit or post a new question. Like "why can't simple html dom find this script/link?" – pguardiario Mar 13 '18 at 23:59
I edited the question a bit. I hope it's clearer now. Thanks – Avi Mar 14 '18 at 07:47
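
The check Avi describes in the comments above (list the script and link elements, then keep those that mention "plugins") can be sketched as follows. To keep the example self-contained it uses PHP's built-in DOMDocument rather than Simple HTML DOM, so it is a stand-in for the thread's library, not the same code path. The helper name find_asset_urls is hypothetical.

```php
<?php
// Hypothetical helper: return the URLs of <script src> and <link href>
// elements whose URL contains $needle. Uses DOMDocument (ext-dom), which is
// an assumption swapped in for Simple HTML DOM.
function find_asset_urls(string $html, string $needle = 'plugins'): array
{
    if ($html === '') {
        return [];  // DOMDocument::loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    // Scripts are scanned first, then links, so results come back in that order.
    foreach (['script' => 'src', 'link' => 'href'] as $tag => $attribute) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $url = $node->getAttribute($attribute);
            if ($url !== '' && strpos($url, $needle) !== false) {
                $urls[] = $url;
            }
        }
    }

    return $urls;
}
```

Running it on the string returned by file_get_contents('http://ronilocks.com/') and comparing the list with what the browser's "view source" shows would make the discrepancy described in the question concrete.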
