
I use cURL / file_get_contents very often to get a page's source code. However, there is one website where this is not working for me.

Here is the code:

<?php 

$c = curl_init('https://plus.nl');
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($c, CURLOPT_POST, true);
//curl_setopt(... other options you want...)

$html = curl_exec($c);

if (curl_error($c))
    die(curl_error($c));

// Get the status code
$status = curl_getinfo($c, CURLINFO_HTTP_CODE);

curl_close($c);

echo $html;


?>

In my browser, it just keeps loading. When I try any other website, it works instantly. What's up with this website that it does not work?

MantiNL

2 Answers


EDIT: Having tried what you're doing, I can see the errors in the console. It's much simpler than X-Frame-Options security. The HTML refers to JavaScript and CSS using paths relative to the loaded HTML. In your case, the loaded HTML is served from your website, not from the original plus.nl, so all requests for CSS, JavaScript, images, etc. result in 404 (Not Found).

Original answer (not applicable, based on my further investigation): Most likely, the answer lies with the X-Frame-Options header. The basic HTML is almost empty; everything else is loaded via JavaScript. Their X-Frame-Options header only allows the assets to be loaded if the URL in the browser is https://www.plus.nl/, and in your case it's not, so none of the dynamic content can be loaded or executed.
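On the relative-path problem described in the edit: one common way to make relative URLs resolve against the original site is to inject a `<base>` element into the fetched HTML before echoing it. The sketch below is only an illustration. The helper name `add_base_href` is hypothetical, and it assumes the page has an ordinary `<head>` tag. It does not guarantee the page will work, since scripts may still fail for other reasons (for example cross-origin restrictions).

```php
<?php
// Hypothetical helper: make relative URLs in $html resolve against $baseUrl
// by inserting a <base href="..."> element right after the opening <head> tag.
function add_base_href(string $html, string $baseUrl): string
{
    // Leave the markup alone if it already declares a base URL.
    if (preg_match('/<base\b/i', $html)) {
        return $html;
    }

    $baseTag = '<base href="' . htmlspecialchars($baseUrl, ENT_QUOTES) . '">';

    // Append the tag to the first <head ...> opening tag only (limit = 1).
    $result = preg_replace_callback(
        '/<head\b[^>]*>/i',
        function (array $m) use ($baseTag) {
            return $m[0] . $baseTag;
        },
        $html,
        1,
        $count
    );

    // Assumption: if no <head> tag is found, prepend the tag as a fallback.
    return $count > 0 ? $result : $baseTag . $html;
}
```

With the code from the question, this would be used as `echo add_base_href($html, 'https://www.plus.nl/');` in place of `echo $html;`.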

Aleks G
  • Thanks for the reply. Is there any way around this? – MantiNL Jul 16 '18 at 15:54
  • You're completely misunderstanding `X-Frame-Options`. He isn't using frames. – SLaks Jul 16 '18 at 15:55
  • @MantiNL No, there is not. This is exactly what these headers are there for - it's a security measure. – Aleks G Jul 16 '18 at 15:55
  • @SLaks I do understand x-frame-option, although in this case, upon further investigation, I can see that the problem is something completely different. I updated my answer accordingly. – Aleks G Jul 16 '18 at 16:02
  • Reply to your edit: Wouldn't it be possible to change these relative paths to their correct locations? – MantiNL Jul 18 '18 at 14:33

I tried file_get_contents and it works on the site. However, the result is not very usable, since the site detects the lack of JavaScript. Setting the user agent with cURL didn't do the trick either.

I'm just getting the message

We werken momenteel aan de website. De huidige pagina werkt nog niet optimaal op mobiel.

which translates to:

We're currently working on the website. The current page doesn't yet work optimally on mobile.

So maybe your IP just got banned by them.
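The user-agent attempt mentioned above can be sketched roughly as follows. The function names, the user-agent string passed in, and the timeout values are assumptions for illustration only. The sketch uses a plain GET (the question's code sets `CURLOPT_POST`, which is not needed to read a page) and adds timeouts so a stalled request fails instead of loading forever. It does not execute any JavaScript, so the site's JavaScript check will still apply.

```php
<?php
// Hypothetical helper: cURL options for a plain GET with a custom user agent.
function build_fetch_options(string $userAgent): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,       // follow redirects
        CURLOPT_HTTPGET        => true,       // plain GET, no POST
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_CONNECTTIMEOUT => 10,         // seconds (assumed value)
        CURLOPT_TIMEOUT        => 30,         // seconds (assumed value)
    ];
}

// Hypothetical helper: fetch a page or throw with cURL's error message.
function fetch_page(string $url, string $userAgent): string
{
    $c = curl_init($url);
    curl_setopt_array($c, build_fetch_options($userAgent));

    $html = curl_exec($c);
    if ($html === false) {
        throw new RuntimeException(curl_error($c));
    }

    return $html;
}
```

A call would look like `echo fetch_page('https://www.plus.nl/', 'Mozilla/5.0 (X11; Linux x86_64)');`, wrapped in a `try`/`catch` if the error should be handled rather than shown.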

maio290