
If I try to curl this CSS file, it returns a 403 Forbidden error: https://web.archive.org/web/20170322073013cs_/http://afterschoolprograms.com/sites/all/themes/afterschoolprograms_dev/style.css?n
Opening that URL directly in a browser returns the same error.

If I instead open the archived page https://web.archive.org/web/20170322073013/http://afterschoolprograms.com/ in the browser, the request that page makes for the CSS file works fine: it returns a 302 that leads to a working CSS file.

If I then open the same CSS URL again in the browser, it works fine.

How can I curl this CSS file so that it always returns the 302 redirect leading to the working CSS file?

Here is the PHP code I am currently using, which always returns a 403 Forbidden error:

$url = "https://web.archive.org/web/20170322073013cs_/http://afterschoolprograms.com/sites/all/themes/afterschoolprograms_dev/style.css?n";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_TIMEOUT, 30); // timeout in seconds
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects (301, 302, ...)
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // skips TLS certificate verification
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_USERAGENT, 'waybackmachinedownloader');
$html = curl_exec($ch);
curl_close($ch);
mohamed
  • Probably something simple like a referrer check. But can you explain why you would need to do this via cURL in the first place? If you needed an “old” site’s CSS, I would assume you would download it once and then place it on your own server or something like that. But if you request it over and over again (probably on every load of your own site?), then I would understand why they do not want this and take measures to prevent it: they don’t want to be abused as a CDN. – CBroe Jun 26 '17 at 14:48
  • I lost this website and want to scrape it from archive.org. How can I make a clean curl request without getting this 403 Forbidden error? Can you give me the PHP code that I need to use? – mohamed Jun 26 '17 at 14:50

1 Answer


As suggested by CBroe, adding the Referer header works:

<?php

$url = 'https://web.archive.org/web/20170322073013cs_/http://afterschoolprograms.com/sites/all/themes/afterschoolprograms_dev/style.css?n';
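$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
// Send the archived page that links to the stylesheet as the Referer.
curl_setopt($ch, CURLOPT_REFERER, 'https://web.archive.org/web/20170322073013/http://afterschoolprograms.com/');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow the 302 to the working CSS file
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // skips TLS certificate verification
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
$html = curl_exec($ch);
curl_close($ch);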

var_dump($html);

?>
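
When scraping many archived assets, hard-coding the Referer for each one gets tedious. The following is only a sketch of one way to derive it from the asset URL itself. The helper names `waybackReferer` and `fetchWaybackAsset` are hypothetical, and the URL pattern (a numeric timestamp, an optional two-letter modifier such as `cs_`, then the original URL) is an assumption based on the two URLs in the question. It also assumes that the snapshot's front page is an acceptable Referer, which is what worked above.

```php
<?php

// Hypothetical helper: build the snapshot page URL to send as the Referer
// from a Wayback Machine asset URL. Returns null if the URL does not match
// the assumed pattern.
function waybackReferer(string $assetUrl): ?string
{
    // Assumed shape: https://web.archive.org/web/<timestamp>[<xx>_]/<original URL>
    $pattern = '#^https?://web\.archive\.org/web/(\d+)(?:[a-z]{2}_)?/(https?://[^/]+)#i';
    if (!preg_match($pattern, $assetUrl, $m)) {
        return null;
    }
    // $m[1] is the timestamp, $m[2] is the scheme and host of the original site.
    return 'https://web.archive.org/web/' . $m[1] . '/' . $m[2] . '/';
}

// Hypothetical wrapper: fetch an archived asset, sending the derived Referer.
// Returns the body on HTTP 200 (after redirects), or null otherwise.
function fetchWaybackAsset(string $assetUrl): ?string
{
    $ch = curl_init($assetUrl);
    $referer = waybackReferer($assetUrl);
    if ($referer !== null) {
        curl_setopt($ch, CURLOPT_REFERER, $referer);
    }
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow the 302
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // timeout in seconds
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // status of the final response
    curl_close($ch);

    return ($body !== false && $status === 200) ? $body : null;
}
```

Calling `fetchWaybackAsset($url)` with the `$url` from the code above would send the same Referer that was set by hand there. Checking the final status code makes a 403 show up as `null` rather than as an error page stored in place of the CSS.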
Bertrand Martel