
I can't access this site with any of the methods below. The $url address works in all my browsers, but I just can't fetch data from that site. How is that possible? Not even robots.txt (https://www.natterer-modellbau.de/robots.txt) can be fetched, other than in a browser.

I see results from that website on Google. How can Google access the site when I can't?

The site rejected my crawler on the very first try, so they can't have blocked my server's IP already, can they? My script can access all other URLs. I'm frustrated :) Please help.

$url = 'https://www.natterer-modellbau.de/Flugzeuge';

$pageHeaders = get_headers($url, 1); // DOES NOT WORK - TIMES OUT
file_get_contents($url); // DOES NOT WORK - TIMES OUT

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
$res = curl_exec($ch);
$rescode = curl_getinfo($ch, CURLINFO_HTTP_CODE); 
curl_close($ch) ;
echo $res; // DOES NOT WORK - TIMES OUT
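A request that simply hangs gives nothing to diagnose. A minimal sketch, assuming the PHP cURL extension is available: the same request with explicit connect and total timeouts, and with `curl_errno()`/`curl_error()` reported, so the script fails quickly with a concrete reason. The function name `fetch_with_timeout` and the timeout values are illustrative assumptions, not part of the original code.

```php
<?php
// Sketch (hypothetical helper): the question's cURL request, but with explicit
// timeouts and error reporting so a silent hang becomes an error code + message.

function fetch_with_timeout(string $url, int $connectTimeout = 10, int $totalTimeout = 20): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1',
        // Limit how long the connection phase may take.
        CURLOPT_CONNECTTIMEOUT => $connectTimeout,
        // Limit how long the whole transfer may take.
        CURLOPT_TIMEOUT        => $totalTimeout,
    ]);

    $body = curl_exec($ch);
    $result = [
        'body'   => $body === false ? null : $body,
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE), // 0 if no HTTP response arrived
        'errno'  => curl_errno($ch),                       // e.g. 28 = timed out, 7 = couldn't connect
        'error'  => curl_error($ch),
    ];
    curl_close($ch);
    return $result;
}
```

For example, `print_r(fetch_with_timeout($url));`. An `errno` of 28 (`CURLE_OPERATION_TIMEDOUT`) together with a `status` of 0 suggests that no HTTP response arrived at all, which points at the network path rather than at the user agent string or the PHP code.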

1 Answer

Works for me. You may be behind a proxy.

php > $url = 'https://www.natterer-modellbau.de/Flugzeuge';
php >
php > $pageHeaders = get_headers($url,1); // DOES NOT WORK - TIMES OUT
php > file_get_contents($url); // DOES NOT WORK -Times OUT
php >
php > $ch = curl_init();
php > curl_setopt($ch, CURLOPT_URL, $url);
php > curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
php > curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1");
php > curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
php > $res = curl_exec($ch);
php > $rescode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
php > curl_close($ch) ;
php > echo $res;
<!DOCTYPE html>
<html lang="de">
<head>

        <meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
        <meta name="description" content="Elektro, Segler/E-Segler, Verbrenner">
        <meta name="keywords" content="Elektro, Segler/E-Segler, Verbrenner">

[... many lines omitted; the end of the HTML follows ...]

        </script>

    <script>
        jtl.load(["asset/plugin_js_head?v=4.05","asset/jtl3.js?v=4.05","asset/plugin_js_body?v=4.05",]);
            </script>

</body>
</html>
php >
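To follow up on the proxy suggestion, a hedged sketch of checks that can be run from the failing server itself: DNS resolution with `gethostbyname()`, a raw TCP connection to port 443 with `fsockopen()`, and a look at the proxy environment variables libcurl can pick up. The helper names `resolve_host`, `tcp_reachable` and `proxy_env` are hypothetical, and the environment seen by a PHP process under a web server may differ from a shell's.

```php
<?php
// Sketch (hypothetical helpers): rule out network-level problems before
// blaming PHP or cURL.

// Returns the IPv4 address, or null if the name does not resolve.
function resolve_host(string $host): ?string
{
    $ip = gethostbyname($host);
    // gethostbyname() returns its input unchanged when resolution fails.
    return ($ip === $host && filter_var($host, FILTER_VALIDATE_IP) === false) ? null : $ip;
}

// Tries a plain TCP connection (no TLS, no HTTP) to $host:$port.
function tcp_reachable(string $host, int $port, float $timeout = 5.0): array
{
    $errno = 0;
    $errstr = '';
    // @ suppresses the warning fsockopen() emits on failure; the error is returned instead.
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return ['ok' => false, 'errno' => $errno, 'error' => $errstr];
    }
    fclose($fp);
    return ['ok' => true, 'errno' => 0, 'error' => ''];
}

// Lists proxy-related environment variables that are set and non-empty.
function proxy_env(): array
{
    $found = [];
    foreach (['http_proxy', 'https_proxy', 'HTTPS_PROXY', 'all_proxy', 'ALL_PROXY', 'no_proxy', 'NO_PROXY'] as $name) {
        $value = getenv($name);
        if ($value !== false && $value !== '') {
            $found[$name] = $value;
        }
    }
    return $found;
}
```

For example, `var_dump(resolve_host('www.natterer-modellbau.de'), tcp_reachable('www.natterer-modellbau.de', 443), proxy_env());`. If the TCP check itself times out from the web host but succeeds from another network, packets are probably being dropped before HTTP is involved (for instance by a firewall filtering the hosting provider's address range); that is an inference, not something this code can prove.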
Ari Singh
  • I tried my script on two other web servers: same problem, it times out. – Morten Pedersen Mar 18 '18 at 19:15
  • I have also tried to fetch https://www.natterer-modellbau.de/robots.txt. Same problem. These websites can't fetch it either: http://tools.seobook.com/robots-txt/analyzer/ http://tools.seochat.com/tools/robots-txt-validator/#sthash.qWdMtfwB.dpbs https://seositecheckup.com/tools/robotstxt-test – Morten Pedersen Mar 18 '18 at 19:21
  • Please run "curl -v https://www.natterer-modellbau.de/Flugzeuge" from the command line (install curl if you do not have it already) and post the output. It contains a lot of diagnostic information about the connection and the download. Also, does the problem occur only with "https", or with "http" too? – Ari Singh Mar 18 '18 at 22:43
  • I do not use curl on my own computer; my web pages are on a shared web host (a "web hotel"). The website refuses connections without https. My cURL script can access many other sites, over both http and https. – Morten Pedersen Mar 22 '18 at 16:15
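Since `curl -v` cannot be run on a shared host, roughly the same diagnostics can be captured from inside PHP by enabling `CURLOPT_VERBOSE` and redirecting the log with `CURLOPT_STDERR`. A minimal sketch, assuming the PHP cURL extension; `verbose_fetch` is a hypothetical name and the 15-second timeout is arbitrary.

```php
<?php
// Sketch (hypothetical helper): collect the verbose log "curl -v" would print,
// for hosts where no shell is available.

function verbose_fetch(string $url, int $timeout = 15): string
{
    $log = fopen('php://temp', 'w+');   // in-memory buffer for the verbose log
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => $timeout,
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_VERBOSE        => true, // same idea as "curl -v"
        CURLOPT_STDERR         => $log, // write the verbose output to $log instead of stderr
    ]);
    curl_exec($ch);
    $errno = curl_errno($ch);
    $error = curl_error($ch);
    curl_close($ch);

    // Read back everything libcurl wrote to the buffer.
    rewind($log);
    $text = (string) stream_get_contents($log);
    fclose($log);

    if ($errno !== 0) {
        $text .= "\ncURL error $errno: $error\n";
    }
    return $text;
}
```

For example, `header('Content-Type: text/plain'); echo verbose_fetch('https://www.natterer-modellbau.de/robots.txt');`. Where the log stops (right after a "Trying ..." line for port 443, or partway through the TLS handshake) indicates which stage of the connection is hanging.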