I need to get the HTTP response code for page URLs listed in a sitemap.xml file. When my cron process fetches the response code, it returns 403 (access forbidden), even though I can open the same URL in a browser.

But if I run the same code from my localhost, it returns the correct HTTP response code (i.e. 200).

Why do localhost and the server return different HTTP response codes? How can I resolve the problem?

The code for extracting the HTTP response code is below.

function check_response_code() {
    $pageurl='http://www.certona.com/online-merchandising/';
    $trimurl = '';
    $start = '';
    $end = '';
    $total = '';

    $start = microtime(true);
    $response_code = '';
    if (!stristr($pageurl, "http://"))
    {
        if (!stristr($pageurl, "https://"))
        {
            $trimurl = "http://" . $pageurl;
        } else
        {
            $trimurl = $pageurl;
        }
    } else
    {
        $trimurl = $pageurl;
    }
    $curl = curl_init();
    //don't fetch the actual page, you only want headers

    curl_setopt($curl, CURLOPT_URL, $trimurl);
    curl_setopt($curl, CURLOPT_NOBODY, true);
    curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FILETIME, true);

    $result = curl_exec($curl);

    $timestamp = curl_getinfo($curl, CURLINFO_FILETIME);
    $response_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    $mime_type = curl_getinfo($curl, CURLINFO_CONTENT_TYPE);
    $end = microtime(true);
    $total = round($end - $start, 5);

    if ($timestamp != -1)
    { //otherwise unknown
        $arr=array(date("Y-m-d H:i:s", $timestamp), $response_code, $total, $mime_type); //etc
    } else
    {
        $arr=array("", $response_code, $total, $mime_type);
    }
    echo "<pre>";
    print_r($arr);
    echo "</pre>";
}

Thank you.

nir

3 Answers


There can be many reasons for this...

Is it your own server? => http://codewithdesign.com/2011/05/26/curl-403-error-returning/

Maybe set CURLOPT_USERAGENT to "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0"

Or read this: "curl gives 403 error?"
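
The user-agent suggestion above can be sketched roughly as follows. This is only a minimal sketch: the helper names `head_request_options` and `fetch_status_code` are hypothetical (they are not part of the question's code), and whether a browser-like user agent actually avoids the 403 depends on how the remote server filters requests.

```php
<?php
// Hypothetical helper: builds the cURL options for a header-only request
// that sends the given User-Agent string.
function head_request_options($url, $userAgent)
{
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_NOBODY         => true,  // headers only, do not download the body
        CURLOPT_RETURNTRANSFER => true,  // do not echo the response
        CURLOPT_USERAGENT      => $userAgent,
    );
}

// Hypothetical helper: returns the HTTP status code, or 0 if the request failed.
function fetch_status_code($url, $userAgent)
{
    $curl = curl_init();
    curl_setopt_array($curl, head_request_options($url, $userAgent));
    curl_exec($curl);
    return (int) curl_getinfo($curl, CURLINFO_HTTP_CODE);
}
```

Calling `fetch_status_code($pageurl, $agent)` with the user-agent string quoted above would then replace the individual `curl_setopt` calls in the question. Keeping the options in one array makes it easy to compare exactly what localhost and the server send.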

PiTheNumber
  • CURLOPT_USERAGENT is somewhat helpful when I keep sleep(10) between requests, but if I don't use sleep(10), I get the 403 response code after some time. – nir Apr 25 '12 at 10:54

Your localhost runs curl from your own computer, so the request looks as if your browser opened the site from your IP address.

The request from the server comes from a different IP address and environment.

I recall once solving a similar problem by removing the trailing / in the URL.

Try running the code as

$pageurl = rtrim('http://www.certona.com/online-merchandising/', '/');

But basically I don't think you're allowed to fetch a directory's data from another site.
Shouldn't the URL end in .xml to get the sitemap?

$pageurl = 'http://www.certona.com/sitemap.xml';
Robin Castlin
  • Hi. sitemap.xml contains the link URLs of a site. Here the page URL is one of the link URLs from a sitemap.xml. Thank you for the reply. – nir Apr 25 '12 at 07:00
  • Hi, I have tried both removing and keeping the trailing '/', but there is no difference in the curl response. Either way I get response code 200 locally and 403 on the server. – nir Apr 25 '12 at 09:57

I'm not sure, but your code seems to work fine.

Try

check_response_code();

function check_response_code() {
    $pageurl='http://www.certona.com/online-merchandising/';
    $curl = curl_init($pageurl);
    curl_setopt($curl, CURLOPT_NOBODY, true);
    curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FILETIME, true);

    $result = curl_exec($curl);
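    $info = curl_getinfo($curl);
    // filetime is -1 when the server does not report a modification time
    if ($info['filetime'] != -1) {
        $info['filetime'] = date("Y-m-d H:i:s", $info['filetime']);
    }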
    echo "<pre>";
    print_r($info);
    echo "</pre>";
}

Output

Array
(
    [url] => http://www.certona.com/online-merchandising/
    [content_type] => text/html; charset=utf-8
    [http_code] => 200
    [header_size] => 488
    [request_size] => 76
    [filetime] => 2012-04-24 15:11:28
    [ssl_verify_result] => 0
    [redirect_count] => 0
    [total_time] => 1.342
    [namelookup_time] => 0
    [connect_time] => 0.25
    [pretransfer_time] => 0.25
    [size_upload] => 0
    [size_download] => 0
    [speed_download] => 0
    [speed_upload] => 0
    [download_content_length] => 0
    [upload_content_length] => 0
    [starttransfer_time] => 1.342
    [redirect_time] => 0
    [certinfo] => Array
        (
        )

    [redirect_url] => 
)
Baba
  • Hello, this is the output from localhost. But when I run the same code from the cron process on the server, it returns a different HTTP response code (i.e. 403 instead of 200). – nir Apr 25 '12 at 07:03