
How is it possible that get_headers() returns a different result than fetching the same headers with cURL? Here is my code:

<?php
header("Content-type: text/plain");
$url = 'http://www.foxbusiness.com/index.html';

// Headers as fetched by PHP's http:// stream wrapper
echo "get_headers() headers:\n\n";
$headers = get_headers($url);
print_r($headers);

// Headers as fetched by cURL
echo "\n\nCURL headers\n\n";
$curl = curl_init();
curl_setopt_array( $curl, array(
    CURLOPT_HEADER => true,
    CURLOPT_NOBODY => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_URL => $url ) );
$headers = explode( "\n", curl_exec( $curl ) );
curl_close( $curl );
print_r($headers);

This is the result:

get_headers() headers:

Array
(
    [0] => HTTP/1.0 403 Forbidden
    [1] => Server: AkamaiGHost
    [2] => Mime-Version: 1.0
    [3] => Content-Type: text/html
    [4] => Content-Length: 283
    [5] => Expires: Fri, 31 Aug 2012 07:29:14 GMT
    [6] => Date: Fri, 31 Aug 2012 07:29:14 GMT
    [7] => Connection: close
)


CURL headers

Array
(
    [0] => HTTP/1.1 200 OK
    [1] => Server: Apache
    [2] => X-FoxNews-EdgeTTL: 2m
    [3] => Content-Type: text/html;charset=UTF-8
    [4] => Cache-Control: max-age=64
    [5] => Date: Fri, 31 Aug 2012 07:29:14 GMT
    [6] => Connection: keep-alive
    [7] => 
    [8] => 
)
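The two blank-looking entries at the end of the cURL array come from splitting the raw header block on "\n" alone: HTTP header lines end in "\r\n", so every element keeps a trailing "\r", and the block is terminated by an empty line. Below is a minimal sketch for turning the raw block into a list shaped like get_headers() output. The helper name split_raw_headers() is hypothetical, and it assumes a single header block (no redirects followed).

```php
<?php
/**
 * Hypothetical helper: split a raw header block (as returned by curl_exec()
 * with CURLOPT_HEADER and CURLOPT_NOBODY) into one header per element.
 * Assumes a single header block, i.e. CURLOPT_FOLLOWLOCATION is not used.
 */
function split_raw_headers(string $raw): array
{
    // Split on "\r\n" (or a bare "\n") so no element keeps a trailing "\r".
    $lines = preg_split('/\r?\n/', $raw);

    // Drop the empty line(s) that terminate the header block and reindex.
    return array_values(array_filter($lines, function ($line) {
        return $line !== '';
    }));
}
```

In the question's code, this would replace the explode() call: $headers = split_raw_headers(curl_exec($curl));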
Jonathan Leffler
Mike
  • @PhpMyCoder I included the code above. I'm not sure how to tell what headers it sends. – Mike Aug 31 '12 at 07:40

2 Answers


get_headers() sends a GET request by default, while you configured cURL to send a HEAD request (CURLOPT_NOBODY). Start by making the request match what cURL sends: set a default HTTP stream context that uses HEAD as the request method.

Also, the server seems to expect a User-Agent header, so either set user_agent in php.ini or add it to the stream context.
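For the php.ini route, the user_agent directive can also be set at runtime with ini_set(). This is a minimal sketch, assuming the value "PHP" (any string works). Note that it only supplies the User-Agent and does not change the request method.

```php
<?php
// Set the user_agent directive at runtime instead of in php.ini.
// The http:// stream wrapper (used by get_headers()) sends this value as
// the User-Agent header unless the stream context supplies its own
// user_agent option or a User-Agent line in its header option.
ini_set('user_agent', 'PHP');

// Later get_headers($url) calls now send "User-Agent: PHP".
```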

The following should work:

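stream_context_set_default(
    array(
        'http' => array(
            'method' => 'HEAD',      // match the HEAD request cURL sends
            'user_agent' => "PHP"    // the server seems to require a User-Agent
        )
    )
);
$headers = get_headers($url);    // now uses the modified default context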

See http://codepad.viper-7.com/cOO9XS

Note that stream_context_set_default() modifies the global default stream context, so after you call it, every other function that uses the http:// stream wrapper will also send HEAD requests. Unlike file_get_contents(), for example, get_headers() did not accept a custom stream context as an argument before PHP 7.1, which added an optional third $context parameter. So on older versions, make sure you change the method back to GET after you have fetched the headers.
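If you stick with stream_context_set_default(), one way to avoid leaving the global context in HEAD mode is to reset the method right after the call. This is a sketch under that assumption: get_headers_via_head() is a hypothetical helper name, and the default user agent "PHP" is just the value used in the snippet above.

```php
<?php
/**
 * Hypothetical helper: fetch headers with a HEAD request through the
 * global default context, then switch the default method back to GET.
 */
function get_headers_via_head(string $url, string $userAgent = 'PHP')
{
    stream_context_set_default([
        'http' => [
            'method'     => 'HEAD',
            'user_agent' => $userAgent,
        ],
    ]);

    try {
        return get_headers($url);
    } finally {
        // stream_context_set_default() merges into the existing default
        // options, so overwrite 'method' explicitly to restore GET.
        stream_context_set_default(['http' => ['method' => 'GET']]);
    }
}
```

Usage: $headers = get_headers_via_head('http://www.foxbusiness.com/index.html'); The try/finally also restores GET if a custom error handler turns a failed request into an exception.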

Gordon

Add a User-Agent header to the default stream context before calling get_headers():

stream_context_set_default(
    array(
        'http' => array(
            'method' => 'HEAD',
            'header' => "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.79 Safari/537.1\r\n"
        )
    )
);

You might as well specify HEAD too, since you only want the headers. With this change, get_headers() gets the same 200 OK response as cURL:

OUTPUT

get_headers() headers:

Array
(
    [0] => HTTP/1.0 200 OK
    [1] => Server: Apache
    [2] => X-FoxNews-EdgeTTL: 2m
    [3] => Content-Type: text/html;charset=UTF-8
    [4] => Cache-Control: max-age=76
    [5] => Date: Fri, 31 Aug 2012 07:53:24 GMT
    [6] => Connection: close
)


CURL headers

Array
(
    [0] => HTTP/1.1 200 OK
    [1] => Server: Apache
    [2] => X-FoxNews-EdgeTTL: 2m
    [3] => Content-Type: text/html;charset=UTF-8
    [4] => Cache-Control: max-age=76
    [5] => Date: Fri, 31 Aug 2012 07:53:24 GMT
    [6] => Connection: keep-alive
    [7] => 
    [8] => 
)
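On PHP 7.1 and later, get_headers() accepts a stream context as its third argument, so the HEAD method and User-Agent header can be scoped to a single call instead of changing the global default. This is a sketch assuming PHP >= 7.1; make_head_context() is a hypothetical helper name and the User-Agent value is a placeholder.

```php
<?php
/**
 * Hypothetical helper: build a dedicated HEAD context instead of changing
 * the global default, so other stream functions keep using GET.
 */
function make_head_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method' => 'HEAD',
            // The 'header' option accepts a string or an array of
            // "Name: value" lines.
            'header' => ['User-Agent: ' . $userAgent],
        ],
    ]);
}

$context = make_head_context('Mozilla/5.0 (compatible; placeholder)');
```

Then call get_headers('http://www.foxbusiness.com/index.html', false, $context); other stream functions keep using the unmodified default context.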
sberry