
I need to get the contents of a page that always sends a Content-Length: 0 header, even though the page is never empty.

file_get_contents(url) just returns an empty string.

The whole header returned by the page is:

HTTP/1.1 200 OK
X-Powered-By: PHP/5.3.10
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Last-Modified: Sat, 18 Feb 2012 18:14:59 GMT
Cache-Control: no-store, no-cache, must-revalidate
Cache-Control: post-check=0, pre-check=0
Pragma: no-cache
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Sat, 18 Feb 2012 18:14:59 GMT
Server: lighttpd

Is it possible to make file_get_contents ignore that header, or do I need to use cURL?

Edit

get_headers(url) output (using print_r):

Array
(
    [0] => HTTP/1.0 200 OK
    [1] => X-Powered-By: PHP/5.3.10
    [2] => Content-type: text/html
    [3] => Content-Length: 0
    [4] => Connection: close
    [5] => Date: Sat, 18 Feb 2012 22:39:52 GMT
    [6] => Server: lighttpd
)
Tyilo
  • Are you sure there's no redirect or something like that going on? Is informing the site's sysadmin about the error out of the question? Have you tried `file_get_contents()` - does it really listen to the `content-length` header? – Pekka Feb 18 '12 at 18:12
  • Weird. I guess you could try `fread()` because you can specify a length there. No idea whether it'll work though - I would talk to the webmaster, if the response contains data, this is clearly an error – Pekka Feb 18 '12 at 18:16
  • All PHP functions that retrieve the body content of URLs ignore the headers and return the content. Check the allow_url_fopen directive and error_reporting, and whether the target might be blocking you somehow. Can you paste the get_headers() output into your question? – Guilherme Viebig Feb 18 '12 at 22:36
  • @GuilhermeViebig `allow_url_fopen` = `1`, `error_reporting` = `6135`. Added `get_headers`. – Tyilo Feb 18 '12 at 22:43
  • You could try using cURL – Kavi Siegel Feb 18 '12 at 22:44
  • Are you sure the request should return text? I'm asking because some websites return a different result depending on which headers you add to the request ("Accept", "User-Agent", etc.). These headers are added by your browser, but you must add them manually when you use PHP – Optimist Feb 18 '12 at 22:52
  • @Optimist It was the missing User-Agent header that made the server return an empty string. Strangely the `Content-Length: 0` response header is always there no matter what. If you add your comment as an answer, I will accept it as the correct answer. – Tyilo Feb 18 '12 at 23:16

2 Answers


As noted by Optimist, the problem had nothing to do with the response headers; rather, I wasn't sending a User-Agent header to the server.

file_get_contents worked perfectly once I sent a User-Agent header, even though the server always returns Content-Length: 0.

Weird.
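
For anyone landing here, this is a minimal sketch of how a User-Agent header can be sent with file_get_contents through a stream context. The function names buildHttpOptions and fetchWithUserAgent are hypothetical helpers made up for illustration, and the timeout value is an arbitrary assumption; only stream_context_create, file_get_contents and the `user_agent` HTTP context option are PHP's own.

```php
<?php
// Hypothetical helper: builds the stream-context options for an HTTP GET
// request that carries a User-Agent header.
function buildHttpOptions($sUserAgent)
{
    return array(
        'http' => array(
            'method'     => 'GET',
            'user_agent' => $sUserAgent, // sent as the User-Agent request header
            'timeout'    => 10,          // seconds; arbitrary value for this sketch
        ),
    );
}

// Hypothetical wrapper: fetches a URL with the given User-Agent.
// Returns the body as a string, or false on failure.
function fetchWithUserAgent($sURL, $sUserAgent)
{
    $rContext = stream_context_create(buildHttpOptions($sUserAgent));
    return file_get_contents($sURL, false, $rContext);
}
```

It could be called as `fetchWithUserAgent($sURL, 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)')`, where the User-Agent string is only a placeholder; the server in question may expect something different.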

Tyilo

I believe that none of the HTTP-level functions can read such a response, because it is an incorrect HTTP response: it says "my body is empty, don't read it".

You definitely need your own function based on fread, which will physically read the socket. Something like this:

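$aURL    = parse_url($sURL);
$sResult = ""; // buffer for the raw response (headers and body)
$iPort   = isset($aURL["port"]) ? $aURL["port"] : 80;

if ($iHandle = fsockopen($aURL["host"], $iPort, $iError, $sError))
{
    // Request target: path plus optional query string, defaulting to "/"
    $sQuery = isset($aURL["path"]) ? $aURL["path"] : "/";
    if (isset($aURL["query"]))
    {
        $sQuery .= "?" . $aURL["query"];
    }

    $sOut   = "GET " . $sQuery . " HTTP/1.1\r\n";
    $sOut  .= "Host: " . $aURL["host"] . "\r\n";
    $sOut  .= "Connection: Close\r\n\r\n";

    fputs($iHandle, $sOut);

    // Read until the server closes the connection, ignoring Content-Length
    while (!feof($iHandle))
    {
        $sResult .= fread($iHandle, 1024);
    }

    fclose($iHandle);
}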

Then just cut the headers.
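
Cutting the headers could look roughly like the sketch below. It assumes the raw response separates headers from body with the first blank line (CRLF CRLF) and that the body is not sent with chunked transfer encoding, which this sketch does not decode. The function name splitHttpResponse is a hypothetical helper, not a built-in.

```php
<?php
// Hypothetical helper: splits a raw HTTP response into its header block
// and its body at the first blank line (CRLF CRLF).
function splitHttpResponse($sRaw)
{
    // Limit of 2 so that blank lines inside the body are left untouched.
    $aParts   = explode("\r\n\r\n", $sRaw, 2);
    $sHeaders = $aParts[0];
    // If no blank line was found, treat the body as empty.
    $sBody    = isset($aParts[1]) ? $aParts[1] : "";

    return array($sHeaders, $sBody);
}
```

With the raw response read from the socket, `list($sHeaders, $sBody) = splitHttpResponse($sResult);` leaves the page content in $sBody.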

Stepan Stepanov
  • Please look at the comments, I have already resolved the issue. – Tyilo Feb 20 '12 at 16:46
  • You're wrong. As I posted in the comments: all PHP functions that retrieve the body content of URLs ignore the headers and return the content. So Content-Length: 0 would not affect anything. – Guilherme Viebig Feb 23 '12 at 13:42