
I am using cURL with PHP to get the content type and content encoding of a URL. I get the content type successfully, but the content encoding value is empty.

function get_content_type_curl($url_content_type) {
    
    $agent_content_type = $_SERVER['HTTP_USER_AGENT'];
    $ch_content_type = curl_init();

    curl_setopt($ch_content_type, CURLOPT_URL, $url_content_type);
    curl_setopt($ch_content_type, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch_content_type, CURLOPT_HEADER, 0);
    curl_setopt($ch_content_type, CURLOPT_NOBODY, 1);
    curl_setopt($ch_content_type, CURLOPT_USERAGENT, $agent_content_type);
    curl_setopt($ch_content_type, CURLOPT_FOLLOWLOCATION, 1);

    curl_exec($ch_content_type);
    $content_type = curl_getinfo($ch_content_type, CURLINFO_CONTENT_TYPE);
    $content_encoding = defined('CURLINFO_CONTENT_ENCODING') ? curl_getinfo($ch_content_type, CURLINFO_CONTENT_ENCODING) : '';
    //$content_encoding = curl_getinfo($ch_content_type, CURLINFO_CONTENT_ENCODING);

    curl_close($ch_content_type);

    return array("content_type" => $content_type, "content_encoding" => $content_encoding);
}

$result = get_content_type_curl("https://affiliatefix.com/sitemap-1.xml");

echo $result["content_type"] . "\n";
if (!empty($result["content_encoding"])) {
    echo $result["content_encoding"] . "\n";
}

/**if (strpos($result["content_encoding"], "gzip") !== false) {
    echo $result["content_encoding"] . "\n";
} else {
    echo "No encoding".$result["content_encoding"] . "\n";
}**/

Output for https://affiliatefix.com/sitemap-1.xml:

Content Type : application/xml; charset=utf-8 // retrieved successfully

Content encoding : gzip // expected, but I get an empty value


Mehul Kumar
  • _"I'm getting empty."_ - probably because the server did _not_ use any, when responding to _this_ request. You did not send any `Accept-Encoding` header with your request, so the server does not know that the client (your script making the cURL request) is capable of actually _supporting_ any. – CBroe Feb 17 '23 at 07:52
  • You use `defined('CURLINFO_CONTENT_ENCODING')` in the line where you get the encoding. Strange way of doing things. Note that you don't actually define a value for that constant. – KIKO Software Feb 17 '23 at 07:54
  • @CBroe in browser network tab it showing `content encoding: gzip`. I have updated screenshot in question – Mehul Kumar Feb 17 '23 at 07:55
  • @KIKOSoftware I also tried `$content_encoding = curl_getinfo($ch_content_type, CURLINFO_CONTENT_ENCODING);` – Mehul Kumar Feb 17 '23 at 07:57
  • @MehulKumar yes, because the browser sent a header like this: `Accept-Encoding: gzip, deflate, br` – svgta Feb 17 '23 at 07:57
  • The server will (should) only gzip-encode the response **if the client explicitly declared in its request that it supports gzip!** It's conditional. Have you tried actually outputting the result you got back, including the headers, to see what you're getting back and whether it appears to be gzipped…!? – deceze Feb 17 '23 at 07:58
  • @svgta so the final `content-encoding` is determined by the `browser-client-side` and is not a `server-side` value? – Mehul Kumar Feb 17 '23 at 08:00
  • @deceze Content-encoding always returned empty! – Mehul Kumar Feb 17 '23 at 08:01
  • Yep, that's the reason. You can try adding the `Accept-Encoding` header to your cURL request to see what happens. – svgta Feb 17 '23 at 08:01
  • Do you understand what `Content-Encoding: gzip` does? It's a way to compress the response, it's a way for the server to return a smaller response and save bandwidth. But for that to be possible, the server needs to be sure that the client can actually understand and uncompress gzipped responses. So the server will only gzip a response if and when the client has voluntarily indicated in its request that it expects and supports it. – deceze Feb 17 '23 at 08:03
  • _"in browser network tab it showing"_ - not really relevant, because it is not your browser making the request here. But if you had looked at the _request_ in that situation, you should have seen an `Accept-Encoding` header that your browser sent. – CBroe Feb 17 '23 at 08:05
  • @svgta in default it showing that value without defining anything. screenshot : https://prnt.sc/W99WVuMhkTU8 – Mehul Kumar Feb 17 '23 at 08:05
  • _"it showing that value without defining anything"_ - what are you talking about now? Yes, there _is_ an `Accept-Encoding` request header there - because your browser knows which encodings it supports, and _lets the server know_. And you need to do the same thing, with your cURL request. – CBroe Feb 17 '23 at 08:06
  • @CBroe so we can tell the server that we accept all these encodings, and beyond that the `content encoding` is determined by the **browser at the client side** - right? – Mehul Kumar Feb 17 '23 at 08:16
  • Bit of a weird way of phrasing it. What's wrong with leaving it at, "the browser (or more general, client) tells the server, which encodings it can understand"? – CBroe Feb 17 '23 at 08:19
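
To make the point from the comments concrete, here is a minimal sketch of a HEAD request that announces its supported encodings. The helper names `accept_encoding_header` and `fetch_raw_headers` are made up for illustration and do not come from the thread; whether a given server then answers with `Content-Encoding: gzip` still depends on that server.

```php
<?php
// Hypothetical helper: builds the request header that tells the server
// which content codings this client can decode.
function accept_encoding_header(array $encodings): string
{
    return 'Accept-Encoding: ' . implode(', ', $encodings);
}

// Hypothetical helper: sends a HEAD request that includes Accept-Encoding
// and returns the raw response headers as one string ('' on failure).
function fetch_raw_headers(string $url, array $encodings = ['gzip', 'deflate']): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);          // include response headers in the result
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request, no body
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, [accept_encoding_header($encodings)]);

    $raw = curl_exec($ch);
    curl_close($ch);

    return $raw === false ? '' : $raw;
}

// Usage (performs a network request, so it is left commented out):
// echo fetch_raw_headers('https://affiliatefix.com/sitemap-1.xml');
```

Printing the raw headers this way also follows the suggestion above to look at what actually comes back before drawing conclusions. Note that with `CURLOPT_FOLLOWLOCATION` enabled the string contains the headers of every response in the redirect chain.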

1 Answer


Not sure where you found the constant CURLINFO_CONTENT_ENCODING. It doesn't appear in the PHP documentation or the cURL documentation. To get a response header, you need to register a callback function like this:

curl_setopt($ch_content_type, CURLOPT_HEADERFUNCTION, function($ch, $header){
    if(stripos($header, 'content-encoding') === 0){
        #parse content_encoding here.
    }
    return strlen($header);
});
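
One possible way to fill in the parsing step of the callback above, as a sketch rather than the only approach: split each header line at the first colon and compare the name case-insensitively. The helpers `header_line_value` and `capture_content_encoding` are hypothetical names introduced here.

```php
<?php
// Hypothetical helper: returns the value of header $name if $line is that
// header, or null otherwise. Header names are compared case-insensitively.
function header_line_value(string $line, string $name): ?string
{
    $parts = explode(':', $line, 2);
    if (count($parts) !== 2 || strcasecmp(trim($parts[0]), $name) !== 0) {
        return null;
    }
    return trim($parts[1]);
}

// Hypothetical helper: attaches a header callback to an existing cURL handle
// and stores the Content-Encoding value in $encoding when it is seen.
function capture_content_encoding($ch, string &$encoding): void
{
    curl_setopt($ch, CURLOPT_HEADERFUNCTION, function ($ch, $header) use (&$encoding) {
        $value = header_line_value($header, 'Content-Encoding');
        if ($value !== null) {
            $encoding = $value;
        }
        // cURL expects the number of bytes handled; any other value aborts the transfer.
        return strlen($header);
    });
}
```

Call `capture_content_encoding($ch_content_type, $content_encoding)` with `$content_encoding = ''` before `curl_exec()`, then read `$content_encoding` afterwards. It stays empty if the server sent no such header.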

Another way is to set CURLOPT_HEADER and then cut the header out of the result manually. Of course, since you don't need the body, the returned string is just the header:

curl_setopt($ch_content_type, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch_content_type, CURLOPT_HEADER, 1);
curl_setopt($ch_content_type, CURLOPT_NOBODY, 1);
$header_and_body = curl_exec($ch_content_type);

$header_size = curl_getinfo($ch_content_type, CURLINFO_HEADER_SIZE);
$header = substr($header_and_body, 0, $header_size);
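
Once the header string is available, individual fields can be picked out of it. The following is a rough sketch under the assumption that only the final response matters when redirects are followed; `parse_header_block` is a hypothetical name and `$sample` is made-up input, not real output from the site in the question.

```php
<?php
// Hypothetical helper: turns a raw header block into a map of lowercase
// header names to values. Each status line starts a new response, so only
// the headers of the last response in a redirect chain are kept.
function parse_header_block(string $raw): array
{
    $headers = [];
    foreach (preg_split('/\r?\n/', $raw) as $line) {
        if (strncmp($line, 'HTTP/', 5) === 0) {
            $headers = [];   // new response begins, discard the previous one
            continue;
        }
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $headers[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    return $headers;
}

// Made-up sample input for illustration only.
$sample = "HTTP/1.1 200 OK\r\n"
        . "Content-Type: application/xml; charset=utf-8\r\n"
        . "Content-Encoding: gzip\r\n\r\n";

$headers = parse_header_block($sample);
echo $headers['content-type'] ?? '', "\n";      // application/xml; charset=utf-8
echo $headers['content-encoding'] ?? '', "\n";  // gzip
```

In the snippet above, `parse_header_block($header)` would take the place of the sample string. A missing key simply means the server did not send that header.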
shingo
  • How do I get only `Content-Type` and `Content-Encoding` from the whole header? – Mehul Kumar Feb 17 '23 at 08:18
  • The header is just a string split into lines (separated by CRLF). The callback function usually receives the header one line at a time, so check each line for the field you need. – shingo Feb 17 '23 at 08:22
  • Anybody wondering where "CURLINFO_CONTENT_ENCODING" is coming from? It's ChatGPT... Recommended me the same thing just now. :) – Greg Jul 14 '23 at 17:36