
I installed Apache httpcomponents-client-5.0.x and, while reviewing the headers of the HTTP response, I was surprised to find that it doesn't show the Content-Length and Content-Encoding headers. This is the code I used for testing:

import java.net.URI;
import org.apache.hc.client5.http.classic.methods.HttpGet;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.CloseableHttpResponse;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.Header;

CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet request = new HttpGet(new URI("https://www.example.com"));
CloseableHttpResponse response = httpclient.execute(request);
Header[] responseHeaders = response.getHeaders();
for(Header header: responseHeaders) {               
    System.out.println(header.getName());
}
// this prints all the headers except 
// status code header
// Content-Length
// Content-Encoding

No matter what I try I get the same result, like this

Iterator<Header> headersItr = response.headerIterator();
while(headersItr.hasNext()) {
    Header header = headersItr.next();
    System.out.println(header.getName());
}

Or this

HttpEntity entity = response.getEntity();
System.out.println(entity.getContentEncoding()); // NULL
System.out.println(entity.getContentLength());   // -1

According to this question, which was asked 6 years ago, this seems to be an old issue that also affected older versions of Apache HttpClient.

Of course, the server is actually returning those headers, as confirmed by Wireshark and by Apache HttpClient's own logs:

2020-04-03 07:59:09,106 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << HTTP/1.1 200 OK
2020-04-03 07:59:09,106 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Content-Encoding: gzip
2020-04-03 07:59:09,106 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Accept-Ranges: bytes
2020-04-03 07:59:09,107 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Age: 451956
2020-04-03 07:59:09,107 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Cache-Control: max-age=604800
2020-04-03 07:59:09,107 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Content-Type: text/html; charset=UTF-8
2020-04-03 07:59:09,107 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Date: Fri, 03 Apr 2020 05:59:09 GMT
2020-04-03 07:59:09,108 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Etag: "3147526947+gzip"
2020-04-03 07:59:09,108 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Expires: Fri, 10 Apr 2020 05:59:09 GMT
2020-04-03 07:59:09,108 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
2020-04-03 07:59:09,108 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Server: ECS (dcb/7EEB)
2020-04-03 07:59:09,108 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Vary: Accept-Encoding
2020-04-03 07:59:09,109 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << X-Cache: HIT
2020-04-03 07:59:09,109 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Content-Length: 648

BTW, the java.net.http library, known as the JDK HttpClient, works great and shows all the headers.

Did I do something wrong, or should I report a bug that has been there for years?

Accountant م
  • Check if you have the same issue on version 4.x – Antoniossss Apr 03 '20 at 07:15
  • Behavior of HttpClient 4.x is exactly the same by design. If someone does not want transparent content compression one can easily disable it when building HttpClient – ok2c Apr 03 '20 at 16:35
  • @ok2c Thanks, I have read your answer [here](https://stackoverflow.com/a/30218535). Yes, this solution will prevent HttpClient from sending the `Accept-Encoding` header automatically, and if I set this header manually, HttpClient will not decompress the response content. Is there any way to get both the decompressed response and the response headers? Should I ask another question for that? – Accountant م Apr 03 '20 at 20:29
  • @Accountantم Those headers are removed for a good reason. But if you are absolutely sure you can replace the standard `ContentCompressionExec` with a custom exec interceptor. – ok2c Apr 04 '20 at 07:57
  • @ok2c That seems like a painful job. I will check it, but if it's really hard to do, I have no choice but to sacrifice those headers :( – Accountant م Apr 04 '20 at 08:03

2 Answers


HttpComponents committer here...

You did not pay close attention to what Dave G said. By default, HttpClientBuilder enables transparent decompression, and the reason why you no longer see some headers is here:

if (decoderFactory != null) {
  response.setEntity(new DecompressingEntity(response.getEntity(), decoderFactory));
  response.removeHeaders(HttpHeaders.CONTENT_LENGTH);
  response.removeHeaders(HttpHeaders.CONTENT_ENCODING);
  response.removeHeaders(HttpHeaders.CONTENT_MD5);
} ...

Regarding the JDK HttpClient: it does not perform any transparent decompression, which is why you see the length of the compressed stream. You have to decompress on your own.
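
A minimal sketch of what decompressing on your own could look like, using only JDK classes. The class name `ManualGzipDecoding` and both helper methods are hypothetical names chosen for illustration, and the compressed body is fabricated locally instead of being fetched from a server:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ManualGzipDecoding {

    // Decompress the body only when the Content-Encoding value is "gzip";
    // any other value (or a missing header) returns the bytes untouched.
    public static byte[] decodeBody(byte[] rawBody, String contentEncoding) {
        if (contentEncoding == null || !contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return rawBody;
        }
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(rawBody))) {
            return in.readAllBytes(); // requires Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo-only helper: produces a gzip-compressed body to stand in for
    // what a server would send with "Content-Encoding: gzip".
    public static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] wire = gzip("<html>hello</html>".getBytes(StandardCharsets.UTF_8));
        byte[] decoded = decodeBody(wire, "gzip");
        // The first number corresponds to what Content-Length would report,
        // the second to the size of the decompressed content.
        System.out.println("Bytes on the wire: " + wire.length);
        System.out.println("Decoded bytes:     " + decoded.length);
        System.out.println(new String(decoded, StandardCharsets.UTF_8));
    }
}
```

With the JDK client, the inputs would presumably come from a response read with `HttpResponse.BodyHandlers.ofByteArray()`, passing `response.body()` and `response.headers().firstValue("Content-Encoding").orElse(null)`. Because nothing is stripped in that setup, the original headers stay available next to the decoded content.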

curl committer here...

I have raised an issue too.

Update (03 Feb. '23): The secret codez to disable automatic decompression are:

CloseableHttpClient httpclient = HttpClients.createMinimal();
// OR
CloseableHttpClient httpclient = HttpClients.custom().disableContentCompression().build();
SiKing
Michael-O
  • Thank you very much Michael-O for your time in HttpComponents and for having an active account here on SO. But Michael, I'm greedy and need both features (the decompressed content + all response headers), for example like curl. I shouldn't have to sacrifice one of them. Why are you removing these headers? They are actual headers returned from the server. **Is there any way to get the decompressed content and the headers?** – Accountant م Apr 03 '20 at 20:19
  • @Accountantم As ok2c pointed out you cannot have both, unless you write custom code. You should perform the decompression manually. This will retain all headers. Regarding curl, I consider this to be a bug. I have raised an issue. – Michael-O Apr 04 '20 at 12:26
  • I have commented on the curl issue on GitHub, please Michael reconsider allowing your consumers to get the response headers as sent from the server, even if as raw string. – Accountant م Apr 05 '20 at 00:54
  • @Accountantم you can have access. Disable automatic decompression. – Michael-O Apr 05 '20 at 08:37
  • Michael :) you know I mean while having the response automatically decompressed. Anyway, in the meantime I have no choice but to sacrifice these headers; we have to finish the project on schedule and we have no time to implement our own decompression. Thank you for your time and effort in HttpComponents and **I hope you change your mind one day and think of it from another perspective, as the user of the library and not the maker of the library.** – Accountant م Apr 06 '20 at 05:00

The Content-Length may be ignored in this case.

HttpGet request = new HttpGet(new URI("https://www.example.com"));
request.setHeader("Accept-Encoding", "identity");
CloseableHttpResponse response = httpclient.execute(request);

I can see the following

HttpEntity entity = response.getEntity();
System.out.println(entity.getContentLength());
System.out.println(entity.getContentEncoding());

Output

...
2020-04-03 03:04:17.760 DEBUG 34196 --- [           main] org.apache.hc.client5.http.headers       : http-outgoing-0 << Content-Length: 1256
...
1256
null

I'd like to direct your attention to this header being sent:

http-outgoing-0 >> Accept-Encoding: gzip, x-gzip, deflate

That tells the server that this client can accept gzip, x-gzip, and deflate content in response. The response is stating it is 'gzip' encoded.

http-outgoing-0 << Content-Encoding: gzip

I believe that HttpClient is transparently handling this internally and making the content available.

In the other question you referenced, one of the answers suggests using EntityUtils.toByteArray(httpResponse.getEntity()).length to get the content length.
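
If buffering the whole body just to measure it is a concern, one alternative is to count bytes while the content is being consumed anyway. The following is only a sketch using JDK classes: `CountingInputStream` and `drainAndCount` are hypothetical names, not part of HttpClient, and the count reflects the bytes read from whichever stream is wrapped (for a transparently decompressed entity that would be the decoded size, not the Content-Length sent on the wire).

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: counts the bytes that pass through read() calls.
// Bytes skipped via skip() are not counted in this sketch.
final class CountingInputStream extends FilterInputStream {
    private long count;

    public CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            count++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            count += n;
        }
        return n;
    }

    public long getCount() {
        return count;
    }

    // Reads the source to the end and returns how many bytes were seen.
    // Real code would process each chunk instead of discarding it.
    public static long drainAndCount(InputStream source) {
        try (CountingInputStream counting = new CountingInputStream(source)) {
            byte[] buf = new byte[8192];
            while (counting.read(buf) != -1) {
                // chunk processing would go here
            }
            return counting.getCount();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a response body of 1256 bytes.
        InputStream body = new ByteArrayInputStream(new byte[1256]);
        System.out.println(drainAndCount(body));
    }
}
```

With HttpClient, the stream to wrap would presumably be the one returned by `response.getEntity().getContent()`, so the length becomes known as a side effect of reading the body and no second pass is needed.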

Dave G
  • OK, yes, it works only if I send the `Accept-Encoding: identity` header, otherwise `HttpClient` will not show the `Content-Length` header :(. Regarding the byte-counting workaround, I don't want to do that, as I will use the library intensively, and counting the bytes for every HTTP request could cause a performance issue that would be avoided if HttpClient simply told me all the headers. Thank you very much for your help – Accountant م Apr 03 '20 at 08:04
  • You're welcome - I wish I had a better solution for what you were after. Please mark this answer as accepted if you are satisfied with it. – Dave G Apr 03 '20 at 12:02
  • Thanks Dave G, yes, your answer helped me a lot, and I upvoted it yesterday. However, seeing the actual lines of code that cause this behavior in the accepted answer may help later readers more. I wish I could accept both answers. – Accountant م Apr 03 '20 at 20:20