
I have been stuck on this issue for quite a long time now. I have googled it and read all the links associated with "chunked" on SO, so I finally decided to post this question. Let me describe the issue: I have Java code which reads the response from an HTTPS server using a socket. The response is received correctly and everything works fine unless the Transfer-Encoding is "chunked". I am trying to read the chunked response from the socket as a byte array, but when I convert it to a string the response is unreadable. I suspect I am doing something wrong when processing the chunk data. Because of this I am also getting a "Not in GZIP format" exception when I try to decompress the response. The code I am using to process the chunks is:

    int chunkLength;

    do {
        String lengthLine = inStream.readLine();
        if (lengthLine == null) {
            return false;
        }
        chunkLength = Integer.parseInt(lengthLine.trim(), 16);
        if (chunkLength > 0) {
            byte[] chunk = new byte[chunkLength];
            int bytesRead = inStream.read(chunk);
            if (bytesRead < chunkLength) {
                return false;
            }
            //Burn a CR/LF
            inStream.readLine();
        }//if chunkLength
    } while (chunkLength > 0) ;
    return true;

As I am new to asking questions on SO, I may be missing some (probably many) details that you need in order to give a solution. Please pardon me in that case and let me know if you need more details. Any help would be greatly appreciated. Cheers.

  • Why? Why not use `HttpURLConnection` that does it all for you? – user207421 May 05 '15 at 11:40
  • Thanks for your prompt reply. It is a constraint in the product that we should use sockets. Actually we are setting the proxy in the browser to record the HTTP traffic, and we read the recorded traffic using the socket. – Sathiya Narayanan May 05 '15 at 11:45
  • `HttpURLConnection` uses sockets. If somebody is setting a constraint that determines what classes you use, they shouldn't be. – user207421 May 05 '15 at 18:27

1 Answer


There are three problems I see in this code:

  1. you are not taking into account that an individual chunk's size line may include chunk extensions, which you are not skipping. Extensions are not common, but they are part of the spec, so you should code for them. Otherwise, your `Integer.parseInt()` call will fail if you ever encounter one.

  2. you are not reading the entire chunk data. Since you are using `inStream.read()`, it can return fewer bytes than requested. Do not stop reading if that happens; it is normal behavior for a socket. You need to call `read()` in a loop until `chunkLength` bytes have been received in full, and only stop reading if a real error is reported.

  3. you are not reading the trailing HTTP headers that appear after the last chunk. Even if there are no headers, there is still a CRLF terminator to end the HTTP response.
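To make point 1 concrete, a chunk-size line such as `1a;name=value` (the extension name here is invented for illustration) still denotes a 0x1a = 26 byte chunk. A minimal sketch of stripping the extension before parsing:

```java
public class ChunkHeaderParse {

    // Parse the size from a chunk header line, ignoring any
    // chunk extensions that follow a ';' (e.g. "1a;name=value" -> 26).
    static int parseChunkSize(String line) {
        int idx = line.indexOf(';');
        if (idx != -1) {
            line = line.substring(0, idx);
        }
        // trim guards against stray whitespace around the hex digits
        return Integer.parseInt(line.trim(), 16);
    }

    public static void main(String[] args) {
        System.out.println(parseChunkSize("1a;name=value")); // prints 26
        System.out.println(parseChunkSize("0"));             // prints 0
    }
}
```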

Try something more like this:

try {
    String line;

    do {
        // read the chunk header
        line = inStream.readLine();
        if (line == null) {
            return false;
        }
        // ignore any extensions after the chunk size
        int idx = line.indexOf(';');
        if (idx != -1) {
            line = line.substring(0, idx);
        }
        // parse the chunk size (trim in case of stray whitespace)
        int chunkLength = Integer.parseInt(line.trim(), 16);
        if (chunkLength < 0) {
            return false;
        }
        // has the last chunk been reached?
        if (chunkLength == 0) {
            break;
        }
        // read the chunk data
        byte[] chunk = new byte[chunkLength];
        int offset = 0;
        do {
            int bytesRead = inStream.read(chunk, offset, chunkLength-offset);
            if (bytesRead < 0) {
                return false;
            }
            offset += bytesRead;
        } while (offset < chunkLength);
        // burn a CRLF at the end of the chunk
        inStream.readLine();
        // now do something with the chunk...
    } while (true);

    // read trailing HTTP headers
    do {
        line = inStream.readLine();
        if (line == null) {
            return false;
        }
        // has the last header been read?
        if (line.isEmpty()) {
            break;
        }
        // process the line as needed...
    } while (true);

    // all done
    return true;
}
catch (Exception e) {
    return false;
}

With that said, keep in mind that chunking does not negate the fact that TCP/HTTP allows for streaming bytes. Each chunk is just a small piece of the larger data. So do not try to convert each individual chunk as-is to a `String`, or try to decompress it as a complete unit. You need to collect the chunks into a file/container of your choosing and then process the entire collected data as a whole once you have reached the end of the HTTP response, unless you are pushing the chunks into a streaming processor, such as a GZip decompressor that supports push streaming.

And if you do need to convert the collected data to a `String`, make sure you are using the charset that is specified in the HTTP response's `Content-Type` header (or an appropriate default if no charset is specified), so the collected data gets decoded to Java's native UTF-16 string encoding correctly.
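As a rough sketch of that workflow (the buffer names, the two-slice "chunks", and the UTF-8 default are assumptions for illustration), collecting the de-chunked bytes and only then gunzipping and decoding might look like:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ChunkAssembly {

    // Decompress the *complete* collected body, then decode it with the
    // charset taken from the Content-Type header (passed in by the caller).
    static String decodeCollectedBody(byte[] gzipBytes, Charset charset) throws IOException {
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipBytes))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                plain.write(buf, 0, n);
            }
        }
        return new String(plain.toByteArray(), charset);
    }

    // Helper used only to simulate a gzip-compressed response body.
    static byte[] gzipCompress(String text) throws IOException {
        ByteArrayOutputStream gz = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(gz)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return gz.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzipCompress("Hello, chunked world!");

        // Append each de-chunked piece to one buffer as it arrives;
        // the two slices below stand in for two chunks read off the socket.
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        int half = compressed.length / 2;
        body.write(compressed, 0, half);                        // "chunk" 1
        body.write(compressed, half, compressed.length - half); // "chunk" 2

        // Only after the final zero-length chunk: process the whole body.
        System.out.println(decodeCollectedBody(body.toByteArray(), StandardCharsets.UTF_8));
        // prints: Hello, chunked world!
    }
}
```

Note that decompressing either slice on its own would throw the very "Not in GZIP format" exception described in the question; only the reassembled whole is a valid gzip stream.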

Remy Lebeau