3

I am testing with a client that sends me an HTTP request with no Content-Length header, but the request does have a body.

How do I extract this content without the help of the Content-Length header?

Rod
Kozlov

3 Answers

5

I've kept the original answer for completeness, but I've just been looking in the HTTP RFC (2616) section 4.3:

The presence of a message-body in a request is signaled by the inclusion of a Content-Length or Transfer-Encoding header field in the request's message-headers. A message-body MUST NOT be included in a request if the specification of the request method (section 5.1.1) does not allow sending an entity-body in requests. A server SHOULD read and forward a message-body on any request; if the request method does not include defined semantics for an entity-body, then the message-body SHOULD be ignored when handling the request.

So if you haven't got a content length, you must have a Transfer-Encoding (and if you haven't, you should respond with a 400 status to indicate a bad request or 411 ("length required")). At that point, you do what the Transfer-Encoding tells you :)
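The decision described above can be sketched roughly as follows. This is only an illustration, assuming the headers have already been parsed into a map; the class name `BodyFraming`, the `Framing` enum and the `decide` method are hypothetical names, not part of any HTTP library. The sketch treats any Transfer-Encoding whose final coding is not `chunked` as a bad request, which is a simplification.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class BodyFraming {
    /** Hypothetical result type: how the request body (if any) is delimited. */
    public enum Framing { CHUNKED, FIXED_LENGTH, NEITHER_HEADER, BAD_REQUEST }

    /** Decides how to read the body from headers that were already parsed. */
    public static Framing decide(Map<String, String> headers) {
        // Header names are case-insensitive, so copy into a case-insensitive map.
        Map<String, String> h = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        h.putAll(headers);

        String transferEncoding = h.get("Transfer-Encoding");
        if (transferEncoding != null) {
            // Assumption of this sketch: only "chunked" as the final coding is supported.
            String[] codings = transferEncoding.toLowerCase(Locale.ROOT).split(",");
            String last = codings[codings.length - 1].trim();
            return last.equals("chunked") ? Framing.CHUNKED : Framing.BAD_REQUEST;
        }

        String contentLength = h.get("Content-Length");
        if (contentLength != null) {
            try {
                return Long.parseLong(contentLength.trim()) >= 0
                        ? Framing.FIXED_LENGTH
                        : Framing.BAD_REQUEST;
            } catch (NumberFormatException e) {
                return Framing.BAD_REQUEST;
            }
        }

        // Neither header: no body is signalled. If the method needs one,
        // the caller would answer 400 or 411 as discussed above.
        return Framing.NEITHER_HEADER;
    }
}
```

A caller would map `BAD_REQUEST` to a 400 response, and `NEITHER_HEADER` to either "no body" or a 411 response, depending on whether the request method is expected to carry content.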

Now if you're dealing with a servlet API (or a similar HTTP API) it may well handle all this for you - at which point you may be able to use the technique below to read from the stream until it yields no more data, as the API will take care of it (i.e. it won't just be a raw socket stream).

If you could give us more information about your context, that would help.


Original answer

If there's no content length, that means the content continues until the end of the data (when the socket closes).

Keep reading from the input stream (e.g. writing it to a ByteArrayOutputStream to store it, or possibly a file) until InputStream.read returns -1. For example:

byte[] buffer = new byte[8192];
ByteArrayOutputStream output = new ByteArrayOutputStream();
int bytesRead;
while ((bytesRead = inputStream.read(buffer)) != -1)
{
    output.write(buffer, 0, bytesRead);
}
// Now use the data in "output"

EDIT: As has been pointed out in comments, the client could be using a chunked encoding. Normally the HTTP API you're using should deal with this for you, but if you're dealing with a raw socket you'd have to handle it yourself.

The point about this being a request (and therefore the client not being able to close the connection) is an interesting one - I thought the client could just shut down the sending part, but I don't see how that maps to anything in TCP at the moment. My low-level networking knowledge isn't what it might be.
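For what it's worth, TCP does allow one side to close only its sending direction (a half-close), and Java exposes this as `Socket.shutdownOutput()`. The following is a small loopback experiment, not an HTTP implementation: the class `HalfCloseDemo` and its method names are made up for illustration, and the "server" simply reports how many bytes it read before end of stream. As noted in the comments on this answer, real web servers may not respond to a half-closed client, so this is not a recommended way to delimit a request body.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class HalfCloseDemo {
    /** Reads until end of stream, i.e. until the peer shuts down its sending side. */
    static byte[] readToEnd(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            out.write(buffer, 0, bytesRead);
        }
        return out.toByteArray();
    }

    /** Sends the payload, half-closes the socket, and returns the server's reply. */
    public static String roundTrip(String payload) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> {
                try (Socket s = server.accept()) {
                    // read() returns -1 once the client has shut down its output.
                    byte[] received = readToEnd(s.getInputStream());
                    String reply = "got " + received.length + " bytes";
                    s.getOutputStream().write(reply.getBytes(StandardCharsets.US_ASCII));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            serverThread.start();

            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                client.getOutputStream().write(payload.getBytes(StandardCharsets.US_ASCII));
                client.shutdownOutput(); // closes only the sending direction
                String reply = new String(readToEnd(client.getInputStream()),
                        StandardCharsets.US_ASCII);
                serverThread.join();
                return reply;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `HalfCloseDemo.roundTrip("hello")` returns `"got 5 bytes"`: the server sees end of stream after the five payload bytes, yet the client can still read the reply because only its output side was shut down.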

If this answer turns out to be "definitely useless" I'll delete it...

Jon Skeet
  • And the read should start after the end of the headers, right? – Geo Mar 03 '11 at 07:59
  • @Geo: Yes. Typically HTTP APIs parse the headers for you, and give you an input stream for the content. – Jon Skeet Mar 03 '11 at 08:01
  • Ah, ok. I was assuming that since he needs to do it this way, he can't rely on any existing classes. – Geo Mar 03 '11 at 08:04
  • This is certainly true for a response, but the OP said this was a **request**; how's the client supposed to read your response if it closed the connection? – SimonJ Mar 03 '11 at 08:17
  • If there's no content length, it can also mean that chunked encoding is in place. (I guess the proper answer depends on what layer is being used; if it's for instance a ServletInputStream then the servlet container will have taken care of that, and chunked encoding should already be dealt with) – Julian Reschke Mar 03 '11 at 08:23
  • @SimonJ If client needs response, why should the client close the connection? When a chunked stream has been fully sent, you only need to close the OutputStream on client side and keep the InputStream opened for server response. – gigadot Mar 03 '11 at 08:38
  • @Julian: Yes, it could be chunked encoding, and as you say the servlet container would handle that. – Jon Skeet Mar 03 '11 at 08:40
  • @gigadot: IIRC closing a SocketOutputStream closes the underlying socket too. I suppose you could call Socket.shutdownOutput(), but I expect most webservers will just assume the client went away. – SimonJ Mar 03 '11 at 08:40
  • @SimonJ the question is about how to read the content sent from client. i don't think we should worry whether client closes the connection after he sends the content or not. – gigadot Mar 03 '11 at 08:50
  • @gigadot: No, it's an entirely reasonable concern - because for a request, it doesn't make *sense* to close the whole connection before the server can reasonably respond. See my answer's edit. – Jon Skeet Mar 03 '11 at 08:52
  • @Jon: I suspect the client *could* just shut down the sending part, but I'm not sure many servers would bother responding in that case (distinguishing a half-close from a full close is trickier), hence a "real" client is unlikely to do it. The RFC discourages this approach for delimiting request bodies, too (section 4.4, item 5). – SimonJ Mar 03 '11 at 09:03
  • @SimonJ: If the client hasn't sent Content-Length or Transfer-Encoding, the server should probably respond with a 400 anyway :) – Jon Skeet Mar 03 '11 at 09:04
  • True - or even "411 Length Required". It's as if someone anticipated this scenario ;) – SimonJ Mar 03 '11 at 09:59
  • @SimonJ: Ooh, thanks - I hadn't spotted 411. Edited. There should be a 911 response code for "fire engine required". – Jon Skeet Mar 03 '11 at 10:01
  • Thank you for all your help. I guess a 400 or 411 status, as mentioned above, is the correct behaviour. My server receives a POST HTTP request without a Content-Length header and with a content body. Closing the connection won't help since I have to give back a response. Also, in the HTTP RFC I saw that the Content-Length header is mandatory for packets with a content body. So I think the client needs to change its request :) – Kozlov Mar 09 '11 at 11:45
  • @Kozlov: Well, *either* Content-Length *or* Transfer-Encoding. Has the request got *neither*? If so, reject it. – Jon Skeet Mar 09 '11 at 11:47
3

If this were a response then the message could be terminated by closing the connection. But that's not an option here because the client still needs to read the response.

Apart from Content-Length:, the other methods of determining content length are:

  • Transfer-Encoding: chunked
  • guesswork

Hopefully it's the former, in which case the request should look something like this:

POST /some/path HTTP/1.1
Host: www.example.com
Content-Type: text/plain
Transfer-Encoding: chunked

25
This is the data in the first chunk

1C
and this is the second one

3
con
8
sequence
0

(shamelessly stolen from the Wikipedia article and modified for a request)

  • each chunk is of the form: hex-encoded length, CRLF, data, CRLF
  • after the final data-carrying chunk comes a zero-length chunk with no data
  • after the zero-length chunk come optional extra HTTP headers
  • after the optional HTTP headers comes another CRLF
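If you are on a raw socket and have to decode this yourself, the rules above could be sketched like this. It is a minimal illustration, assuming the input stream is positioned just after the request headers; `ChunkedBodyReader` and its methods are hypothetical names, chunk extensions (anything after a `;` on the size line) are ignored, and trailer headers are skipped rather than returned.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ChunkedBodyReader {
    /** Reads one line as ASCII, without its CRLF (or bare LF) terminator. */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                String s = new String(line.toByteArray(), StandardCharsets.US_ASCII);
                return s.endsWith("\r") ? s.substring(0, s.length() - 1) : s;
            }
            line.write(b);
        }
        throw new EOFException("stream ended inside a line");
    }

    /** Decodes a chunked body and returns the concatenated chunk data. */
    public static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semicolon = sizeLine.indexOf(';'); // chunk extensions are ignored here
            String hex = (semicolon >= 0 ? sizeLine.substring(0, semicolon) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);
            if (size < 0) throw new IOException("negative chunk size");
            if (size == 0) break; // zero-length chunk marks the end of the data

            byte[] chunk = new byte[size];
            int offset = 0;
            while (offset < size) {
                int n = in.read(chunk, offset, size - offset);
                if (n == -1) throw new EOFException("stream ended inside a chunk");
                offset += n;
            }
            body.write(chunk, 0, size);
            if (!readLine(in).isEmpty()) throw new IOException("missing CRLF after chunk data");
        }
        // Skip optional trailer headers up to and including the final empty line.
        while (!readLine(in).isEmpty()) { /* trailers are discarded in this sketch */ }
        return body.toByteArray();
    }

    /** Convenience wrapper for in-memory data, with unchecked exceptions. */
    public static String decodeToString(String wire) {
        try {
            byte[] raw = wire.getBytes(StandardCharsets.US_ASCII);
            return new String(decode(new ByteArrayInputStream(raw)), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `ChunkedBodyReader.decodeToString("3\r\ncon\r\n8\r\nsequence\r\n0\r\n\r\n")` returns `"consequence"`. In practice a servlet container or HTTP library would normally do this decoding for you.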
SimonJ
0

See HTTPbis Part1, Section 3.3.

Julian Reschke