1

I'm writing a file transfer protocol server in Java that uses the HTTP/1.1 standard outlined in RFC 2616.

After the server accepts a connection, I'm trying to extract the HTTP request message. I want to do it in such a way that I'm not assuming the entire message will be sent through a single send operation. I feel like the only way to do this reliably is to track how many bytes are available for reading, but I don't see anything in the socket API that enables me to do this.

  • There is nothing in the socket API that will help you. – Stephen C Oct 02 '22 at 01:48
  • If you read RFC 2616 (and other HTTP RFCs) more carefully, it explains what you need to do. You have to follow the message format and processing rules that are defined. You don't need to know how many bytes are on the socket in order to do that. – Remy Lebeau Oct 02 '22 at 06:06

4 Answers

1

Reading an HTTP/1.1 request is a two-step operation:

  1. read the header fields
  2. use the information obtained by reading the header fields in order to read the body

The format of the request is covered by RFC 9112, 2.1 Message format:

  HTTP-message   = start-line CRLF
                   *( field-line CRLF )
                   CRLF
                   [ message-body ]

start-line is the line containing the method, URL and protocol version. It is followed by a CRLF (\r\n) and then by optional field-lines. You're done reading the header fields when you read two successive CRLFs.
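As a minimal sketch of the first step, the following reads from the stream until it has seen the two successive CRLFs, regardless of how many send operations the client used. The class name `HeaderBlockReader`, the method `readHeaderBlock` and the 16 KiB cap are hypothetical choices made for illustration; they come from neither the RFC nor the Java API. It reads one byte at a time, so in a real server you would probably wrap the socket stream in a `BufferedInputStream` and keep using that same wrapper for the body.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class HeaderBlockReader {
    // Hypothetical cap so a misbehaving client cannot exhaust memory.
    static final int MAX_HEADER_BYTES = 16 * 1024;

    /** Reads up to and including the first CRLF CRLF; body bytes stay unread in the stream. */
    static String readHeaderBlock(InputStream in) throws IOException {
        ByteArrayOutputStream head = new ByteArrayOutputStream();
        int matched = 0; // how many bytes of "\r\n\r\n" have been seen in a row
        while (matched < 4) {
            int b = in.read(); // blocks until a byte arrives or the peer closes
            if (b == -1) {
                throw new EOFException("connection closed before end of headers");
            }
            head.write(b);
            if (head.size() > MAX_HEADER_BYTES) {
                throw new IOException("header section too large");
            }
            boolean expectCr = (matched % 2 == 0);
            if (b == (expectCr ? '\r' : '\n')) {
                matched++;
            } else {
                matched = (b == '\r') ? 1 : 0; // a stray CR may start a new terminator
            }
        }
        return new String(head.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "GET /file.txt HTTP/1.1\r\nHost: example.com\r\n\r\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.print(readHeaderBlock(new ByteArrayInputStream(raw)));
    }
}
```

With a real connection you would pass `socket.getInputStream()` (ideally buffered) instead of the in-memory stream used in `main`.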

Reading the headers, you'll know whether a body is present, and how it's framed (meaning how to read it). There are three possibilities:

  1. a content-length: header field tells you exactly how many bytes to read after the last CRLF
  2. a transfer-encoding: header field tells you how the body is encoded, and which method to use to read it. The one method used in practice is chunked. See section 7.1 of RFC 9112 for a description of that format.
  3. neither header field is present, meaning that there is no body associated with the request (note that this only applies to requests; it's different for responses. See section 6 for more details)
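The three cases above could be sketched roughly as follows. This is an illustration under simplifying assumptions, not a complete implementation: the class and method names (`RequestBodyReader`, `parseFields`, `readBody`, and so on) are made up for the example, repeated header fields simply overwrite each other, invalid numbers surface as `NumberFormatException`, and bodies are assumed to fit in a byte array. When both fields are present, Transfer-Encoding is checked first, since RFC 9112 says it overrides Content-Length.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class RequestBodyReader {
    /** Parses "Name: value" field lines; line 0 (the start-line) is skipped. */
    static Map<String, String> parseFields(String headerBlock) {
        Map<String, String> fields = new HashMap<>();
        String[] lines = headerBlock.split("\r\n");
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) { // field names are case-insensitive, so store them lower-cased
                fields.put(lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT),
                           lines[i].substring(colon + 1).trim());
            }
        }
        return fields;
    }

    /** Picks the framing from the header fields and reads the body accordingly. */
    static byte[] readBody(Map<String, String> fields, InputStream in) throws IOException {
        String te = fields.get("transfer-encoding");
        if (te != null && te.toLowerCase(Locale.ROOT).endsWith("chunked")) {
            return readChunked(in);
        }
        String cl = fields.get("content-length");
        if (cl != null) {
            return readExactly(in, Integer.parseInt(cl));
        }
        return new byte[0]; // a request with neither field has no body
    }

    static byte[] readExactly(InputStream in, int length) throws IOException {
        byte[] body = new byte[length];
        int off = 0;
        while (off < length) {
            int n = in.read(body, off, length - off); // may return fewer bytes than asked
            if (n == -1) throw new EOFException("connection closed mid-body");
            off += n;
        }
        return body;
    }

    static byte[] readChunked(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';'); // ignore any chunk extensions
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);
            if (size == 0) break; // last-chunk
            body.write(readExactly(in, size));
            readLine(in); // consume the CRLF that ends the chunk data
        }
        while (!readLine(in).isEmpty()) { /* skip optional trailer fields */ }
        return body.toByteArray();
    }

    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != '\n') {
            if (b == -1) throw new EOFException("connection closed mid-line");
            if (b != '\r') line.write(b);
        }
        return new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        String head = "POST /upload HTTP/1.1\r\nTransfer-Encoding: chunked\r\n\r\n";
        InputStream in = new ByteArrayInputStream(
                "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n".getBytes(StandardCharsets.ISO_8859_1));
        byte[] body = readBody(parseFields(head), in);
        System.out.println(new String(body, StandardCharsets.ISO_8859_1)); // prints "Wikipedia"
    }
}
```

Note that none of the three paths needs to know how many bytes are currently sitting on the socket; each read simply blocks until the framing says the message is complete.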

Finally, you'll have noticed that I've used RFC 9112, not RFC 2616. That's because 9112 is part of the series of RFCs that have obsoleted 2616. See this blog post and this one for more details.

Frederik Deweerdt
0

A socket cannot know how many bytes will be available, but RFC2616 offers a solution. As you can read in section 8.1.2.1:

In order to remain persistent, all messages on the connection MUST have a self-defined message length (i.e., one not defined by closure of the connection), as described in section 4.4.

Mihe
0

Can you determine how many bytes are present in an accepted Java socket?

The short answer is No. There is nothing in the Socket API that will tell you the overall stream length.

Why? Because if you look at the TCP protocol (as a typical example of a stream transport protocol), there is nothing in the protocol that transmits the stream length ... at the start. Indeed, you only know what the TCP stream length is when (and if) the receiving end gets a FIN message. (Refer to the TCP spec or Wikipedia for more details.)

This means that if you need to know the number of (file) bytes that will be sent at the start, you need to handle this at the application protocol layer.

  • If you use HTTP/1.x, you will need to deal with the 3 variants that Frederik describes in his answer. And note that in the 3rd one, you won't be able to get the size ahead of time at all. (Of course, if you control the server side, you can ensure that doesn't happen ...)

    Note: if you are trying to implement this directly at the socket API level, then the onus is on you to read, (correctly) understand and (correctly) implement the subset of HTTP/1.x that you need. Some of the spec is rather complicated. And it gets potentially even more complicated with newer versions of HTTP ... which are increasingly used by browsers, servers, content delivery networks and so on.

    So my advice would be: Don't do it! Use an existing HTTP protocol implementation on both the client and server sides. You will save yourself a lot of time and (mostly) nugatory effort.

  • If you change your mind about using HTTP, you can basically do what you want. But to achieve the goal of getting the file size at the start of a transfer, the sender will need to send it as part of your custom application protocol.
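For the custom-protocol option, a minimal sketch of "send the size first" might look like this. The framing used here (an 8-byte big-endian length followed by the raw bytes) and the names `LengthPrefixedTransfer`, `send` and `receive` are assumptions made up for the example, not part of any standard. It also assumes the payload fits in a byte array; for large files you would stream to disk in fixed-size pieces instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class LengthPrefixedTransfer {
    /** Sender side: writes the length first, then the payload. */
    static void send(OutputStream rawOut, byte[] file) throws IOException {
        DataOutputStream out = new DataOutputStream(rawOut);
        out.writeLong(file.length); // 8 bytes, big-endian
        out.write(file);
        out.flush();
    }

    /** Receiver side: learns the size up front, then reads exactly that many bytes. */
    static byte[] receive(InputStream rawIn) throws IOException {
        DataInputStream in = new DataInputStream(rawIn);
        long length = in.readLong();
        if (length < 0 || length > Integer.MAX_VALUE) {
            throw new IOException("unsupported length: " + length);
        }
        byte[] file = new byte[(int) length];
        in.readFully(file); // loops internally until all bytes have arrived
        return file;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory buffer stands in for the socket streams.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        send(wire, "hello world".getBytes(StandardCharsets.UTF_8));
        byte[] got = receive(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println(new String(got, StandardCharsets.UTF_8));
    }
}
```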


Non-solution:

Once you get an InputStream from a Socket you will see an available() method that tells you how many bytes can be read without blocking. That does NOT tell you the length of the entire stream. Indeed it isn't necessarily even an accurate measure of the number of bytes available. It is best to ignore it and use a (constant) fixed-size buffer for reading. A socket InputStream.read(byte[], ...) call will return whatever bytes are currently available (up to the buffer size). It will only block if there are zero bytes currently available ... and the stream hasn't been closed.

(The available() method is pretty much useless. Most valid use-cases can be handled better in other ways; e.g. by using a Selector.)
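The fixed-size buffer approach can be sketched as the usual read loop below. The class name `FixedBufferCopy` and the 8192-byte buffer size are arbitrary choices for illustration. Note that this particular loop runs until end of stream, so it only suits a transfer that ends when the sender closes the connection; on a persistent HTTP connection you would stop according to the message framing instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class FixedBufferCopy {
    /** Copies everything from in to out until end of stream; returns the byte count. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // constant size, independent of the stream length
        long total = 0;
        int n;
        // read() returns however many bytes were ready (at least 1), or -1 at end of stream
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams stand in for socket.getInputStream() and a file output stream.
        byte[] data = new byte[20000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), sink);
        System.out.println(copied + " bytes copied");
    }
}
```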

Stephen C
-1

When you create a java.net Socket or a URLConnection, you must get a stream from one of its methods after configuring it. Once you have the stream, you can call its available() method. NB: one or two stream classes in java.io don't have that method but can be cast to an InputStream or OutputStream that has available(). If you need wait() on the stream, you will also need to handle InterruptedException. For counting HTTP packets, headers and frames see https://docs.oracle.com/javase/7/docs/api/java/net/DatagramSocket.html

Samuel Marchant
  • The `available()` method is not useful in this context. It only tells you how many bytes can be read **right now** without risk of blocking. For a socket stream, that will be the number of unread bytes **currently** sitting in the (local) OS protocol buffers. Not the length of the entire stream. The OP says they want to find the size of the entire HTTP request message ... without assuming that it has been locally buffered (by the OS). – Stephen C Oct 02 '22 at 04:55
  • Then it may be better to evaluate the frames https://docs.oracle.com/javase/7/docs/api/java/net/DatagramSocket.html – Samuel Marchant Oct 02 '22 at 05:18
  • No. That means you need to use a datagram protocol; e.g. UDP. And then you need to deal with flow control, retransmission, etc, in application code. Bad idea. The real issue here is (I think) that you have misconstrued the question. I interpret it as (fundamentally) an X-Y problem. The OP is trying to preallocate a buffer or something to hold the entire file. But you can't do that at the socket level because the information is simply not available at the transport level (e.g. TCP / Sockets). – Stephen C Oct 02 '22 at 05:44
  • Buffer classes in java can be constructed with append expectancy and have methods for that. – Samuel Marchant Oct 02 '22 at 05:59
  • I don't know what you are talking about Samuel. You can't deal with UDP packet loss and UDP rate control using buffer classes. It requires a second transport protocol built on top of UDP to do that. And it still doesn't directly solve the OP's actual problem. If you want to solve that, you just design the custom *application* protocol to send the file length (or whatever) at the start of the stream. – Stephen C Oct 02 '22 at 08:44
  • ..."You can't deal with UDP packet loss and UDP rate control using buffer classes"... Correct, as a programmer you must write programs using the supplied API , Java buffers usually have some form of append method. – Samuel Marchant Oct 02 '22 at 09:49
  • I know that, but how is it relevant? I was talking in my comments about OS buffers. Buffers that live in kernel space. They aren't Java objects, and probably *don't* have an `append` method ... in the sense that you would understand it. Hence my "I don't know what you are talking about". – Stephen C Oct 02 '22 at 09:52
  • It's relevant because he does not know how many bytes he'll receive, although as I remember it, the HTTP headers of a POST request generally do tell you how much data follows the headers. Anyhow, why is he using HTTP for an FTP server? Why not read the FTP commands RFC? – Samuel Marchant Oct 02 '22 at 09:58