Read entire message from a TCPSocket without hanging

Question

I'm putting together a TCPServer in Ruby 3.0.2 and I'm finding that I can't seem to read the entire packet without blocking (until the socket is closed).

Edit: There was some confusion on what I was trying to do - my bad - so just to help clarify: I wanted to read everything that had been sent over the TCP connection so far. (end edit)

My first try was:

#!/snap/bin/ruby
require 'socket'

server = TCPServer.new('localhost', 4200)

loop {
  Thread.start(server.accept) do |connection|
    puts connection.gets  # The important line
  end
}

But that hangs until the client closes the connection. Okay, so I took a look at connection.methods and the Ruby docs, and tried a bunch of options that seemed promising. Basically, there are two types of read methods: blocking and nonblocking.

The blocking methods that I tried are .read, .gets, .readlines, .readline, .recv, and .recvmsg. Now .read, .readlines, and .gets all hang (until the socket is closed) - so that's not helpful. The other ones (eg. .readline, the recv methods) don't read the entire message. Now, I could read each line until I see an empty line and parse the HTTP header from there. But there's got to be a better way; I don't want to have to worry about getting a corrupted message and hanging because I didn't read an empty line at the end of the header.

So I went looking at the non-blocking options. Specifically .recv_nonblock and .recvmsg_nonblock. Both of these throw errors (Resource temporarily unavailable - recvfrom(2) would block and Resource temporarily unavailable - recvmsg(2) respectively).

Any ideas on what could be going on? I think it has something to do with me using Ruby 3, because when I try the code on Ruby 2.5, client.gets returns a line (doesn't hang), although .readlines does hang - so I'm not sure what's going on.

Ideally, I could just call something along the lines of client.get_message and I would get the entire message that has been sent, but I'd also be okay with working at the TCP level and getting the packet size, reading that size, and reconstructing the message from there.

TCP has no concept of a message. It is just a byte stream. You have to define message semantics on top of this byte stream, for example using a length prefix or an end-of-message marker or so. — Steffen Ullrich, Sep 13 '21 at 03:45
I guess what I'm looking for is a way to read everything that has been sent so far. I'm reading HTTP messages so I can parse the fields and do it that way - just surprised there isn't a straightforward method to read that returns when it would need to hang. — Eric Power, Sep 14 '21 at 00:49
*"I'm reading HTTP messages"* - HTTP has a clearly defined message format. The length of the body is given in the HTTP header via Content-Length or, in the case of Transfer-Encoding: chunked, before each body chunk. The header itself has a clear marker where it ends (an empty line). Don't wrongly guess how a protocol works and then wonder why it is so hard. Instead look at the actual standard and also use libraries which are created to handle the specific protocol. — Steffen Ullrich, Sep 14 '21 at 04:17
*"Read entire message ... read the entire packet ..."* - You are mixing up concepts and none of these concepts is actually relevant for TCP. A message is not a packet in TCP (it would be in UDP though). The same message can be delivered with multiple packets on the wire, multiple messages can be put into the same packet etc. — Steffen Ullrich, Sep 14 '21 at 04:32
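As a rough sketch of what the comments above describe for HTTP: read lines until the empty line that ends the header, then read exactly Content-Length bytes of body. The helper name read_http_request is made up for illustration, and the sketch assumes CRLF line endings and no chunked transfer encoding. A real server would more likely use an existing HTTP library.

```ruby
require 'socket'

# Hypothetical helper (not part of Ruby's socket library): reads one
# HTTP/1.x request from a socket. Assumes the body length is given by
# Content-Length; chunked transfer encoding is not handled here.
def read_http_request(io)
  request_line = io.gets("\r\n")
  return nil if request_line.nil?   # peer closed before sending anything

  headers = {}
  while (line = io.gets("\r\n"))
    line = line.chomp("\r\n")
    break if line.empty?            # the blank line marks the end of the header

    name, value = line.split(':', 2)
    headers[name.strip.downcase] = value.to_s.strip
  end

  # Read exactly as many body bytes as the header announced.
  length = headers.fetch('content-length', '0').to_i
  body = length.positive? ? io.read(length) : ''

  { request_line: request_line.chomp("\r\n"), headers: headers, body: body }
end
```

Because the header tells the reader where the message ends, this returns as soon as the request is complete, without waiting for the client to close the connection. It will still wait if the client sends fewer bytes than it announced.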

roo · Answer 1 · 2021-09-13T05:36:00.933

TCP just transmits the bytes that you write to the socket, and guarantees that they are received in the order they were sent. If you have the concept of a 'message' then you'll need to add that into your server and client.
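One common way to add that concept is a length prefix. The following is only a minimal sketch: send_message and recv_message are hypothetical helper names, and the 4-byte big-endian length header is an assumed convention that both client and server would have to agree on.

```ruby
require 'socket'

# Hypothetical helpers (not part of Ruby's socket library). Each message is
# sent as a 4-byte big-endian length followed by that many payload bytes.
def send_message(sock, payload)
  data = payload.b                                # binary copy of the payload
  sock.write([data.bytesize].pack('N') + data)
end

# Blocks until one complete message has arrived. Returns nil if the peer
# closed the connection before a full message was received.
def recv_message(sock)
  header = sock.read(4)
  return nil if header.nil? || header.bytesize < 4

  length = header.unpack1('N')
  body = sock.read(length)
  return nil if body.nil? || body.bytesize < length

  body
end
```

With this framing the receiver always knows how many bytes to wait for, so it does not depend on newlines or on the connection being closed.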

.gets specifically will block until it reads a new 'line', or whatever you define as the separator for the string - see the docs IO#gets. This means that until your server receives that byte from the client, it will block.

In your client, have a look at how you're writing your data. If you're using Ruby then puts would work, as it terminates the string with a newline. If you're using write then it will only write the string, without a newline.

Ie.

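# client.rb
require 'socket'

c = TCPSocket.new 'localhost', 5000
c.puts "foo"       # writes "foo\n"
c.write "bar"      # no newline, so the server's second gets keeps waiting
c.write "baz\n"    # the newline completes the line "barbaz"

# server.rb
require 'socket'

s = TCPServer.new 5000
loop do
  client = s.accept
  puts client.gets  # returns once "foo\n" has arrived
  puts client.gets  # returns once "barbaz\n" has arrived
end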

will output

foo
barbaz

I guess I'm more familiar with TCP at the actual TCP level where it does send in distinct packages & fragments them as needed to pass over the wire. Is there any way to read everything that has been sent so far? Or a clean way to check if there's anything in the buffer? Calling a non_blocking read and rescuing on error would work, just surprised there isn't a straightforward method to do that. — Eric Power, Sep 14 '21 at 00:55
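For the "is there anything in the buffer?" part of this comment, one possible sketch uses IO.select together with read_nonblock(exception: false), both of which are standard Ruby IO calls. The helper name read_available and its timeout and max_bytes parameters are made up for illustration. It only returns what has arrived locally so far, which may be a fragment of what the peer intends to send.

```ruby
require 'socket'

# Hypothetical helper: returns the bytes that have already arrived, an empty
# string if nothing arrives within `timeout` seconds, or nil if the peer has
# closed the connection. It never waits for the peer to close.
def read_available(sock, timeout: 0, max_bytes: 4096)
  ready, = IO.select([sock], nil, nil, timeout)
  return ''.b if ready.nil?                # nothing to read yet

  chunk = sock.read_nonblock(max_bytes, exception: false)
  case chunk
  when :wait_readable then ''.b            # readiness turned out to be spurious
  when nil            then nil             # end of stream
  else chunk
  end
end
```

Passing exception: false avoids using rescue for the ordinary "no data yet" case, but the caller still has to decide when a message is complete.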

Eric Power · Answer 2 · 2021-09-14T15:55:44.880

Thanks to everyone who commented/answered, but I found the solution that I think was intended by the creators of the Socket class!

The recv_nonblock method takes some optional arguments, one of which is a buffer that the socket stores what it has read into. So a call like client.recv_nonblock(1000, 0, buffer) stores up to 1000 bytes from the socket into buffer and returns immediately instead of blocking. If no data is available yet, it raises an error (the "would block" error from the question), which can be rescued.

Just to make life easy, I put together a monkey patch to the TCPSocket class:

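class TCPSocket

  # Returns whatever is currently in the local receive buffer without
  # blocking. This may be only part of what the peer has sent.
  def eat_buffer
    contents = +''
    buffer = +''
    loop do
      chunk = recv_nonblock(256, 0, buffer)
      # At end of stream recv_nonblock returns "" or nil (depending on the
      # Ruby version) instead of raising, so stop here to avoid spinning.
      break if chunk.nil? || chunk.empty?
      contents << chunk
    end
    contents
  rescue IO::WaitReadable
    # Nothing more to read right now (covers IO::EAGAINWaitReadable).
    contents
  end

end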

The point that Steffen makes in the comments is well taken - TCP isn't designed to be used this way. This is a hacky (in the bad sense) method, and should be avoided.

edited Sep 14 '21 at 15:55

answered Sep 14 '21 at 02:03

Eric Power


Yup, I haven't looked into the implementation of `gets` but I'd imagine it would work in a similar way - read bytes until it encounters a character it considers a new line, then return. – roo Sep 14 '21 at 03:12
This code __relies on all data already being available in the local socket buffer__ and just needing to be read from it. But the data needed may still be underway or not even sent by the peer - have a look at concepts like the TCP window for details. So it will return something, but especially for larger messages it will not be the full message sent by the peer. So it will __seem to solve your problem but will break in edge cases__, providing nice heisenbugs. Again, TCP has no implicit concept of messages and __one explicitly needs to add message framing on top of the byte stream provided by TCP__. – Steffen Ullrich Sep 14 '21 at 04:24
