
Problem

I have a use case where I need to Peek at exactly the first TCP packet, whatever length it may be.

Snippet

I would have expected this to work:

conn, err := sock.Accept()
if nil != err {
    panic(err)
}

// plenty of time for the first packet to arrive
time.Sleep(2500 * time.Millisecond)

bufConn := bufio.NewReader(conn)
n := bufConn.Buffered()
fmt.Fprintf(os.Stdout, "Size of Buffered Data %d\n", n)

However, even though I am positive that the data has arrived it still shows that 0 bytes are buffered.

Full Test Application

Here's a full test program:

package main

import (
    "bufio"
    "fmt"
    "net"
    "os"
    "strconv"
    "time"
)

func main() {
    addr := ":" + strconv.Itoa(4080)
    sock, err := net.Listen("tcp", addr)
    if nil != err {
        panic(err)
    }
    conn, err := sock.Accept()
    if nil != err {
        panic(err)
    }

    bufConn := bufio.NewReader(conn)
    var n int
    for {
        n = bufConn.Buffered()
        fmt.Fprintf(os.Stdout, "Size of Buffered Data %d\n", n)
        if 0 != n {
            break
        }
        time.Sleep(2500 * time.Millisecond)
    }
    first, err := bufConn.Peek(n)
    if nil != err {
        panic(err)
    }
    fmt.Fprintf(os.Stdout, "[Message] %s\n", first)
}

Testing

And how I've been testing:

telnet localhost 4080

Hello, World!

This works equally well:

echo "Hello, World!" | nc localhost 4080

However, if I call Peek(14) directly the data is obviously there.

Why?

I'm dealing with an application-specific use case - magic byte detection when multiplexing multiple protocols over a single port.

In theory packet sizes are unreliable, but in practice a small hello packet of a few bytes will not be made smaller by any routers in the path and the application will not send more data until it receives the handshake response.

The Kicker

I'm supporting exactly one protocol that requires the server to send its hello packet first, which means that if after a wait of 250ms no packet has been received, the server will assume that this special protocol is being used and send its hello.

Hence, it would be best if I could tell whether data exists in the underlying buffer without calling Read() or Peek() beforehand.

coolaj86

2 Answers


I have a use case where I need to Peek at exactly the first TCP packet, whatever length it may be.

TCP is a streaming protocol and not a datagram protocol like UDP. This means packets are irrelevant from the perspective of TCP. They only exist temporarily on the wire.

Any data the application sends is put into a continuous send buffer and then packetized by the operating system for transport. This means multiple writes by the application can end up in a single packet, a single write can be split across multiple packets, and so on. If data is lost in transit (i.e. no ACK arrives), the sender's OS can even retransmit it in differently sized packets.

Similarly, packets received on the wire are reassembled inside the OS kernel and put into a continuous read buffer. Any packet boundaries that existed on the wire are lost in the process. Therefore, there is no way for the application to find out where a packet boundary was.

    n = bufConn.Buffered()

bufConn is not the OS socket buffer. bufConn.Buffered() only sees data that has already been read from the underlying socket into the Go process but has not yet been retrieved by the application logic through bufConn.Read(). If you read a single byte with bufConn.Read(), it will actually try to read more bytes from the underlying socket, return the single byte you requested, and keep the rest in the bufConn buffer for later reads. This is done to provide a more efficient interface for the application logic. If you don't want this, don't use buffered I/O.

Steffen Ullrich
  • ... and therefore sleeps in blocking mode networking code are just literally a waste of time. – user207421 Jul 23 '18 at 05:41
  • @EJP The sleep in my test code is just a way to ensure that data has been received. I understand that I would use channels and select in real code. – coolaj86 Jul 24 '18 at 17:07
  • @Steffen-Ullrich In general what you're describing is true, however, I'm dealing with an application-specific use case. In theory packet sizes are unreliable, but in practice a small hello packet of a few bytes will not be made smaller by any routers in the path and the application will not send more data until it receives the handshake response. The reason I need this functionality is for magic byte detection when multiplexing multiple protocols over a single port. – coolaj86 Jul 24 '18 at 17:26
  • @CoolAJ86: in this case the question is answered by the part which explains that `bufConn.Buffered()` returns only how much data is *buffered*, which is not the same as how much data is *available to read*. And if you need it for magic byte detection - why don't you just read the magic byte instead of checking how much data is available to read? – Steffen Ullrich Jul 24 '18 at 17:28
  • @SteffenUllrich Mostly because I'm used to writing code in Node.js and, except in very particular circumstances such as pausing the network stream, there's a data event for each tcp packet. Aside from that, different protocols have different handshake styles and different "magic bytes" (SSH, TLS-SNI, HTTP, PROXY, proprietary protocols, etc.). In some cases the client sends the first packet, in other cases the server does (in which case only 1 such protocol could be supported). In some cases the magic bytes are the first 3 bytes, in other cases they require reading a length. – coolaj86 Jul 24 '18 at 17:39
  • @SteffenUllrich In almost all cases all of the necessary information is within the first tcp packet, but the size of the packet is unknown. – coolaj86 Jul 24 '18 at 17:39
  • *"... there's a data event for each tcp packet."* - there isn't. Instead, there is an event whenever data is available to read. That data might actually have arrived in multiple packets, especially if the packets arrived quickly or if the previous processing was slow and multiple packets arrived while the previous input was still being processed. – Steffen Ullrich Jul 24 '18 at 17:44
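As a rough illustration of the "just read the magic bytes" suggestion in the comments above, here is a sketch of prefix-based detection. The function name classify and the returned labels are assumptions made for this example, and the list of checks is far from complete. It relies only on well-known prefixes: SSH identification strings start with "SSH-", a TLS handshake record starts with 0x16 0x03, and a PROXY protocol v1 header starts with "PROXY ". The input could come from bufConn.Peek, which leaves the bytes in place for the real protocol handler.

```go
package main

import (
	"bytes"
	"fmt"
)

// classify (hypothetical helper) guesses the protocol from the first bytes
// of a connection. It never needs to know where a packet ended.
func classify(head []byte) string {
	switch {
	case bytes.HasPrefix(head, []byte("SSH-")):
		return "ssh"
	case len(head) >= 2 && head[0] == 0x16 && head[1] == 0x03:
		return "tls" // handshake record, protocol major version 3
	case bytes.HasPrefix(head, []byte("PROXY ")):
		return "proxy"
	case bytes.HasPrefix(head, []byte("GET ")),
		bytes.HasPrefix(head, []byte("POST ")):
		return "http" // only two methods checked in this sketch
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(classify([]byte("SSH-2.0-demo\r\n")))
	fmt.Println(classify([]byte{0x16, 0x03, 0x01, 0x00, 0x2a}))
	fmt.Println(classify([]byte("GET / HTTP/1.1\r\n")))
	fmt.Println(classify([]byte("Hello, World!\n")))
}
```

Because each check looks at a fixed prefix, the code works the same whether the bytes arrived in one packet or several, as long as enough of them have been peeked.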

Update: Can't be done with net.Conn

Actually, it is not possible to "Peek" at a net.Conn without reading. However net.Conn can be wrapped and that wrapper can be passed around anywhere net.Conn is accepted.

See

Workable Half-Solution

The ideal solution would be to Peek immediately on the first try. While searching around I did find some custom Go TCP libraries... but I'm not feeling adventurous enough to try those yet.

Building off of what @SteffenUllrich said, it turns out that bufConn.Peek(1) will cause the buffer to be filled with the available data. After that, bufConn.Buffered() returns the expected number of bytes and it's possible to proceed with bufConn.Peek(n):

// Force bufConn to fill its buffer with whatever data is available
// (this blocks until at least one byte arrives)
_, err = bufConn.Peek(1)
if nil != err {
    panic(err)
}

// Check the size now
n = bufConn.Buffered()
fmt.Fprintf(os.Stdout, "Size of Buffered Data %d\n", n)

// Peek the full amount of buffered data without consuming it
firstPacket, err := bufConn.Peek(n)
if nil != err {
    panic(err)
}

I thought I had tried this earlier and saw the buffer only filled with 1 byte, but reading the answer above caused me to create a specific test case to be sure, and it worked.

The Downside

This still requires a Read()/Peek() before knowing the size of the data.

This means that in my particular case, where one supported protocol requires the server to send the first hello packet, I have to store state about the connection somewhere else. If enough time (say 250ms) passes without any data being received, I then know to skip the first-packet Peek detection when data does come in.

coolaj86
  • After the `Peek` there might still be data in the operating system's socket buffer. `Peek` will not read more data than fits into the buffer of the `bufConn` object. This might not be relevant in your specific use case with small data though. – Steffen Ullrich Jul 24 '18 at 17:53