
I have an "agent" that parses binary files into a buffer; whenever that buffer is filled, it sends the contents to the server in a protobuf message, then proceeds to the next chunk of binary parsing, sends again, and so on.

On the server I use the plain net package: I listen for the agent connection and read from the resulting net.Conn into a buffer in a for loop. When parsing is finished on the agent side, it sets a terminate bool in the protobuf message, signalling that this is the last message and the server can proceed with the full data received.

However, this only works correctly if I leave the debug prints enabled on the sender side, because the terminal output significantly slows down the interval between consecutive protobuf messages sent via connection.Write().

If I remove this logging, the messages are sent too fast and the first message the server processes is the packet containing the terminate flag, i.e. it receives the LAST message immediately, with no actual payload.

I know TCP does not really distinguish between separate []byte packets, which is most likely the cause of this behavior. Is there a better way of doing this, or any alternatives?

Pseudo-code agent side:

    buffer := make([]byte, 1024)
    isPayloadFinal := false
    for {
        n, ioErr := reader.Read(buffer)
        if ioErr == io.EOF {
            isPayloadFinal = true

            // Create the terminating protobuf message (empty payload)
            terminalMessage, err := CreateMessage_FilePackage(
                2234,
                protobuf.MessageType_PACKAGE,
                make([]byte, 1),
                isPayloadFinal,
            )
            if err != nil {
                log.Err(err).Msg("Error creating terminate message")
                break
            }
            // Send terminate message
            sendProtoBufMessage(connection, terminalMessage)
            break
        }
        // Create a regular protobuf message carrying the next chunk
        message, err := CreateMessage_FilePackage(
            2234,
            protobuf.MessageType_PACKAGE,
            buffer[:n],
            isPayloadFinal,
        )
        if err != nil {
            log.Err(err).Msg("Error creating message")
            break
        }
        sendProtoBufMessage(connection, message)
    }

Pseudo-code server side:

    buffer := make([]byte, 2048)
    artifactReceived := false

    for !artifactReceived {
        connection.SetReadDeadline(time.Now().Add(timeoutDuration))
        n, readErr := connection.Read(buffer)
        if readErr != nil {
            log.Err(readErr).Msg("Error reading from connection")
            break
        }

        decodedMessage := &protobuf.FilePackage{}
        if err := proto.Unmarshal(buffer[:n], decodedMessage); err != nil {
            log.Err(err).Msg("Invalid protobuf message, error during unmarshalling")
            continue
        }

        if decodedMessage.GetIsTerminated() {
            artifactReceived = true
            log.Info().Msg("Artifact fully received")
            /* Do stuff here */
            break
        }
        // Handle partially arrived bytestream
        handleProtoPackage(decodedMessage, artifactPath)
    }

And the proto file for reference:

    message FilePackage {
        int32 id = 1;
        MessageType msgType = 2;
        bytes payload = 3;
        bool isTerminated = 4;
    }
– VikingPingvin

1 Answer


The most likely cause is, as you say, that "TCP does not really make a distinction between different []byte packets": a TCP stream has no message boundaries. When you call connection.Read(buffer) (I'm assuming that connection is a net.Conn), it blocks until some data is available (or the read deadline is reached) and then returns that data, up to the buffer size. The data returned may be exactly one message (as you have seen in your testing), but it could also be a partial message or several messages concatenated; this is timing and network-stack dependent, so you should not make any assumptions.

The protobuf docs provide a suggested technique:

> If you want to write multiple messages to a single file or stream, it is up to you to keep track of where one message ends and the next begins. The Protocol Buffer wire format is not self-delimiting, so protocol buffer parsers cannot determine where a message ends on their own. The easiest way to solve this problem is to write the size of each message before you write the message itself. When you read the messages back in, you read the size, then read the bytes into a separate buffer, then parse from that buffer.

If you take this approach then you can use io.ReadFull when receiving the data, because you always know how many bytes to expect: first the fixed-size length prefix, then exactly that many bytes for the message itself.
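
A minimal sketch of that technique follows; the helper names sendFramed/readFramed, the 4-byte big-endian length prefix, and the package name are assumptions for this sketch, not part of your existing code:

    package framing

    import (
        "encoding/binary"
        "io"
        "net"

        "google.golang.org/protobuf/proto"
    )

    // sendFramed marshals the message and writes a 4-byte big-endian length
    // prefix followed by the encoded bytes.
    func sendFramed(conn net.Conn, m proto.Message) error {
        payload, err := proto.Marshal(m)
        if err != nil {
            return err
        }
        prefix := make([]byte, 4)
        binary.BigEndian.PutUint32(prefix, uint32(len(payload)))
        if _, err := conn.Write(prefix); err != nil {
            return err
        }
        _, err = conn.Write(payload)
        return err
    }

    // readFramed reads the 4-byte length prefix, then exactly that many bytes,
    // and unmarshals them into m. io.ReadFull keeps reading until the buffer
    // is full, so partial TCP reads are handled for you.
    func readFramed(conn net.Conn, m proto.Message) error {
        prefix := make([]byte, 4)
        if _, err := io.ReadFull(conn, prefix); err != nil {
            return err
        }
        size := binary.BigEndian.Uint32(prefix)
        buf := make([]byte, size)
        if _, err := io.ReadFull(conn, buf); err != nil {
            return err
        }
        return proto.Unmarshal(buf, m)
    }

The agent would call sendFramed once per FilePackage, and the server would loop on readFramed until it sees the isTerminated flag, instead of relying on one connection.Read per message.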

– Brits
  • Thank you for the answer. What is the most efficient way to compute the size of a protobuf message? I tried to use gob.NewEncoder(b).Encode(v) followed by len() but it's super slow! – Pasindu Tennage Aug 03 '21 at 10:30
  • @PasinduTennage there are quite a few differences between [gob](https://pkg.go.dev/encoding/gob) and protobuf ([benchmarks here](https://github.com/alecthomas/go_serialization_benchmarks)). As the protobuf message format is defined in advance, the sender should be able to calculate the size fairly quickly. Probably best to ask this in a new question with more detail re your requirements. – Brits Aug 03 '21 at 10:51
  • Thanks for the suggestion. I made a new question in the following link. https://stackoverflow.com/questions/68635618/golang-protobuf-how-to-send-multiple-messages-using-the-same-tcp-connection – Pasindu Tennage Aug 03 '21 at 11:59
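
Regarding the size question raised in the comments: assuming the newer google.golang.org/protobuf/proto package, the encoded size is available via proto.Size (or simply from len() of the bytes returned by proto.Marshal if you are about to write them anyway), so no gob round trip is needed. A minimal sketch:

    import "google.golang.org/protobuf/proto"

    // encodedSize returns the wire-format size of any protobuf message.
    // proto.Size walks the message once and does not allocate the encoded bytes.
    func encodedSize(m proto.Message) int {
        return proto.Size(m)
    }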