
I am reading through the documentation examples for python socketserver at https://docs.python.org/2/library/socketserver.html

Why is the size specified as 1024 in the line self.request.recv(1024) inside the handle method? What happens if the client sends more than 1024 bytes? Is it better to loop, reading 1024 bytes at a time, until the socket is empty? I have copied the example here:

import SocketServer

class MyTCPHandler(SocketServer.BaseRequestHandler):
    """
    The RequestHandler class for our server.

    It is instantiated once per connection to the server, and must
    override the handle() method to implement communication to the
    client.
    """

    def handle(self):
        # self.request is the TCP socket connected to the client
        self.data = self.request.recv(1024).strip() # why only 1024 bytes ?
        print "{} wrote:".format(self.client_address[0])
        print self.data
        # just send back the same data, but upper-cased
        self.request.sendall(self.data.upper())

if __name__ == "__main__":
    HOST, PORT = "localhost", 9999

    # Create the server, binding to localhost on port 9999
    server = SocketServer.TCPServer((HOST, PORT), MyTCPHandler)

    # Activate the server; this will keep running until you
    # interrupt the program with Ctrl-C
    server.serve_forever()
Bunny Rabbit

2 Answers


When reading from a TCP socket you always need a loop.

The reason is that even if the source sent, say, 300 bytes over the network, the data may arrive at the receiver as two separate chunks, for example one of 200 bytes and one of 100 bytes.

For this reason, the buffer size you pass to recv is only the maximum amount you are willing to receive in one call; the amount of data actually returned may be smaller.
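As a minimal sketch of such a loop (Python 3 assumed; `recv_exact` is a hypothetical helper name, not part of the `socket` module), the function below keeps calling `recv` until a known number of bytes has arrived. It only works when the receiver already knows how many bytes to expect.

```python
import socket


def recv_exact(sock, size):
    """Read exactly `size` bytes from a stream socket by looping over recv().

    Hypothetical helper for illustration; raises ConnectionError if the
    peer closes the connection before `size` bytes have arrived.
    """
    chunks = []
    remaining = size
    while remaining > 0:
        # recv() may return fewer bytes than requested, so ask again
        # for whatever is still missing.
        chunk = sock.recv(min(remaining, 1024))
        if not chunk:
            raise ConnectionError(
                "connection closed with %d bytes still missing" % remaining
            )
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # A local socket pair stands in for a real client/server connection.
    sender, receiver = socket.socketpair()
    sender.sendall(b"x" * 200)
    sender.sendall(b"y" * 100)
    data = recv_exact(receiver, 300)
    print(len(data))
    sender.close()
    receiver.close()
```

The caller gets the 300 bytes as one value regardless of how many chunks they arrived in.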

There is no way to implement a "read until the end of the message" at the Python level because the send/recv functions are simply wrappers of the TCP socket interface and that is a stream interface, without message boundaries (so there is no way to know if "all" the data has been received from the source).

This also means that in many cases you will need to add your own boundaries if you need to talk using messages (or you will need to use a higher-level, message-based network transport interface like 0MQ).
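One common way to add your own boundaries is to prefix every message with its length. The sketch below is only an illustration under assumptions: Python 3, a 4-byte big-endian length header (an arbitrary choice), and hypothetical helper names `send_message`, `recv_message` and `recv_exact`.

```python
import socket
import struct

# Assumed wire format: 4-byte unsigned big-endian length, then the payload.
HEADER = struct.Struct("!I")


def recv_exact(sock, size):
    """Loop over recv() until exactly `size` bytes have been read."""
    data = b""
    while len(data) < size:
        chunk = sock.recv(size - len(data))
        if not chunk:
            raise ConnectionError("connection closed in the middle of a message")
        data += chunk
    return data


def send_message(sock, payload):
    """Send one length-prefixed message."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_message(sock):
    """Receive one complete length-prefixed message."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


if __name__ == "__main__":
    # A local socket pair stands in for a real connection.
    client, server = socket.socketpair()
    send_message(client, b"first message")
    send_message(client, b"second, longer message")
    print(recv_message(server))
    print(recv_message(server))
    client.close()
    server.close()
```

Both messages sit in the same byte stream, but the receiver can split them correctly because each header says how many bytes belong to the message that follows.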

Note that "blocking mode", when reading from a socket, only defines the behavior when the operating system's network layer has not yet received any data: in that case a blocking socket makes the program wait for a chunk of data, while a non-blocking socket returns immediately without waiting (in Python, by raising an exception). If any data has already been received by the computer, the recv call returns immediately with that data, even if the buffer size passed is bigger, regardless of the blocking/non-blocking setting.

Blocking mode doesn't mean that the recv call will wait for the buffer to be filled.
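A small experiment can make this concrete. This is a sketch assuming Python 3 and a local `socket.socketpair()` in place of a real network connection; `demo` is a hypothetical function name.

```python
import socket


def demo():
    """Show that recv() returns available data without filling the buffer."""
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(b"hello")

        # Blocking mode (the default): 5 bytes are available, so recv()
        # returns them without waiting for 1024 bytes to accumulate.
        data = receiver.recv(1024)

        # Non-blocking mode with nothing pending: recv() does not wait,
        # it raises BlockingIOError instead.
        receiver.setblocking(False)
        try:
            receiver.recv(1024)
            would_block = False
        except BlockingIOError:
            would_block = True
        return data, would_block
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(demo())
```

The first `recv(1024)` returns only the bytes that were actually sent; the blocking setting only matters for the second call, when nothing is pending.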

NOTE: The Python documentation is indeed misleading on the behavior of recv and hopefully will be fixed soon.

boardrider
6502
  • Is this assuming non-blocking mode? The Python `socket` module doc states *Initially all sockets are in blocking mode* ... *in blocking mode, the calls block until they can proceed*. – cdarke Apr 23 '15 at 06:30
  • @cdarke: a `recv` call can proceed when there is **any amount of data**, not when the buffer is filled. It's of course trivial to implement a `recvall` (with a mandatory size parameter) by making a loop. – 6502 Apr 23 '15 at 06:37
  • @cdarke so in this particular example only the first 1024 (or fewer) bytes are read, and if the client sent more data, it's ignored? – Bunny Rabbit Apr 23 '15 at 06:50
  • @BunnyRabbit: no... extra data when using a TCP stream connection will remain in the OS buffer for the next call to `recv` (that will succeed instantly). That example is a toy (broken) server example that is not handling the case the client sends more than 1024 bytes or that data ends up being fragmented. The `sendall` call makes a loop for sending, but for receiving instead you're required to write a loop yourself and to define a protocol to establish when a message is complete (e.g. prepending a size or considering line terminators). – 6502 Apr 23 '15 at 06:59

A TCP socket is just a stream of bytes. Think of it like reading a file. Is it better to read a file in 1024-byte chunks? It depends on the content. Often a socket, like a file, is buffered and only complete items (lines, records, whatever is appropriate) are extracted. It's up to the implementer.

In this case, a maximum of 1024 is read. If a larger amount is sent, it will be broken up. Since there is no defined message boundary in this code, it really doesn't matter. If you care to receive only complete lines, implement a loop to read data until a message boundary is determined. Perhaps read until a carriage return is detected and process a complete line of text.
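As a rough sketch of that idea (Python 3 assumed; `iter_lines` is a hypothetical helper and the newline terminator is an assumed protocol choice), the generator below buffers whatever `recv` returns and yields only complete lines.

```python
import socket


def iter_lines(sock, bufsize=1024):
    """Yield complete newline-terminated lines from a stream socket.

    Hypothetical helper: buffers incoming chunks and yields a line
    (without its terminator) each time a b"\n" boundary is seen.
    """
    pending = b""
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:
            break  # the peer closed the connection
        pending += chunk
        while b"\n" in pending:
            line, pending = pending.split(b"\n", 1)
            yield line
    if pending:
        # Data that arrived without a final terminator.
        yield pending


if __name__ == "__main__":
    # A local socket pair stands in for a real connection.
    client, server = socket.socketpair()
    client.sendall(b"hel")
    client.sendall(b"lo\nwor")
    client.sendall(b"ld\n")
    client.close()
    for line in iter_lines(server):
        print(line)
    server.close()
```

The lines come out whole even though they were sent in pieces that do not match the line boundaries. Inside a socketserver handler, subclassing `StreamRequestHandler` and calling `self.rfile.readline()` gives similar line-oriented reading without a hand-written buffer.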

Mark Tolonen
  • If a larger amount is sent, it will be broken up, and in this case only the first 1024 bytes will be read because there's no loop to read more data? – Bunny Rabbit Apr 23 '15 at 06:45