4

I've started to learn about network programming in C (sockets) and Internet protocols.

Currently I'm focusing on HTTP.

During my (very) short experience with implementing an HTTP client in C, I discovered that sometimes you have to call recv() multiple times in order to get the entire message from the server - even though the client didn't send a message to the server between those separate calls to recv().

For example:

When I was trying to implement an HTTP client with one call to recv(), I got just the headers (I think, I'm still a newbie with HTTP). But when I called recv() twice, I got the headers in the first call, and the body (the HTML code) in the second call.

It is not a problem of a short buffer, because the buffer I was using was long enough to hold the entire message.

Why does it happen? What is the reason that the client has to call recv() multiple times, although there is no new data that was sent by the client? I thought that if the client doesn't send new data to the server, the call to recv() will bring the entire response of the server.

I don't think this is a problem with my code, but if you ask for the code, I'm happy to post it here. I just think it is unnecessary; correct me if I'm wrong.

I don't think this is relevant, but I'm using Winsock2.

Programmer
  • Raw I/O functions must generally *always* be called in a loop, since they do not make any success guarantees. – Kerrek SB Jan 25 '14 at 17:18
  • But the first call succeeded - I got the headers in the first call. In addition, what is a "raw function"? Thanks. – Programmer Jan 25 '14 at 17:20
  • 1
    But it "succeeds" without any guarantees about how much it read. It can read as little as one single byte and "succeed". So you have to keep calling until it says "no more data, connection closed". By "raw" I mean operating system services like `read` or `recv`, or `write`. By contrast, high-level library functions like `printf` or `fwrite` only need to be called once, since they promise to write the entire set of data (they'll do the looping internally). – Kerrek SB Jan 25 '14 at 17:26
  • Thanks for the explanation about "raw" functions. How can I know that all the data was received? Is checking whether recv() returned 0 OK? – Programmer Jan 25 '14 at 17:27
  • 1
    Closely read the man pages for `recv()`/`send()` and learn that, at least for sockets, those two functions do not necessarily receive/send as many bytes as they were told to, but possibly fewer. So looping around such calls, counting until all data or a terminator has been received/sent, is a good idea, not to say an essential necessity. – alk Jan 25 '14 at 17:30
  • Read the manuals very carefully. Generally, the functions return how much they read/wrote, so you have to accumulate that yourself to determine whether the overall operation succeeded; you also have to handle closed connections and errors. – Kerrek SB Jan 25 '14 at 17:30
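
For HTTP specifically, the "terminator" mentioned in the comments above can be sketched like this. The header block of a response ends with an empty line, i.e. the byte sequence `\r\n\r\n`, so after each `recv()` you can scan what you have accumulated so far. The helper name `find_body_start` is made up for illustration. Once the headers are complete, a `Content-Length` header (if the server sent one) says how many body bytes follow; without it, an HTTP/1.0 response ends when `recv()` returns 0.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: look for the blank line ("\r\n\r\n") that ends the
 * HTTP header block within the len bytes accumulated so far. Returns the
 * offset of the first body byte, or -1 if the terminator has not arrived. */
long find_body_start(const char *buf, size_t len)
{
    size_t i;

    for (i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);
    }
    return -1;
}
```

A return value of -1 simply means "call `recv()` again and append"; it says nothing about how the server split its writes.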

3 Answers

8

TCP was designed to be stream oriented. The receiver doesn't know anything about sent messages/packets. All it sees is a simple stream of bytes.

So you think, "I sent a message, but recv() only received half of it." The fact is, you never sent a message. You sent a bunch of bytes, and you cannot expect all of them to be received by a single call. You could receive them in 1, 2 or many calls; you could receive the end of a previous "message" combined with the beginning of the next one, and so on.

The only guarantees are that you will receive the bytes in the same order they were sent and you will never receive 0 bytes until the stream is closed. That's how the network API works and you have to get used to it.
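
A minimal sketch of what that means in practice, assuming a POSIX-style blocking socket (with Winsock2 the descriptor type is `SOCKET` and the length argument is an `int`, but the loop has the same shape). The helper name `recv_until_close` is made up for illustration; it keeps calling `recv()` and appending until the stream is closed or the buffer is full.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: append everything the peer sends to buf until the
 * peer closes the connection (recv() returns 0), an error occurs, or the
 * buffer is full. Returns the number of bytes stored, or -1 on error.
 * buf is NUL-terminated on success, so cap must be at least 1. */
long recv_until_close(int fd, char *buf, size_t cap)
{
    size_t total = 0;

    while (total < cap - 1) {
        ssize_t n = recv(fd, buf + total, cap - 1 - total, 0);
        if (n < 0)
            return -1;      /* error: inspect errno */
        if (n == 0)
            break;          /* orderly shutdown: end of stream */
        total += (size_t)n; /* one call may deliver any number of bytes >= 1 */
    }
    buf[total] = '\0';
    return (long)total;
}
```

How many iterations the loop takes is up to the network and the OS; the caller only sees the concatenated bytes.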

Gabe
Karoly Horvath
  • I still don't get it. What do you mean by saying 'TCP is a stream'? And how is it related to my question? Thanks for the help. – Programmer Jan 25 '14 at 17:19
  • 2
    Like a stream, if you want all the contents you have to keep reading until you get an end-of-stream indication. There is no promise about how data will be divided up across those reads; that's purely an artifact of how the code was written and what size buffer it's using. – keshlam Jan 25 '14 at 17:24
  • Well, it is clearer now. Another related question: In my case, the first call got the 'headers', the second call got the 'body' - the HTML code. Does it mean that the server called send() twice: the first call with the headers, and the second call with the body (HTML)? – Programmer Jan 25 '14 at 17:25
  • The receiver doesn't know *shit* about how many send() calls the server made, so you cannot base any conclusions on it. It's just a stream of bytes. There are no markers, chunks, messages... – Karoly Horvath Jan 25 '14 at 17:27
  • Isn't it weird that I got exactly the headers and the body, in two different calls? This isn't a miracle, you know. – Programmer Jan 25 '14 at 17:28
  • 1
    The server might have called send() twice. The point is: it's none of your business. – Karoly Horvath Jan 25 '14 at 17:29
  • Also, TCP packets generally have a maximum size of about 1500 bytes. recv() will return as soon as it gets one of them, but that may not be the entirety of what was sent. – Max Jan 25 '14 at 18:00
2

The receiver knows nothing about how many send() calls the sender made
or how many bytes each call carried. You can't demand one recv() of 10 bytes
for each send() of 10 bytes.
A single recv() call could receive the data of three sends,
or the data of one send could be split over three receives...
If they sometimes match, it is pure coincidence.
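
If you do need a fixed number of bytes, the usual approach is to loop until you have them. Here is a rough sketch, assuming a POSIX-style blocking socket (Winsock2 differs in types, not in the idea); `recv_exact` is a made-up name, not a library function.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: block until exactly len bytes have been read into buf.
 * Returns 0 on success, -1 on error or if the peer closes the stream early. */
int recv_exact(int fd, char *buf, size_t len)
{
    size_t got = 0;

    while (got < len) {
        ssize_t n = recv(fd, buf + got, len - got, 0);
        if (n <= 0)
            return -1;   /* 0: stream closed before len bytes; <0: error */
        got += (size_t)n;
    }
    return 0;
}
```

For example, if the sender writes "abcd" and then "efghij", the receiver can still ask for 3 bytes and then 7 bytes and gets "abc" followed by "defghij": the boundaries on the two sides are unrelated.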

deviantfan
  • In my case, the first call got the 'headers', the second call got the 'body' - the HTML code. Does it mean that the server called send() twice: the first call with the headers, and the second call with the body (HTML)? – Programmer Jan 25 '14 at 17:24
  • 1
    @Programmer Probably yes, but it is not certain. The only thing you can be sure of is that the order of the bytes you receive is the same as the order in which the server sent them. – Marian Jan 25 '14 at 17:26
  • 1
    @Programmer: That could be the case, but there is no guarantee. As I said, the number of send and recv calls does not have to match. The data of all sends is concatenated at some point and then split again at the receiver however its OS sees fit. That is a decision of the client's OS etc., not something the sending side of the connection controls. – deviantfan Jan 25 '14 at 17:27
0

One thing you can do is to use a loop like this.

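    int total = 0, n;  /* total bytes stored so far; n = bytes from the latest recv() */
    while (total < (int)sizeof(response) - 1 &&
           (n = recv(socketi, response + total, (int)sizeof(response) - 1 - total, 0)) > 0)
        total += n;    /* append each chunk after the previous one */
    response[total] = '\0';
    printf("\n%s", response);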

In the request you send, you have to say explicitly that you are using HTTP/1.0, for example `GET / HTTP/1.0`, so that the server closes the connection after it has sent all the data. Otherwise recv() will wait until a timeout occurs.

By doing so you read the entire response. The loop calls recv() as many times as needed, so you do not have to hard-code the number of calls.
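
As a rough illustration of such a request, the sketch below formats a minimal HTTP/1.0 GET. The helper name `build_http10_get` is hypothetical, and the `Host` header is an assumption: HTTP/1.0 does not require it, but many servers expect it.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: format a minimal HTTP/1.0 GET request into out.
 * Returns the request length, or -1 if it does not fit in cap bytes. */
int build_http10_get(char *out, size_t cap, const char *host, const char *path)
{
    /* The empty line at the end marks the end of the request headers. */
    int n = snprintf(out, cap, "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n", path, host);

    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

The resulting bytes would then be passed to send(), ideally in a loop as well, since send() may also accept fewer bytes than requested.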

Ananthan