4

I have the following issue. Here is the chunk of code:

void get_all_buf(int sock, std::string & inStr) {
    int n = 1;
    char c;
    char temp[1024*1024]; 

    bzero(temp, sizeof(temp));

    n = recv(sock, temp, sizeof(temp), 0);

    inStr = temp;
};

but sometimes recv returns only part of the data instead of all of it (the data length is always less than sizeof(temp)). The sending side always sends the whole data (I verified this with a sniffer). What is the problem? Thanks.

P.S. I know good practice suggests checking n (if (n < 0) perror ("error while receiving data")), but that doesn't matter now; it is not the cause of my problem.

P.S.2 I forgot to mention: it's a blocking socket.

Delimitry
milo
  • 1
    Good manners also suggest checking your inputs. If there is no \0 in what you receive, then at best your program may crash; at worst you can get a crafted invalid string which exploits the program and pwns the system for fun and profit. – BatchyX Dec 22 '10 at 13:59
  • 1
    `std::string`? Then this is a C++ question, not a C one. – Karl Knechtel Dec 22 '10 at 15:59

3 Answers

11

TCP is a stream protocol, and the standard allows the data you send to be split (fragmented) across multiple packets. In practice this rarely happens with small messages of a few hundred bytes or so, but a megabyte of data is almost certain to get fragmented.

Secondly, when you say the sniffer shows all the data being sent, is it sent in one packet or in many?

Good network programming practice requires you not to assume that messages arrive in single chunks. Two sequential messages can arrive as one packet (in theory but almost never in practice) and, even if they arrive in multiple packets, can be returned by a single read. One message can be fragmented into multiple packets that do not all arrive at once, which is probably what you are seeing.

Your program should buffer all its reads and have a mechanism to determine when a whole message has arrived, either via a delimiter (e.g. HTTP headers which are delimited with CRLFCRLF) or by a byte count (e.g. HTTP bodies where the length is specified in the header) or by closing the connection to indicate the end of the data (e.g. HTTP bodies when the content length isn't specified in the header). There may be other mechanisms too.

AlastairG
  • I've received about 30 KB of data successfully. It was fragmented. But sometimes I cannot receive 7-8 KB of data. – milo Dec 22 '10 at 13:28
  • 1
    It will depend on how the sending side fragments the data, how busy the network is, how the packets arrive, and probably several other variables. I have updated my answer with suggestions as to how to cope with it, but just a brief outline. I suggest you search the internet for articles on network programming and study them. It's not that difficult, but there are a lot of things to consider. How you write your program depends very much on what you are doing. Most articles on socket programming give bad examples of very noddy applications and the code is little use in real life. – AlastairG Dec 22 '10 at 13:33
  • @milo since you know the size of the structure you are expecting, keep calling recv until you have read that many bytes. `for ( size_t total (0); total < sizeof ( temp ); ) { ssize_t n = recv ( sock, temp + total, sizeof ( temp ) - total, 0 ); if ( n <= 0 ) abort(); total += n; }` – Pete Kirkham Dec 22 '10 at 13:37
  • I understand what you are talking about, but it's important to me to know how to do it properly, because the examples of using `recv` that I saw in books look like this: `n = read(newsockfd,buffer,255); if (n < 0) error("ERROR reading from socket");` So... is that correct? – milo Dec 22 '10 at 13:38
  • 2
    TCP has nothing to do with this. The problem lies in the recv() function. recv can return as many bytes as it wants, because of signals and such. – BatchyX Dec 22 '10 at 13:39
  • @Pete Kirkham: Calling recv() in a loop is exactly what my answer proposes. – Juraj Blaho Dec 22 '10 at 13:39
  • Thanks, I will wait for an end-of-packet signature. But I'm interested in other opinions, so if anybody has one, you are welcome :) – milo Dec 22 '10 at 13:43
  • @Juraj you hadn't answered when I wrote the comment – Pete Kirkham Dec 22 '10 at 13:47
  • @BatchyX: That is true (although usually recv returns -1 with errno set to EINTR if interrupted by a signal), but in the vast majority of cases, like 99.9999% of the time or more, the problem is caused by fragmentation. – AlastairG Dec 22 '10 at 13:49
  • @Juraj, actually your answer is not the same as Pete's. Pete is suggesting reading until an expected amount of data is read. Your solution will read until the socket is closed (or at least shutdown for writing on the far end). – AlastairG Dec 22 '10 at 13:51
  • If fragmentation is your only problem, use the POSIX MSG_WAITALL flag. But fragmentation is not your only problem, so you have to loop-and-check anyway. – BatchyX Dec 22 '10 at 13:51
  • @BatchyX: true, but since signals do prevent effective use of MSG_WAITALL, it probably shouldn't be used except in very particular circumstances. If it is not being used, then, as I said earlier, in almost every case recv() returns a short count due to fragmentation rather than to signals. A short count from recv() can also be due to the size of the receive buffer, but again this is rarely the case so I didn't mention it. Was it you downvoted me? Thanks, because it encouraged someone to up vote me so I gained 8 points :) – AlastairG Dec 22 '10 at 14:19
6

A much better way is to use the following:

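#include <cerrno>       // errno, EINTR
#include <string>
#include <sys/socket.h> // recv

void get_all_buf(int sock, std::string & output) {
    char buffer[1024];

    int n;
    // Read until the peer closes the connection (n == 0) or an error occurs,
    // retrying when recv is interrupted by a signal (EINTR).
    while((errno = 0, (n = recv(sock, buffer, sizeof(buffer), 0))>0) || 
          errno == EINTR)
    {
        if(n>0)
            output.append(buffer, n);
    } 

    if(n < 0){
        /* handle error - for example throw an exception*/
    }
}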

Also note that the buffer allocated on the stack is much smaller. Having a 1 MB buffer on the stack may cause a stack overflow.

Additional note: You probably don't want to read until the socket is closed, so you may need to add another termination condition to the while loop.

Juraj Blaho
3

TCP works as a layer on top of other layers: IP and Ethernet. IP allows data fragmentation, and Ethernet allows some data to get lost over the wire. TCP retransmits whatever is lost, so no data goes missing, but the data can arrive in pieces and at different times, and that is reflected in your calls to recv.

When you call recv, the underlying operating system will try to read as much data as it can up to the size you specified, but the call might return having read fewer bytes, even a single byte.
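To see this behaviour in isolation, here is a small sketch. It uses a local AF_UNIX socket pair instead of a real TCP connection, which is an assumption made to keep the example self-contained, and the function name bytes_from_one_recv is hypothetical. A blocking recv returns as soon as some data is available, not when the buffer is full.

```cpp
#include <sys/socket.h>  // socketpair, send, recv
#include <unistd.h>      // close
#include <cstddef>

// Returns how many bytes a single recv() call delivers when the peer has so far
// written only `sent` bytes, even though a 1024-byte buffer is offered.
// Returns -1 on failure or when `sent` is outside 1..64.
ssize_t bytes_from_one_recv(std::size_t sent) {
    char out[64] = {0};
    char in[1024];
    if (sent == 0 || sent > sizeof(out))
        return -1;   // sending nothing would make recv block forever
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;
    ssize_t n = -1;
    if (send(fds[0], out, sent, 0) == static_cast<ssize_t>(sent))
        n = recv(fds[1], in, sizeof(in), 0);   // returns once data is available
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Here the return value is the number of bytes the writer had supplied at that moment, not the buffer size, which is why a single recv cannot be relied on to deliver a whole message.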

You need to create some protocol of your own so that you keep reading until a complete message has arrived.

For example, you can use "\n" as a delimiter. This code can be improved, but I hope it gives you the idea:

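#include <cstring>      // strchr
#include <string>
#include <vector>
#include <sys/socket.h> // recv

void get_all_buf(int sock, std::string & inStr) {
    int n = 0;
    size_t total = 0;
    bool found = false;
    std::vector<char> temp(1024*1024);  // on the heap: 1 MB may overflow the stack

    // Keep reading until a '\n' arrives or the buffer is full

    while (!found && total < temp.size() - 1) {
        n = recv(sock, &temp[total], temp.size() - total - 1, 0);
        if (n <= 0) {
            /* 0: peer closed the connection; -1: error, check 'errno' for more details */
            break;
        }
        total += n;
        temp[total] = '\0';
        found = (strchr(temp.data(), '\n') != 0);
    }

    inStr.assign(temp.data(), total);
}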
vz0