
I'm writing a client-server socket program in C on an Ubuntu Linux box. The server side needs to handle many connections, and both the server and the client have a local socket through which they pass received data, after some manipulation, to a local process. The number of packets sent and received is huge, although each packet is small (1500 bytes at most). Here is the diagram:
[local process of client] <-> data <-> client <---------> server <-> data <-> [local process of server]

So all the sockets (client_local_socket, client_remote_socket, server_remote_socket, server_local_socket) need to be non-blocking.

When I run the client and server on two computers on a LAN, it works great. But when I move the server program to a Linux server on the internet (the client connects to the server from behind a NAT), things go wrong. The client starts communicating with the server successfully. Both client and server get some EAGAIN errors but recover on the next tries, which as far as I know is pretty normal for non-blocking sockets. After a while (more than 1000 packets sent and received), however, writing to client_remote_socket fails with EAGAIN and does not recover on later tries; from then on, every write returns EAGAIN. By the way, client_remote_socket has no problem reading and always gets packets from the server. The server has no problem at all, and client_local_socket works great for both writing and reading.

I've used this code to make the sockets non-blocking:

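#include <fcntl.h>
#include <stdio.h>

/* Add O_NONBLOCK to the socket's existing file status flags. */
int flags = fcntl(client_remote_socket, F_GETFL, 0);
if (flags < 0)
    flags = 0;  /* F_GETFL failed; start from no flags */
if (fcntl(client_remote_socket, F_SETFL, flags | O_NONBLOCK) < 0)
    perror("fcntl(F_SETFL)");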

I have also tried it with:

fcntl(client_remote_socket, F_SETFL, O_NONBLOCK);

but the result is the same.

The only setsockopt option I've used is SO_REUSEADDR on the server side; the client uses no setsockopt at all.

It's worth mentioning that I always check the value write returns, and when it is < 0 I check errno and see that it is EAGAIN. As far as I know, write returns EAGAIN when the kernel has no space available in the write buffer, and it doesn't make sense that the kernel has no memory for me on a laptop with 4 GB of RAM. And again, it works great when I run both client and server on a LAN. When this happens on the client, the server shows no sign of a broken client socket, which is right, because in the meantime the client can still receive data from the server. I've double-checked the code again and again and tried to debug it many times, and couldn't see anything wrong. I also used the select system call to check whether the socket is available for writing, and it always returns 0 once the problem starts. Now I have no clue how to solve this, and any ideas would be much appreciated. Thanks.

madz
  • I don't see that you have much choice but to post the relevant code. – Duck Nov 03 '13 at 15:41
  • A socket isn't allowed to use all your memory; the socket buffers have a limit, typically up to a few MB. Getting EAGAIN is 100% normal. It usually means that either the network can't handle more traffic or the receiving application isn't receiving fast enough. – nos Nov 03 '13 at 16:01
  • The thing that tickles my spidey sense is that in the first paragraph you say "the server side needs to handle many connections" and in the last that "I also used the select system call", apparently as a debugging mechanism rather than as a requisite piece of handling many connections. If you are just in a loop pounding away at `send`, getting a whole lot of EAGAINs is not surprising. – Duck Nov 03 '13 at 16:05
  • Thank you guys for helping me. @Duck Right now I'm the only client connecting to the server, and the real-time logs show the poor server waiting for a connection; it is the client that cannot send packets. Getting a bunch of EAGAINs doesn't surprise me either, but the problem is that it never sends a packet for a couple of minutes. I've even tested it for 15 minutes and still get just EAGAIN :| The select that I used just returns 0. – madz Nov 03 '13 at 17:23
  • @nos My socket buffer is only 2000 bytes and it only ever uses up to 1600 or 1700, so I'm not exceeding the Linux write buffer size. Network monitoring shows that the bandwidth is not exceeded. The packet size is small. – madz Nov 03 '13 at 17:37

1 Answer


I ran into the same problem last week, and after some research I found that it happens because the peer's buffer is full. I tested this case.

When the remote buffer is full, it tells your local stack to stop sending. When data is cleared from the remote buffer (by being read by the remote application) then the remote system will inform the local system to send more data.

The explanation above is from Brian White's answer: https://stackoverflow.com/a/14244450/3728361

adcheese
  • Well, thank you adcheese for the answer. You're totally right about that behaviour in TCP, and I'm sure the answer will help somebody. But in my case it was the fault of an ISP router on the path to the internet, due to a TCP black hole. I already posted an answer about that, but it apparently did not meet Stack Overflow's rules and was deleted. – madz Jul 04 '14 at 13:04
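The thread does not say how the TCP black hole was worked around. One commonly used mitigation, shown here only as a sketch under that assumption, is to cap the TCP maximum segment size on the client socket before connecting, so that outgoing segments stay below the smallest MTU on the path. The helper name cap_tcp_mss is hypothetical, and the value 1200 in the usage note is an arbitrary illustration, not a figure from the thread.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: cap the TCP maximum segment size on a socket.
 * Call it before connect() so the smaller MSS is also announced to
 * the peer in the initial SYN.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
int cap_tcp_mss(int fd, int mss)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof mss);
}
```

A call such as cap_tcp_mss(client_remote_socket, 1200) would go between socket() and connect(). If large segments were being dropped silently on the path, writes should start draining again; if nothing changes, the cause lies elsewhere.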