This question is similar to "Network port open, but no process attached?" and "netstat shows a listening port with no pid but lsof does not". But the answers there don't solve my problem, which is stranger.
I have a server application called lps that waits for TCP connections on port 8588.
[root@centos63 lcms]# netstat -lnp | grep 8588
tcp 0 0 0.0.0.0:8588 0.0.0.0:* LISTEN 6971/lps
As you can see, nothing is wrong with the listening socket. But when I connect a few thousand test clients (written by a colleague) to the server, whether it's 2000, 3000, or 4000, there are always 5 clients (a different 5 each time) that connect and send a login request to the server but never receive any response. Take 3000 clients as an example. This is what the netstat command gives:
[root@centos63 lcms]# netstat -nap | grep 8588 | grep ES | wc -l
3000
And this is the lsof command output:
[root@centos63 lcms]# lsof -i:8588 | grep ES | wc -l
2995
Those 5 connections are here:
[root@centos63 lcms]# netstat -nap | grep 8588 | grep -v 'lps'
tcp 92660 0 192.168.0.235:8588 192.168.0.241:52658 ESTABLISHED -
tcp 92660 0 192.168.0.235:8588 192.168.0.241:52692 ESTABLISHED -
tcp 92660 0 192.168.0.235:8588 192.168.0.241:52719 ESTABLISHED -
tcp 92660 0 192.168.0.235:8588 192.168.0.241:52721 ESTABLISHED -
tcp 92660 0 192.168.0.235:8588 192.168.0.241:52705 ESTABLISHED -
The 5 connections above are established with the server on port 8588, but no process is attached to them. The second column (RECV-Q) keeps increasing as the clients keep sending requests.
The links above mention NFS mounts and RPC. As for RPC, I ran rpcinfo -p and the result has nothing to do with port 8588. As for NFS mounts, the nfsstat output says Error: No Client Stats (/proc/net/rpc/nfs: No such file or directory).
Question: How can this happen? It is always 5 connections, and never from the same 5 clients. I don't think it's a port conflict, since the other clients connect to the same server IP and port and are all handled properly by the server.
Note: I'm using Linux epoll to accept client requests. I also added debug code to my program that records every socket (along with the client's information) that accept returns, but the 5 connections never show up there. This is the uname -a output:
Linux centos63 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Thanks for your kind help! I'm really confused.
Update 2013-06-08:
After upgrading the system to CentOS 6.4, the same problem occurs. Finally I went back to epoll and found this page, which says to set the listening fd to non-blocking and call accept until it returns an EAGAIN or EWOULDBLOCK error. And yes, it works. No more connections are left pending. But why is that? Unix Network Programming, Volume 1 says:
accept is called by a TCP server to return the next completed connection from the
front of the completed connection queue. If the completed connection queue is empty,
the process is put to sleep (assuming the default of a blocking socket).
So if there are still some completed connections in the queue, why is the process put to sleep?
Update 2013-7-1:
I used EPOLLET when adding the listening socket, so a single accept per event does not pick up every pending connection; I have to keep calling accept until EAGAIN is returned. I just realized this. My fault. Remember: always read or accept until EAGAIN is returned when using EPOLLET, even on a listening socket. Thanks again to Matthew for providing me with a test program.