The PID of the process is 1996291.

There are 65534 fds in /proc/1996291/fd, and most of them are sockets, like this:

lrwx------ 1 root root 64 Dec 30 13:59 10000 -> socket:[952574733]
lrwx------ 1 root root 64 Dec 30 13:59 10001 -> socket:[952566188]

I know that the number in brackets is the inode of the socket. I expected every socket inode to have a matching entry in /proc/net/tcp, but only some of them can be found:

cat /proc/net/tcp | grep 952574733

When the inode is found, the output looks like this:

  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
 336: 4114C80A:271A 1914C80A:0CEA 01 00000000:00000000 02:0000BE1B 00000000     0        0 962759319 2 ffff88035a20cb00 20 4 30 10 16

This is a real connection.
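
The manual grep above can be automated. The following is a minimal sketch, not a definitive tool: it reads the socket inodes from /proc/<pid>/fd and counts how many appear in /proc/net/tcp. The helper names `socketInode` and `tcpInodes` are hypothetical, and checking /proc/net/tcp6 as well is an assumption on my part, since sockets using IPv6 are listed there rather than in /proc/net/tcp.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// socketInode extracts the inode from a /proc/<pid>/fd link target such as
// "socket:[952574733]". It reports false for non-socket targets.
func socketInode(target string) (string, bool) {
	if !strings.HasPrefix(target, "socket:[") || !strings.HasSuffix(target, "]") {
		return "", false
	}
	return target[len("socket:[") : len(target)-1], true
}

// tcpInodes collects the inode column (the 10th field) from the contents of
// /proc/net/tcp or /proc/net/tcp6, skipping the header line.
func tcpInodes(table string) map[string]bool {
	inodes := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(table))
	sc.Scan() // skip the header line
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 10 {
			inodes[fields[9]] = true
		}
	}
	return inodes
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: sockcheck <pid>")
		return
	}
	known := make(map[string]bool)
	for _, path := range []string{"/proc/net/tcp", "/proc/net/tcp6"} {
		data, err := os.ReadFile(path)
		if err != nil {
			continue // the table may not exist on this system
		}
		for ino := range tcpInodes(string(data)) {
			known[ino] = true
		}
	}
	links, _ := filepath.Glob(filepath.Join("/proc", os.Args[1], "fd", "*"))
	listed, missing := 0, 0
	for _, link := range links {
		target, err := os.Readlink(link)
		if err != nil {
			continue // the fd may have been closed in the meantime
		}
		ino, ok := socketInode(target)
		if !ok {
			continue
		}
		if known[ino] {
			listed++
		} else {
			missing++
		}
	}
	fmt.Printf("socket fds listed in the TCP tables: %d, not listed: %d\n", listed, missing)
}
```

Run it as root with the PID as the only argument. A large "not listed" count would be consistent with sockets whose connections have already ended while the fds are still held open by the process, though that is only one possible explanation.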

I used netstat -tnp to show connections and saw a great many in TIME_WAIT. I don't know whether they are related to my problem.

I also ran lsof -p 1996291. The output shows a great many sockets, like this:

app    1996291 root *520u     sock       0,8      0t0 953021420 protocol: TCP
app    1996291 root *521u     sock       0,8      0t0 953027193 protocol: TCP
app    1996291 root *522u     sock       0,8      0t0 953021422 protocol: TCP
app    1996291 root *523u     sock       0,8      0t0 953038715 protocol: TCP

These three kernel options have been set to 1:

net.ipv4.tcp_tw_reuse
net.ipv4.tcp_tw_recycle
net.ipv4.tcp_syncookies

I have been unable to solve this problem for several days. Can anyone help me?

Jonathan Hall
lutaoact
  • What is the question exactly? Are you wondering why you have so many connections still open? We can't answer that without some code. For a server, you may also want to dump the list of goroutines ([net/http/pprof](https://golang.org/pkg/net/http/pprof/) would help). – Marc Jan 01 '18 at 15:19
  • This type of error can occur when your program is making http requests without closing the body on the returned responses. Read the docs here: https://golang.org/pkg/net/http/#Response.Body. Specifically this sentence: "It is the caller's responsibility to close Body." – mkopriva Jan 01 '18 at 15:33
  • @mkopriva I restarted the process and all the open fds disappeared. I'll watch it this week to see whether it occurs again. – lutaoact Jan 03 '18 at 05:50
  • @lutaoact: yes, when a process exits all FDs are closed. Open sockets are often tied to a goroutine, have you tried looking at a stack trace to see what's blocked in Read or Write? – JimB Jan 03 '18 at 19:06
  • @JimB How can I see the stack trace? I am a newbie in golang. – lutaoact Jan 04 '18 at 02:47
  • @lutaoact: send the process a SIGQUIT. – JimB Jan 05 '18 at 14:49

1 Answer

Every open socket uses one file descriptor in the process that owns it. When the process holds too many connections open, it runs into its open-file limit, further attempts to open a socket or file fail with a "too many open files" error, and the program can crash.

You can try to prevent this by limiting the number of connections open at the same time, or by making sure the fds are released by closing the body of every response you receive. Quickly recycling sockets may also help.

Another, hackier approach is to raise the limit on open files with:

ulimit -n [new limit]
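
To see which limit the process actually runs with, it can query RLIMIT_NOFILE itself. This is a small sketch assuming Linux, where the fields of `syscall.Rlimit` are unsigned 64-bit integers; the helper name `openFileLimit` is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// openFileLimit returns the soft and hard limits on open file descriptors
// for the current process (the values that ulimit -n adjusts).
func openFileLimit() (soft, hard uint64, err error) {
	var lim syscall.Rlimit
	if err = syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, 0, err
	}
	return lim.Cur, lim.Max, nil
}

func main() {
	soft, hard, err := openFileLimit()
	if err != nil {
		fmt.Fprintln(os.Stderr, "getrlimit failed:", err)
		os.Exit(1)
	}
	fmt.Printf("open files: soft limit %d, hard limit %d\n", soft, hard)
}
```

Raising the limit only postpones the failure if fds are being leaked, so it is worth logging this value next to the current fd count rather than relying on it as a fix.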