While running my Go program, I found many "too many open files" errors in the logs. To find out which process was exhausting the file descriptors, I ran this command:

lsof -n |awk '{print $2}'|sort|uniq -c |sort -nr

It returns a result such as:

279605 20341
62748 19861
10310 19712
5434 21318
3484 27344
2842 19781
2400 20372
2346 24153
2123 5214
1540 21123

The process with PID 20341 is a mongod process, which surprised me. So I tried another way:

lsof -p 20341 | wc -l

What confuses me is that its result is 567.

After that, I tried another way: ll /proc/20341/fd | wc -l, which gives 496.

Now I am confused: which one is right, and what is the difference between them?

Thanks.


Updated at: 2018-05-31 10:35:33

  • Get the mongodb PID
    [root@node26 10:34:54 ~]$ps aux | grep mongo
    mongodb 20341 2.4 1.9 25419812 1257420 ? Sl May28 107:58 /usr/bin/mongod --quiet -f /etc/mongod.conf run

  • Command lsof -p
    [root@node26 10:36:12 ~]$lsof -p 20341 | wc -l
    570

  • Directory
    [root@node26 10:36:33 ~]$ll /proc/20341/fd/ | wc -l
    499

  • Command lsof + grep
    [root@node26 10:37:33 ~]$lsof | grep 20341 | wc -l
    282223

    • front 10
      mongod 20341 mongodb cwd DIR 9,127 4096 2 /
      mongod 20341 mongodb rtd DIR 9,127 4096 2 /
      mongod 20341 mongodb txt REG 9,127 12238320 2499177 /usr/bin/mongod
      mongod 20341 mongodb mem REG 9,127 67108864 1969114 /var/lib/mongodb/a_dev.0
      mongod 20341 mongodb mem REG 9,127 536870912 1968852 /var/lib/mongodb/a_dev.ns
      mongod 20341 mongodb mem REG 9,127 67108864 1968447 /var/lib/mongodb/a.0
      mongod 20341 mongodb mem REG 9,127 536870912 1968347 /var/lib/mongodb/a.ns
      mongod 20341 mongodb mem REG 9,127 67108864 1968453 /var/lib/mongodb/b.0
      mongod 20341 mongodb mem REG 9,127 536870912 1968449 /var/lib/mongodb/b.ns
      mongod 20341 mongodb mem REG 9,127 67108864 1968590 /var/lib/mongodb/c.0
    • middle 10
      mongod 20341 27018 mongodb 490u IPv4 143223380 0t0 TCP node26:27017->node24:59172 (ESTABLISHED)
      mongod 20341 27018 mongodb 491u IPv4 143758325 0t0 TCP node26:27017->node25:43016 (ESTABLISHED)
      mongod 20341 27018 mongodb 492u IPv4 143762443 0t0 TCP node26:27017->node24:60602 (ESTABLISHED)
      mongod 20341 27018 mongodb 493u IPv4 154865226 0t0 TCP node26:27017->node26:54800 (ESTABLISHED)
      mongod 20341 27018 mongodb 494u IPv4 164046515 0t0 TCP node26:27017->node24:42952 (ESTABLISHED)
      mongod 20341 27018 mongodb 495u IPv4 164046516 0t0 TCP node26:27017->node24:42960 (ESTABLISHED)
      mongod 20341 27018 mongodb 497u IPv4 154865844 0t0 TCP node26:27017->node25:41976 (ESTABLISHED)
      mongod 20341 27018 mongodb 500u IPv4 164046517 0t0 TCP node26:27017->node24:42968 (ESTABLISHED)
      mongod 20341 27018 mongodb 502u IPv4 164046518 0t0 TCP node26:27017->node26:60306 (ESTABLISHED)
      mongod 20341 27018 mongodb 503u IPv4 164046519 0t0 TCP node26:27017->node26:60314 (ESTABLISHED)
    • tail 10
      mongod 20341 32608 mongodb 492u IPv4 143762443 0t0 TCP node26:27017->node24:60602 (ESTABLISHED)
      mongod 20341 32608 mongodb 493u IPv4 154865226 0t0 TCP node26:27017->node26:54800 (ESTABLISHED)
      mongod 20341 32608 mongodb 494u IPv4 164046515 0t0 TCP node26:27017->node24:42952 (ESTABLISHED)
      mongod 20341 32608 mongodb 495u IPv4 164046516 0t0 TCP node26:27017->node24:42960 (ESTABLISHED)
      mongod 20341 32608 mongodb 497u IPv4 154865844 0t0 TCP node26:27017->node25:41976 (ESTABLISHED)
      mongod 20341 32608 mongodb 500u IPv4 164046517 0t0 TCP node26:27017->node24:42968 (ESTABLISHED)
      mongod 20341 32608 mongodb 502u IPv4 164046518 0t0 TCP node26:27017->node26:60306 (ESTABLISHED)
      mongod 20341 32608 mongodb 503u IPv4 164046519 0t0 TCP node26:27017->node26:60314 (ESTABLISHED)
      mongod 20341 32608 mongodb 505u IPv4 164046523 0t0 TCP node26:27017->node26:60322 (ESTABLISHED)
      mongod 20341 32608 mongodb 730u IPv4 117137926 0t0 TCP node26:27017->node25:54730 (ESTABLISHED)
Liqang Liu

1 Answer

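/proc/${pid}/fd contains the open file descriptors of the process (here, the ones connected to the shell). In lsof they show up as rows whose FD column is a number followed by an access-mode letter such as u. lsof also lists rows that are not file descriptors, such as cwd, rtd, txt and mem: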

$ ls -l /proc/$$/fd
total 0
lrwx------ 1 username users 64 May 30 20:08 0 -> /dev/pts/0
lrwx------ 1 username users 64 May 30 20:08 1 -> /dev/pts/0
lrwx------ 1 username users 64 May 30 20:08 2 -> /dev/pts/0
lrwx------ 1 username users 64 May 30 20:08 255 -> /dev/pts/0
$ lsof -p $$
COMMAND  PID     USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
bash    3720 username  cwd    DIR  254,3    12288 1835009 /home/username
bash    3720 username  rtd    DIR  254,2     4096       2 /
bash    3720 username  txt    REG  254,2   859688 2890163 /usr/bin/bash
bash    3720 username  mem    REG  254,2    46912 2885785 /usr/lib/libnss_files-2.27.so
bash    3720 username  mem    REG  254,2  2942480 2930144 /usr/lib/locale/locale-archive
bash    3720 username  mem    REG  254,2   457800 2890072 /usr/lib/libncursesw.so.6.1
bash    3720 username  mem    REG  254,2  2105608 2885835 /usr/lib/libc-2.27.so
bash    3720 username  mem    REG  254,2    14144 2885777 /usr/lib/libdl-2.27.so
bash    3720 username  mem    REG  254,2   363064 2890132 /usr/lib/libreadline.so.7.0
bash    3720 username  mem    REG  254,2   177680 2885836 /usr/lib/ld-2.27.so
bash    3720 username    0u   CHR  136,0      0t0       3 /dev/pts/0
bash    3720 username    1u   CHR  136,0      0t0       3 /dev/pts/0
bash    3720 username    2u   CHR  136,0      0t0       3 /dev/pts/0
bash    3720 username  255u   CHR  136,0      0t0       3 /dev/pts/0

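They are both "right," but they count different things. The open files limit applies to file descriptors, so the count from /proc/${pid}/fd is the one closest to what matters for running out of open files; the lsof count also includes the cwd, rtd, txt and mem rows plus a header line.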
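
To see how an lsof -p listing splits into numbered descriptors and everything else, the rows can be grouped by their FD column. This is only a minimal sketch: the function name classify_lsof is made up for illustration, and it assumes the default column layout shown above, where FD is the fourth field.

```shell
#!/usr/bin/env bash
# classify_lsof (hypothetical helper): read `lsof -p PID` output on stdin
# and count rows by the kind of value in the FD column.
# Assumes the layout COMMAND PID USER FD ... (no extra column after PID).
classify_lsof() {
  awk '
    NR == 1 && $1 == "COMMAND" { next }   # skip the header row
    $4 ~ /^[0-9]+/ { fd++; next }         # numbered descriptor, e.g. 0u, 255u
    { other[$4]++ }                       # cwd, rtd, txt, mem, ...
    END {
      printf "descriptors %d\n", fd
      for (k in other) printf "%s %d\n", k, other[k]
    }
  '
}
```

It could be used as lsof -p 20341 | classify_lsof. The "descriptors" line is the part that should be comparable to the number of entries in /proc/20341/fd; the order of the remaining lines is not fixed.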

To find the relevant open files limit use ulimit -n.
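
Note that ulimit -n reports the limit of the current shell, which may differ from the limit of a daemon that is already running. On Linux, both the descriptor count and the limit of a specific process can be read from /proc. The following is a rough sketch: the helper names fd_count and fd_soft_limit are invented here, and reading another user's process normally requires root.

```shell
#!/usr/bin/env bash
# fd_count (hypothetical helper): number of entries in /proc/PID/fd.
# ls -1 prints one name per line and, unlike ls -l, no "total" line,
# so wc -l is not off by one.
fd_count() {
  local pid=$1
  ls -1 "/proc/$pid/fd" 2>/dev/null | wc -l | tr -d ' '
}

# fd_soft_limit (hypothetical helper): soft "Max open files" limit of a
# running process, taken from the fourth field of /proc/PID/limits.
fd_soft_limit() {
  local pid=$1
  awk '/^Max open files/ { print $4 }' "/proc/$pid/limits"
}
```

For example, echo "$(fd_count 20341) of $(fd_soft_limit 20341)" shows how close the process is to its own limit.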

l0b0
  • Okay... and why is `lsof | grep ` different from `lsof -p `? – Liqang Liu May 30 '18 at 08:23
  • Because `lsof | grep $pid` looks for a string *anywhere* in a massive output which probably happens to contain that number in a bunch of other places. – l0b0 May 30 '18 at 08:33
  • I don't think that's the reason: I looked through the output of lsof | grep, and the entries really do relate to the , not to other processes' open fds. – Liqang Liu May 31 '18 at 00:57
  • Without seeing the output there's no way I can judge what the issue is. Could you paste the diff in your question? – l0b0 May 31 '18 at 01:06
  • Thanks l0b0, I just appended the output from my environment. – Liqang Liu May 31 '18 at 03:06
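
In the samples pasted into the question, the "middle" and "tail" rows have an extra numeric column after the PID (27018, 32608), and the same descriptors (492u through 503u) appear under both values. That is consistent with the system-wide listing repeating each descriptor once per task (thread) of the process, which may explain the much larger count. This is an inference from the samples, not something confirmed in the thread. A rough sketch for counting each numbered descriptor only once follows. It assumes the two row layouts seen in the samples, and the name count_unique_fds is made up for illustration.

```shell
#!/usr/bin/env bash
# count_unique_fds (hypothetical helper): read system-wide `lsof` output on
# stdin and count the distinct numbered descriptors of one PID.
# Assumes rows look like either
#   COMMAND PID USER FD ...        or
#   COMMAND PID <number> USER FD ...
# as in the pasted samples; other lsof versions may use other layouts.
count_unique_fds() {
  local pid=$1
  awk -v pid="$pid" '
    $2 != pid { next }                      # match the PID column only, not any substring
    { fd = ($3 ~ /^[0-9]+$/) ? $5 : $4 }    # skip the extra numeric column when present
    fd ~ /^[0-9]+/ && !seen[fd]++ { n++ }   # count each numbered descriptor once
    END { print n + 0 }
  '
}
```

It could be run as lsof -n | count_unique_fds 20341. Matching on the PID column also avoids counting rows where 20341 merely appears somewhere else in the line.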