34

I have a core dump file from a process that probably has a file descriptor leak (it opens files and sockets but apparently sometimes forgets to close some of them). Is there a way to find out which files and sockets the process had open before it crashed? I can't easily reproduce the crash, so analyzing the core file seems to be the only way to get a hint about the bug.

ks1322
oliver

8 Answers

10

If you have a core file and you have compiled the program with debugging options (-g), you can see where the core was dumped:

$ gcc -g -o something something.c
$ ./something
Segmentation fault (core dumped)
$ gdb something core

You can use this to do some post-mortem debugging. A few gdb commands: bt prints the stack trace, and fr jumps to a given stack frame (see the output of bt).

Now, if you want to see which files are open at the time of a segmentation fault, handle the SIGSEGV signal and, in the handler, dump the contents of the /proc/PID/fd directory (e.g. with system("ls -l /proc/PID/fd") or execv, with PID replaced by the process ID).

With this information at hand, you can more easily find what caused the crash, which files were open, and whether the crash and the file descriptor leak are connected.

ks1322
terminus
  • 4
    This doesn't really answer the question, which is about using a core file to discover open files, not adding debug output to an existing program. oliver can't reproduce the issue anyway. – craig65535 Nov 03 '16 at 17:22
6

Your best bet is to install a signal handler for whatever signal is crashing your program (SIGSEGV, etc.).

Then, in the signal handler, inspect /proc/self/fd, and save the contents to a file. Here is a sample of what you might see:

Anderson cxc # ls -l  /proc/8247/fd
total 0
lrwx------ 1 root root 64 Sep 12 06:05 0 -> /dev/pts/0
lrwx------ 1 root root 64 Sep 12 06:05 1 -> /dev/pts/0
lrwx------ 1 root root 64 Sep 12 06:05 10 -> anon_inode:[eventpoll]
lrwx------ 1 root root 64 Sep 12 06:05 11 -> socket:[124061]
lrwx------ 1 root root 64 Sep 12 06:05 12 -> socket:[124063]
lrwx------ 1 root root 64 Sep 12 06:05 13 -> socket:[124064]
lrwx------ 1 root root 64 Sep 12 06:05 14 -> /dev/driver0
lr-x------ 1 root root 64 Sep 12 06:05 16 -> /temp/app/whatever.tar.gz
lr-x------ 1 root root 64 Sep 12 06:05 17 -> /dev/urandom

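Then you can return from your signal handler. Provided the default action has been restored for that signal (for example by installing the handler with SA_RESETHAND), the fault recurs and you should get a core dump as usual.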

Martin Del Vecchio
4

One of the ways I jump to this information is just running strings on the core file. For instance, when I was running file on a core recently, due to the length of the folders I would get a truncated arguments list. I knew my run would have opened files from my home directory, so I just ran:

strings core.14930|grep jodie

But this is a case where I had a needle and a haystack.
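If strings is not available on the machine holding the core, the same needle-in-a-haystack search can be approximated in a few lines of C. This is only a rough sketch of what strings piped into grep does, not a replacement for either tool; the function name grep_strings and the 4096-byte cap on a run are choices made for this example.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MIN_RUN 4      /* shortest run reported, like the strings default */
#define MAX_RUN 4096   /* longer runs are split; an arbitrary cap */

/* Treat printable ASCII and tab as string characters. */
static int is_string_byte(int c)
{
    return c != EOF && (isprint(c) || c == '\t');
}

/* Print every run of printable bytes in `path` that contains `needle`.
 * Returns the number of matching runs, or -1 if the file cannot be opened. */
long grep_strings(const char *path, const char *needle)
{
    char run[MAX_RUN + 1];
    size_t len = 0;
    long matches = 0;
    int c;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    for (;;) {
        c = fgetc(f);
        if (is_string_byte(c) && len < MAX_RUN) {
            run[len++] = (char)c;
            continue;
        }
        /* The run ended: non-printable byte, end of file, or full buffer. */
        if (len >= MIN_RUN) {
            run[len] = '\0';
            if (strstr(run, needle) != NULL) {
                puts(run);
                matches++;
            }
        }
        len = 0;
        if (c == EOF)
            break;
        if (is_string_byte(c))
            run[len++] = (char)c;   /* byte that overflowed starts a new run */
    }
    fclose(f);
    return matches;
}
```

For the case above, the call would be grep_strings("core.14930", "jodie"). As with strings, this only finds paths that happen to be sitting in the process's memory, so it gives hints rather than a reliable list of open files.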

JodieC
2

Recently, while troubleshooting an error, a customer gave me a core dump that had been generated on his filesystem, and then he went out of town. To scan through the file quickly and read its contents, I used the command

strings core.67545 > coredump.txt

and was later able to open the resulting file in a text editor.

2

You can try using strace to see the open, socket and close calls the program makes.

Edit: I don't think you can get the information from the core; at most it will have the file descriptors somewhere, but this still doesn't give you the actual file/socket. (Assuming you can distinguish open from closed file descriptors, which I also doubt.)

mweerden
2

If the program forgot to close those resources it might be because something like the following happened:

fd = open("/tmp/foo", O_CREAT | O_WRONLY, 0644);
// do stuff
fd = open("/tmp/bar", O_CREAT | O_WRONLY, 0644); // Oops, forgot to close(fd) first

Now the file descriptor for foo is no longer stored anywhere in memory.
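For contrast, here is a minimal sketch of the same sequence without the leak. The function name and its parameters are invented for the example; the only point is that the first descriptor is closed before the variable is reused.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open two files in turn through one variable without leaking the first.
 * Returns the descriptor for bar_path (the caller closes it), or -1 on error. */
int open_foo_then_bar(const char *foo_path, const char *bar_path)
{
    int fd = open(foo_path, O_CREAT | O_WRONLY, 0644);

    if (fd < 0)
        return -1;
    /* ... do stuff with the first file ... */
    close(fd);   /* release it before the variable is overwritten */

    fd = open(bar_path, O_CREAT | O_WRONLY, 0644);
    return fd;
}
```

Because open returns the lowest free descriptor number, a leak in a function like this shows up as the returned number creeping upward across repeated calls.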

If this didn't happen, you might be able to find the file descriptor number, but that is not very useful: descriptor numbers are reused continuously, so by the time you get to debug you won't know which file a given number referred to at the time.

I really think you should debug this live, with strace, lsof and friends.

If there is a way to do it from the core dump, I'm eager to know it too :-)

Vinko Vrsalovic
1

A core dump is a copy of the memory the process had access to when it crashed. Depending on how the leak occurs, the process may have lost its references to the handles, so the dump may prove useless.

lsof lists all currently open files on the system; you could check its output to find leaked sockets or files. Yes, you'd need to have the process running. You could run it under a dedicated username to make it easy to pick out the files opened by the process you are debugging.

I hope somebody else has better information :-)

Vinko Vrsalovic
0

Another way to find out which files a process has open (again, only at runtime) is to look in /proc/PID/fd/, which contains symlinks to the open files.

skolima