
I'm working on a multithreaded embedded application in which one of the threads uses epoll for I/O. I'm relying on the epoll feature that closing a file descriptor automatically removes it from the epoll set (Question/Answer 6 in man 7 epoll). In this case, the file descriptor is closed in the same thread that invokes epoll_wait.

What ends up happening is that epoll_wait returns an event on a file descriptor after it has been closed, and the program crashes because it tries to access resources that were deallocated when the file descriptor was closed.

As far as I know, the file descriptor is not duplicated anywhere, though I do not know how to validate this. I know for a fact that there are no calls to fork(), dup(), dup2(), or fcntl() with F_DUPFD. This file descriptor is registered with EPOLLOUT, EPOLLIN, EPOLLERR, and EPOLLHUP, and it is level-triggered.

Are there any caveats to this feature that anybody knows about? Is the man page incorrect? Is there any useful information that can help me debug the issue further? I know I could just remove the file descriptor from the set, but I would like to know why this is happening.

Craig M. Brandenburg
duffsterlp
  • The events that epoll_wait returned for that file descriptor were EPOLLIN, EPOLLHUP, and EPOLLERR. – duffsterlp Nov 26 '13 at 16:35
  • You can use [strace](http://linux.die.net/man/1/strace) to verify that your program is doing what you think it's doing. Can you reproduce this behavior in a simple, single-threaded test program? – Craig M. Brandenburg Nov 26 '13 at 17:20

1 Answer


Closing a file descriptor does not seem to remove it from the epoll set. I tried it with a very simple example on a 3.12.2 kernel. I'm inclined to call the man page wrong or inaccurate.

What I did in a test was:

  • created a tcp socket
  • bound it to localhost:5555
  • set it to listen
  • created an epoll
  • added the socket to it with EPOLLHUP, EPOLLERR and EPOLLIN
  • slept a bit so I could optionally connect to it with nc
  • closed the socket
  • epoll_wait
  • epoll_ctl del
  • cleaned up

The wait works even though the socket had been closed, whether or not I had connected to it.
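
The steps above can be sketched roughly as follows. This is not the original test program, only a minimal reconstruction under stated assumptions: the function name `probe_close_then_wait` and the `struct close_probe` record are hypothetical, an ephemeral port is used instead of 5555 to avoid clashes, and the sleep for an optional nc connection is left out. It records what `epoll_wait` and the `epoll_ctl` delete return after the close so they can be inspected.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical result record for the experiment. */
struct close_probe {
    int wait_ret;   /* return value of epoll_wait() after close() */
    int del_ret;    /* return value of epoll_ctl(EPOLL_CTL_DEL) after close() */
    int del_errno;  /* errno of that epoll_ctl() call, 0 if it succeeded */
};

/* Returns 0 if the experiment ran, -1 if the setup itself failed. */
int probe_close_then_wait(struct close_probe *out)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* assumption: ephemeral port rather than 5555 */

    if (bind(sock, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(sock, 1) < 0) {
        close(sock);
        return -1;
    }

    int ep = epoll_create1(0);
    if (ep < 0) {
        close(sock);
        return -1;
    }

    struct epoll_event ev;
    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN | EPOLLERR | EPOLLHUP;
    ev.data.fd = sock;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, sock, &ev) < 0) {
        close(sock);
        close(ep);
        return -1;
    }

    /* Close the only descriptor for the socket while it is registered. */
    close(sock);

    struct epoll_event got;
    out->wait_ret = epoll_wait(ep, &got, 1, 100); /* 100 ms timeout */

    /* Try to delete the already-closed descriptor from the set. */
    out->del_ret = epoll_ctl(ep, EPOLL_CTL_DEL, sock, NULL);
    out->del_errno = (out->del_ret < 0) ? errno : 0;

    close(ep);
    return 0;
}
```

Calling this from a small `main` and printing the three fields is one way to compare runs. In a single-threaded run where nothing else holds the socket open, the delete should fail with `EBADF`, since the descriptor number no longer refers to an open file.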

Edit: The epoll_ctl del did fail once the socket had been closed. And after reading the current man pages, it seems they're actually OK. The epoll page points to select(2) regarding closing a socket that is being monitored, and that page says the behaviour is unspecified.
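
Since the delete fails once the descriptor is closed, the safer order is to remove the descriptor from the set first and close it afterwards, which is the workaround the question already mentions. A minimal sketch of that order, assuming a hypothetical helper name `unregister_and_close`:

```c
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper: remove fd from the epoll set epfd, then close it.
 * Returns 0 if both steps succeeded, -1 if either one failed.
 */
int unregister_and_close(int epfd, int fd)
{
    /* Delete while fd is still a valid descriptor. */
    int rc = epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);

    /* Close regardless, so the descriptor is not leaked. */
    if (close(fd) < 0)
        rc = -1;

    return rc;
}
```

This only prevents future `epoll_wait` calls from reporting the descriptor. Events that an earlier `epoll_wait` call has already copied into the caller's array are not retracted, so a loop that closes one descriptor while processing a batch still has to skip any later entries for it.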

Tommi Kyntola
  • For the record, what I do in one server of mine when a connection is to be closed is remove it from the epoll explicitly (and actually flag the epoll to re-wait in case it had just gotten out of epoll_wait, which is run in another thread) and then go on to close and destroy associated resources. – Tommi Kyntola Dec 05 '13 at 19:06
  • The fact that epoll_ctl del worked in your example is highly suspicious. The socket is either not closed or has been duplicated. Can we see the code? – Jean-Bernard Jansen Oct 17 '16 at 09:00
  • Found the code. Sure enough, the `epoll_ctl` del does fail. I added a "doclose" flag to my test program to check the epoll_wait return values with and without closing the socket, and I must have taken the "del worked" result from a run without the close, when it of course works. – Tommi Kyntola Oct 27 '16 at 12:21
  • This is correct. Here's an example program with strace output to prove it: https://stackoverflow.com/a/51543273/432 – andrewrk Jul 26 '18 at 16:13
  • I think that what happens is that at the moment of an event, if no thread is in epoll_wait(), that event is added to a "ready list" (See https://idndx.com/2014/09/22/the-implementation-of-epoll-3/ where it talks about `rdllink`). Closing a fd does not remove the event from the ready list(?), only from the interest list (so that new events won't cause them to be added to that ready list). If then one calls epoll_wait() it looks in this ready list for events that happened while that thread wasn't in epoll_wait(). – Carlo Wood Jul 10 '19 at 01:12
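
The duplicated-descriptor case raised in the comments above can be checked with a small experiment. Per Question/Answer 6 in man 7 epoll, a descriptor is removed from the set only after all descriptors referring to the same open file description have been closed. The sketch below uses a pipe rather than a socket, and the function name `events_after_close_with_dup` is hypothetical; it registers the read end, keeps a `dup()` of it open, closes the registered descriptor, and reports how many events `epoll_wait` still returns.

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Returns the number of events epoll_wait() reports after the registered
 * descriptor was closed while a duplicate of it stays open, or -1 if the
 * setup failed.
 */
int events_after_close_with_dup(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    int ep = epoll_create1(0);
    int dupfd = dup(p[0]); /* second reference to the same open file */
    int n = -1;

    if (ep >= 0 && dupfd >= 0) {
        struct epoll_event ev;
        memset(&ev, 0, sizeof ev);
        ev.events = EPOLLIN;
        ev.data.fd = p[0];

        if (epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
            write(p[1], "x", 1) == 1) {
            /* Close the registered descriptor; dupfd keeps the file open. */
            close(p[0]);
            p[0] = -1;

            struct epoll_event got;
            n = epoll_wait(ep, &got, 1, 100); /* 100 ms timeout */
        }
    }

    if (p[0] >= 0)
        close(p[0]);
    if (dupfd >= 0)
        close(dupfd);
    if (ep >= 0)
        close(ep);
    close(p[1]);
    return n;
}
```

With the duplicate still open, the registration survives the close and the readable event is still reported, carrying the now-closed descriptor number in `data.fd`. That matches the symptom in the question, so a hidden duplicate (including one inherited by a child process) is worth ruling out, for example with strace as suggested earlier.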