Questions tagged [epoll]

epoll is a Linux 2.6 readiness notification API for sockets, pipes, and special event, signal, and timer descriptors. It can operate in both level-triggered and edge-triggered mode, although presently only level-triggered behaviour is in accordance with the documentation. As opposed to poll or select, epoll scales O(1) with respect to the number of monitored descriptors and O(N) with respect to the number of ready events.

The epoll API is built around 3 functions:

  • epoll_create creates a new epoll instance and returns a file descriptor that refers to it. This descriptor can be operated on with the other epoll functions and can itself be added to a different epoll instance.
  • epoll_ctl adds file descriptors (sockets, pipes, eventfd, timerfd, signalfd, and epoll) to an epoll instance's set of monitored descriptors, removes them from it, and modifies the event flags of descriptors that are already registered.
  • epoll_wait returns up to maxevents queued events. If no events are available, it blocks for up to the given timeout in milliseconds and returns zero if the timeout expires first. A timeout of 0 makes it return immediately, and a timeout of -1 makes it block indefinitely.
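As a rough illustration of how the three calls fit together, here is a minimal sketch using Python's select.epoll, which wraps the same system calls. The helper name wait_for_readable and the use of a pipe are assumptions made for the example; note that Python expresses the timeout in seconds, whereas the C API uses milliseconds.

```python
import os
import select


def wait_for_readable(timeout_s=1.0):
    """Register a pipe's read end with epoll and wait for it to become readable.

    Hypothetical helper for illustration only.
    """
    r, w = os.pipe()
    ep = select.epoll()                     # epoll_create
    try:
        ep.register(r, select.EPOLLIN)      # epoll_ctl with EPOLL_CTL_ADD
        before = ep.poll(0)                 # epoll_wait, timeout 0: returns at once
        os.write(w, b"x")                   # make the read end ready
        after = ep.poll(timeout_s)          # epoll_wait, blocks up to timeout_s
        ep.unregister(r)                    # epoll_ctl with EPOLL_CTL_DEL
        return r, before, after
    finally:
        ep.close()
        os.close(r)
        os.close(w)
```

Each entry returned by poll is a (descriptor, event mask) pair, so the caller checks the mask for EPOLLIN, EPOLLHUP and so on before acting on the descriptor.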

The conceptual idea behind the API is that applications usually have a certain set of descriptors that changes rarely if ever, but which needs to be observed for readiness many times. Also, typically a lot fewer descriptors are ready than open. epoll therefore separates copying the list of descriptors to watch from the actual watching and notifies registered listeners instead of iterating a list of descriptors.

The operation of level-triggered mode (the default) is easy, since it is identical to how poll and select work. As long as the resource is ready (e.g. as long as there remains data to be read), every call to epoll_wait will return an event.
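A small sketch of this behaviour, again using Python's select.epoll on a pipe (the function name level_triggered_reports is made up for the example): data is written once and never read, so every poll should report the descriptor again.

```python
import os
import select


def level_triggered_reports(polls=3):
    """Return how many events each of `polls` consecutive polls reports.

    Hypothetical helper for illustration only.
    """
    r, w = os.pipe()
    ep = select.epoll()
    try:
        ep.register(r, select.EPOLLIN)      # no EPOLLET flag: level-triggered
        os.write(w, b"unread")
        # Nothing is read between polls, so the pipe stays readable
        # and each poll reports it again.
        return [len(ep.poll(0)) for _ in range(polls)]
    finally:
        ep.close()
        os.close(r)
        os.close(w)
```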

The operation of edge-triggered mode (EPOLLET flag) is more complicated, more error-prone, inconsistently documented, and inconsistently implemented. epoll(7) explains it in terms of partial reads: if only part of the available data is read, the next call to epoll_wait may block until new data arrives, even though some data remains in the buffer. It is therefore recommended to use non-blocking descriptors and to keep reading until EAGAIN is returned.
According to The Linux Programming Interface, edge-triggered mode only reports events that happened since the last call to epoll_wait.
In reality, it does a mixture of both (i.e. both reads and epoll_wait reset the status to "not ready"), and it does not work as documented when several epoll instances listen to the same socket or several threads wait on the same epoll instance (observed under kernel 2.6.38 with timerfd and eventfd). Although epoll is supposed to signal all waiters upon arrival of an event, in edge-triggered mode it only ever signals a single waiter.
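The recommended pattern of a non-blocking descriptor that is read until EAGAIN might look like the following sketch. It uses Python's select.epoll with a single waiter on a pipe; drain and edge_triggered_demo are hypothetical names, and Python surfaces EAGAIN as BlockingIOError. It does not exercise the multi-waiter cases described above.

```python
import os
import select


def drain(fd, chunk=4):
    """Read from a non-blocking descriptor until the kernel reports EAGAIN."""
    data = bytearray()
    while True:
        try:
            piece = os.read(fd, chunk)
        except BlockingIOError:             # EAGAIN / EWOULDBLOCK: buffer is empty
            return bytes(data)
        if not piece:                       # end of file: the writer closed its end
            return bytes(data)
        data += piece


def edge_triggered_demo():
    """Hypothetical demo: one write, two polls, then a full drain."""
    r, w = os.pipe()
    os.set_blocking(r, False)               # edge-triggered use needs non-blocking I/O
    ep = select.epoll()
    try:
        ep.register(r, select.EPOLLIN | select.EPOLLET)
        os.write(w, b"hello world")
        first = ep.poll(0)                  # the write is a new edge
        second = ep.poll(0)                 # no new data arrived since the first poll
        payload = drain(r)                  # data is still buffered and must be drained
        return len(first), len(second), payload
    finally:
        ep.close()
        os.close(r)
        os.close(w)
```

In this single-waiter case the second poll is expected to report nothing even though the data is still unread, which is why the handler has to drain the descriptor completely after each notification.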

792 questions
25 votes, 2 answers

What is an anonymous inode in Linux?

I made a google search about "anonymous inode" and it seems it's related to epoll ... but what actually is it?
mrkschan
24 votes, 1 answer

How do I use EPOLLHUP

Could you guys provide me a good sample code using EPOLLHUP for dead peer handling? I know that it is a signal to detect a user disconnection but not sure how I can use this in code..Thanks in advance..
user800799
24 votes, 2 answers

Is there epoll equivalent in Java?

Is there an equivalent of Linux epoll in Java? epoll allows a thread to react to a number of heterogeneous events. For instance, I can have a thread that reacts to either a socket event or an input from the console. In C++ I can implement this by…
dfreit
23 votes, 2 answers

Is 'epoll' the essential reason that Tornadoweb(or Nginx) is so fast?

Tornadoweb and Nginx are popular web servers for the moment and many benchmarkings show that they have a better performance than Apache under certain circumstances. So my question is: Is 'epoll' the most essential reason that make them so fast? And…
Mickey Shine
21 votes, 3 answers

UDP Packet drop - INErrors Vs .RcvbufErrors

I wrote a simple UDP Server program to understand more about possible network bottlenecks. UDP Server: Creates a UDP socket, binds it to a specified port and addr, and adds the socket file descriptor to epoll interest list. Then its epoll waits for…
Bala
20 votes, 4 answers

Memory handling with struct epoll_event

I'm developing a server in C with the epoll library and I have a question as to how memory is handled for struct epoll_event. I've noticed in some online examples that, when making epoll_ctl calls, the events argument is allocated on the stack and…
Blake Beaupain
19 votes, 1 answer

TCP: When is EPOLLHUP generated?

Also see this question, unanswered as of now. There is a lot of confusion about EPOLLHUP, even in the man and Kernel docs. People seem to believe it is returned when polling on a descriptor locally closed for writing, i.e. shutdown(SHUT_WR), i.e.…
haelix
19 votes, 1 answer

Eventloop has high ksoftirqd load; nginx does not but does same system-calls. Why?

I wrote some code that has an epoll-eventloop, accepts new connections and pretends to be a http-server. The posted code is the absolute minimum ... I removed everything (including all error-checks) to make it as short and to the point as…
Xatian
18 votes, 2 answers

How does epoll's EPOLLEXCLUSIVE mode interact with level-triggering?

Suppose the following series of events occurs: We set up a listening socket Thread A blocks waiting for the listening socket to become readable, using EPOLLIN | EPOLLEXCLUSIVE Thread B also blocks waiting for the listening socket to become…
Nathaniel J. Smith
18 votes, 1 answer

What are the underlying differences among select, epoll, kqueue, and evport?

I am reading Redis recently. Redis implements a simple event-driven library based on I/O multiplexing. Redis says it would choose the best multiplexing supported by the system, and gives the following code: /* Include the best multiplexing layer…
Min Fu
17 votes, 1 answer

epoll_wait always sets EPOLLOUT bit?

On a listening socket I set the EPOLLIN bit however on client connections I set EPOLLIN | EPOLLOUT bits to struct epoll_event like so: struct epoll_event ev; ev.data.fd = fd; ev.events = EPOLLIN | EPOLLOUT; if (epoll_ctl(evs->epoll_fd,…
user1551592
16 votes, 2 answers

Getting to know the basics of Asynchronous programming on *nix

For some time now I have been googling a lot to get to know about the various ways to achieve asynchronous programming/behavior on *nix machines and ( as known earlier to me ) got confirmed on the fact that there is still no TRULY async pattern…
Arunmu
16 votes, 4 answers

How to get errno when epoll_wait returns EPOLLERR?

Is there a way to find out the errno when epoll_wait returns EPOLLERR for a particular fd? Is there any further information about the nature of the error? Edit: Adding more information to prevent ambiguity epoll_wait waits on a number of file…
Steve Lorimer
15 votes, 3 answers

I can't understand polling/select in python

I'm doing some threaded asynchronous networking experiment in python, using UDP. I'd like to understand polling and the select python module, I've never used them in C/C++. What are those for ? I kind of understand a little select, but does it block…
jokoon
15 votes, 1 answer

epoll_wait: maxevents

int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout); I'm a little confused about the maxevents parameter. Let's say I want to write a server that can handle up to 10k connections. Would I define maxevents as 10000 then,…
someguy