I'm currently adding sockfds created from accept to an epoll instance with the following events:

const int EVENTS = (
    EPOLLET |
    EPOLLIN |
    EPOLLRDHUP |
    EPOLLONESHOT |
    EPOLLERR |
    EPOLLHUP);

Once an event is triggered, I pass it off to a handler thread, read and then re-enable the sockfd through epoll_ctl with the same flags. However, I only receive the EPOLLIN event one time. Also, if I kill the client anytime after the first event is received, I do not get hangup events either. From reading the man pages, I thought I understood the correct approach with EdgeTriggered and OneShot.

Below is some pseudo code for the process I'm using:

const int EVENTS = (
    EPOLLET |
    EPOLLIN |
    EPOLLRDHUP |
    EPOLLONESHOT |
    EPOLLERR |
    EPOLLHUP);

void event_loop()
{
    struct epoll_event event;
    struct epoll_event *events;
    events = calloc(100, sizeof event);
    while (1)
    {
        int x;
        int num_events = epoll_wait(epfd, events, 100, -1);
        for (x = 0; x < num_events; x++)
        {
            another_thread(events[x].data.fd);
        }
    }
}

void another_thread(int fd)
{
    // Read stuff until EAGAIN

    struct epoll_event event;
    event.data.fd = fd;
    event.events = EVENTS;
    epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &event);
}

When I do the EPOLL_CTL_MOD operation, I do not receive any errors, but I am never notified of further events. If I keep the read loop running after the first event, it reads all subsequent data sent by the client, so I know that the data is coming in and the fd is still open and working.

From checking strace, threads are created from clone and have the flag CLONE_FILES, so all threads share the same fd table.

What is the correct way to re-enable a fd for read events from a separate thread?

nathansizemore
  • A MCVE would help a lot. The pseudo-code doesn't explain the hand-offs between the event loop and handler threads, for example. There are three "moving parts" here: EPOLLET (have you tried your code without this flag?), EPOLLONESHOT and multi-threading. So, diagnosis of the _real_ problem is correspondingly difficult. – arayq2 Nov 25 '15 at 16:57
  • Was this issue solved for you? I am building something similar and am curious if you got it to work? – user855 Jul 06 '20 at 13:05
  • I'm curious too, after all this time. Perhaps it was as simple as switching the order of operations in `another_thread()`: doing the "Read stuff until EAGAIN" _after_ the call to `epoll_ctl()` to reactivate the edge trigger. There is a race condition otherwise: new data could enter the read buffer between the EAGAIN return from the I/O `read()` operation and the call to `epoll_ctl()`, which would lose the edge trigger. – arayq2 Oct 14 '20 at 01:29
  • @arayq2 That was the fix. Basically, reset all your flags from events before you start reading to avoid losing data from an event while reading. – nathansizemore Oct 15 '20 at 13:45
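
A minimal sketch of the reordering confirmed in the comments above, assuming a non-blocking socket and Linux epoll. The names `rearm_then_drain` and `count_wakeups_oneshot` are hypothetical and not from the original post; the socketpair only stands in for a real client connection so the sketch can run in a single thread.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Same event mask as in the question. */
#define EVENTS (EPOLLET | EPOLLIN | EPOLLRDHUP | EPOLLONESHOT | EPOLLERR | EPOLLHUP)

/* Hypothetical handler body: re-arm the one-shot fd FIRST, then read
 * until EAGAIN. Returns the number of bytes read, or -1 on error. */
static long rearm_then_drain(int epfd, int fd)
{
    struct epoll_event event;
    char buf[4096];
    long total = 0;

    event.data.fd = fd;
    event.events = EVENTS;
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &event) == -1)
        return -1;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;
        } else if (n == 0) {
            break;                  /* peer closed the connection */
        } else if (errno == EAGAIN) {
            break;                  /* receive buffer is drained */
        } else if (errno != EINTR) {
            return -1;              /* real read error */
        }
    }
    return total;
}

/* Single-threaded demo on a socketpair (an assumption made for the sketch):
 * sends one byte twice and counts how often epoll_wait reports the fd. */
int count_wakeups_oneshot(void)
{
    int sv[2], wakeups = 0;
    struct epoll_event event, out;

    if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0, sv) == -1)
        return -1;
    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    event.data.fd = sv[0];
    event.events = EVENTS;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &event) == 0) {
        for (int i = 0; i < 2; i++) {
            if (write(sv[1], "x", 1) != 1)
                break;
            if (epoll_wait(epfd, &out, 1, 1000) != 1)
                break;
            if (rearm_then_drain(epfd, out.data.fd) != 1)
                break;
            wakeups++;
        }
    }
    close(epfd);
    close(sv[0]);
    close(sv[1]);
    return wakeups;
}
```

One trade-off worth keeping in mind: once the fd is re-armed, another thread blocked in epoll_wait may be handed an event for the same fd while this handler is still reading, so the handler probably needs its own per-connection synchronization.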

1 Answer

However, I only receive the EPOLLIN event one time. Also, if I kill the client anytime after the first event is received, I do not get hangup events either.

The man page for epoll_ctl(2) says:

EPOLLONESHOT (since Linux 2.6.2) Sets the one-shot behavior for the associated file descriptor. This means that after an event is pulled out with epoll_wait(2) the associated file descriptor is internally disabled and no other events will be reported by the epoll interface. The user must call epoll_ctl() with EPOLL_CTL_MOD to rearm the file descriptor with a new event mask.

In your case, when you get the first event, epoll disables your sockfd. When you re-enable the sockfd using EPOLL_CTL_MOD, epoll reports the events that the kernel receives after the re-registration. So any event that occurs between the first notification and the re-registration can be lost. This may be the reason you are not getting any hangup events or data.

Removing EPOLLONESHOT from the event mask should correct your code, and you then no longer need to re-enable the sockfd either.

And since you are using EPOLLET, there should not be any performance issue either.
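
A minimal sketch of that suggestion, assuming a non-blocking socket: the fd is registered once with EPOLLET but without EPOLLONESHOT, and the handler only drains it until EAGAIN, with no EPOLL_CTL_MOD call. The names `drain` and `count_wakeups_no_oneshot` are hypothetical, and the socketpair stands in for a real client connection.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* The question's mask with EPOLLONESHOT removed. */
#define EVENTS_NO_ONESHOT (EPOLLET | EPOLLIN | EPOLLRDHUP | EPOLLERR | EPOLLHUP)

/* Hypothetical handler body: read until EAGAIN, no re-arming needed.
 * Returns the number of bytes read, or -1 on error. */
static long drain(int fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;
        } else if (n == 0) {
            break;                  /* peer closed the connection */
        } else if (errno == EAGAIN) {
            break;                  /* receive buffer is drained */
        } else if (errno != EINTR) {
            return -1;              /* real read error */
        }
    }
    return total;
}

/* Single-threaded demo on a socketpair (an assumption made for the sketch):
 * each send is a new edge, so epoll_wait should report the fd each time. */
int count_wakeups_no_oneshot(void)
{
    int sv[2], wakeups = 0;
    struct epoll_event event, out;

    if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0, sv) == -1)
        return -1;
    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    event.data.fd = sv[0];
    event.events = EVENTS_NO_ONESHOT;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &event) == 0) {
        for (int i = 0; i < 2; i++) {
            if (write(sv[1], "x", 1) != 1)
                break;
            if (epoll_wait(epfd, &out, 1, 1000) != 1)
                break;
            if (drain(out.data.fd) != 1)
                break;
            wakeups++;
        }
    }
    close(epfd);
    close(sv[0]);
    close(sv[1]);
    return wakeups;
}
```

Without EPOLLONESHOT, nothing stops two threads that both call epoll_wait from handling the same fd at the same time, which is presumably why a threaded event loop would still want the one-shot flag.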

dktrivedi
  • This is not the case, I have massive 3-5 second delays in the client code just to test for this case, and I see log messages that the fd has been readied before the client tries to send again. Removing the one shot flag has no effect :( – nathansizemore Nov 24 '15 at 13:21
  • Are you adding the 3-5 second delay in the client to check whether you get an 'EPOLLRDHUP' event, or is it just a delay between two sends? – dktrivedi Nov 24 '15 at 18:32
  • Delay between two sends. – nathansizemore Nov 24 '15 at 18:37
  • Because of EPOLLET, you get a notification whenever the state of a registered event changes. With EPOLLIN | EPOLLET, you get a notification whenever new data arrives in the read buffer. When the first send is executed, epoll_wait returns an event for it and your thread reads the data. Meanwhile, the main thread goes back into epoll_wait and waits for the next event. After 3-5 seconds, the second send is executed and epoll_wait returns the same read event on the socket fd. That is why you get multiple read event notifications. – dktrivedi Nov 24 '15 at 18:49
  • I think you misread the question, I *only get one* notification. – nathansizemore Nov 24 '15 at 18:58
  • Even after removing EPOLLONESHOT and the epoll_ctl call in the thread? I am sorry, I thought you were getting one more notification before and after the second send. – dktrivedi Nov 24 '15 at 19:00
  • No, not after removing oneshot. I'd like to use oneshot, because I'd like to have a threaded eventloop. – nathansizemore Nov 24 '15 at 19:02
  • The epoll_wait man page says: "While one thread is blocked in a call to epoll_pwait(), it is possible for another thread to add a file descriptor to the waited-upon epoll instance." Ideally, your code should get a notification for the second send if the re-registration succeeds and the second batch of data arrives **after** re-registration. Make sure something else is not causing this problem. – dktrivedi Nov 25 '15 at 03:14
  • Maybe it's because I'm not using epoll_pwait? According to [this](https://bugzilla.kernel.org/show_bug.cgi?id=43072) it shouldn't matter – nathansizemore Nov 25 '15 at 05:47
  • It should not matter whether it is epoll_wait or epoll_pwait because epoll_pwait internally calls epoll_wait. – dktrivedi Nov 25 '15 at 06:08
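
On the epoll_wait versus epoll_pwait point in the comments above: the epoll_wait(2) man page states that epoll_pwait() with a NULL sigmask is equivalent to epoll_wait(). A small sketch to check that on a given system, assuming a pipe as the event source and a level-triggered registration so that both calls see the same pending data; `waits_agree` is a hypothetical name.

```c
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Returns 1 if epoll_wait and epoll_pwait(..., NULL) both report the
 * readable pipe end, 0 if they disagree, -1 on setup failure. */
int waits_agree(void)
{
    int p[2];
    struct epoll_event event, out;

    if (pipe(p) == -1)
        return -1;
    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    /* Level-triggered on purpose: the unread byte stays reportable,
     * so the second call sees the same readiness as the first. */
    event.data.fd = p[0];
    event.events = EPOLLIN;

    int result = -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &event) == 0 &&
        write(p[1], "x", 1) == 1) {
        int a = epoll_wait(epfd, &out, 1, 1000);
        int b = epoll_pwait(epfd, &out, 1, 1000, NULL);
        result = (a == 1 && b == 1) ? 1 : 0;
    }

    close(epfd);
    close(p[0]);
    close(p[1]);
    return result;
}
```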