Incorrect EPOLLET behavior?

Question

Please consider the following program:

#define _GNU_SOURCE
#include <sys/epoll.h>
#include <fcntl.h>
#include <unistd.h>
#include <poll.h>

#include <assert.h>
#include <stdlib.h>
#include <stdio.h>

int verify(int result, const char *msg) {
    if( result>=0 )
        return result;

    perror(msg);
    abort();

    return -1;
}

void writepipe( int fd, int num_bytes, const char *msg ) {
    unsigned char buffer[num_bytes];
    ssize_t num_written = verify( write(fd, buffer, num_bytes), msg );
    assert( num_written==num_bytes );
}

void readpipe( int fd, int num_bytes, const char *msg ) {
    unsigned char buffer[num_bytes];
    ssize_t num_read = verify( read(fd, buffer, num_bytes), msg );
    assert( num_read==num_bytes );
}

int main() {
    int pipefds[2];
    verify( pipe2(pipefds, O_NONBLOCK), "pipe creation failed" );

    int epollfd = verify(epoll_create1(0), "epoll creation failed");

    struct epoll_event evt;
    evt.events = EPOLLIN|EPOLLET;
    evt.data.u64 = 17;
    verify( epoll_ctl( epollfd, EPOLL_CTL_ADD, pipefds[0], &evt ), "epoll_add failed" );

    int num_events = verify( epoll_wait(epollfd, &evt, 1, 0), "epoll_wait failed" );
    assert(num_events == 0);

    writepipe( pipefds[1], 12, "initial filling of pipe" );

    num_events = verify( epoll_wait(epollfd, &evt, 1, 0), "epoll_wait failed" );
    assert(num_events == 1);
    assert(evt.data.u64 == 17);

    num_events = verify( epoll_wait(epollfd, &evt, 1, 0), "epoll_wait failed" );
    assert(num_events == 0);

    readpipe( pipefds[0], 12, "clean the data" );

    num_events = verify( epoll_wait(epollfd, &evt, 1, 0), "epoll_wait failed" );
    assert(num_events == 0);

    writepipe( pipefds[1], 3, "write no trigger" );

    num_events = verify( epoll_wait(epollfd, &evt, 1, 0), "epoll_wait on unarmed fd" );
    assert(num_events == 0);

    return 0;
}

The last assert fails.

Since we never got to reading an EPOLLET from the epoll, I was expecting the last epoll_wait to return 0. Instead, I get 1.

Why is that?

Kernel 4.13.0-39-generic from Ubuntu 16.10.

Answer (score 1), answered Dec 06 '18 at 08:57

A late answer, but maybe still helpful for others.

You assume in your last epoll_wait call that the fd is unarmed. This is not the case. If you actually want it to be unarmed after the first event, you can use EPOLLONESHOT: once the event has fired, the fd is disabled and you have to rearm it with epoll_ctl(EPOLL_CTL_MOD). You might also assume that the second write does not trigger the epoll. This assumption is wrong, too. EPOLLET only guarantees that the event is not reported again as long as nothing changes on the fd. The write to the pipe is such a change, so the epoll is triggered again (not necessarily what people expect to happen).
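
A minimal sketch of the EPOLLONESHOT approach, assuming Linux and the same non-blocking pipe setup as the question. The names `check()` (a stand-in for the question's verify()) and `oneshot_counts()` are hypothetical. It records how many events three zero-timeout epoll_wait calls return: the first write is reported, the second write is swallowed because the fd is disabled, and rearming with EPOLL_CTL_MOD makes the still-unread data visible again.

```c
#define _GNU_SOURCE
#include <sys/epoll.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: abort with a message if a system call failed. */
static int check(int result, const char *msg) {
    if (result < 0) {
        perror(msg);
        abort();
    }
    return result;
}

/* Hypothetical helper: stores in counts[0..2] the number of events returned by
 * three epoll_wait calls on a pipe registered with EPOLLONESHOT. */
void oneshot_counts(int counts[3]) {
    int fds[2];
    check(pipe2(fds, O_NONBLOCK), "pipe2");
    int epfd = check(epoll_create1(0), "epoll_create1");

    struct epoll_event evt = { .events = EPOLLIN | EPOLLET | EPOLLONESHOT };
    evt.data.u64 = 17;
    check(epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &evt), "EPOLL_CTL_ADD");

    char buf[12] = {0};
    check((int)write(fds[1], buf, 12), "first write");
    counts[0] = check(epoll_wait(epfd, &evt, 1, 0), "epoll_wait 1"); /* reported once */

    check((int)write(fds[1], buf, 3), "second write");
    counts[1] = check(epoll_wait(epfd, &evt, 1, 0), "epoll_wait 2"); /* fd is disabled */

    /* Rearm: EPOLL_CTL_MOD re-enables the fd; the 15 bytes are still unread. */
    evt.events = EPOLLIN | EPOLLET | EPOLLONESHOT;
    evt.data.u64 = 17;
    check(epoll_ctl(epfd, EPOLL_CTL_MOD, fds[0], &evt), "EPOLL_CTL_MOD");
    counts[2] = check(epoll_wait(epfd, &evt, 1, 0), "epoll_wait 3"); /* reported again */

    close(fds[0]);
    close(fds[1]);
    close(epfd);
}
```

Calling `oneshot_counts(c)` should leave `c` as 1, 0, 1. As far as I know, EPOLL_CTL_MOD re-checks readiness immediately, which is why the third call sees the unread data even in edge-triggered mode.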

As the epoll(7) man page puts it: "edge-triggered mode delivers events only when changes occur on the monitored file descriptor."

source: http://man7.org/linux/man-pages/man7/epoll.7.html
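
A sketch of that rule, assuming Linux. It uses a UNIX-domain socketpair instead of a pipe because, to my knowledge, some kernel versions only wake pipe readers when the pipe goes from empty to non-empty, while a stream socket signals every send. The helper names `check()` and `et_counts_without_reading()` are hypothetical. Nothing is ever read, yet the second write is a new change and produces a new event.

```c
#define _GNU_SOURCE
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: abort with a message if a system call failed. */
static int check(int result, const char *msg) {
    if (result < 0) {
        perror(msg);
        abort();
    }
    return result;
}

/* Hypothetical helper: stores the results of three zero-timeout epoll_wait calls
 * on an edge-triggered socket that is never read from. */
void et_counts_without_reading(int counts[3]) {
    int sv[2];
    check(socketpair(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0, sv), "socketpair");
    int epfd = check(epoll_create1(0), "epoll_create1");

    struct epoll_event evt = { .events = EPOLLIN | EPOLLET };
    evt.data.u64 = 17;
    check(epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &evt), "EPOLL_CTL_ADD");

    char buf[12] = {0};
    check((int)write(sv[1], buf, 12), "first write");
    counts[0] = check(epoll_wait(epfd, &evt, 1, 0), "epoll_wait 1"); /* change: 1 event */
    counts[1] = check(epoll_wait(epfd, &evt, 1, 0), "epoll_wait 2"); /* no change: 0 */

    /* The 12 bytes are still queued on sv[0]; this send is another change. */
    check((int)write(sv[1], buf, 3), "second write");
    counts[2] = check(epoll_wait(epfd, &evt, 1, 0), "epoll_wait 3"); /* 1 event again */

    close(sv[0]);
    close(sv[1]);
    close(epfd);
}
```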

I don't really know what you mean by "we never got to reading an EPOLLET". Do you mean the EAGAIN that signals that all data has been read? That is actually irrelevant to your problem. You empty the pipe completely, so the next read would fail with EAGAIN, but that does not change the behaviour described above. Even if you did not read the data at all, the second write would still trigger the epoll. Reading until EAGAIN only makes sure that you have consumed all the data when no further change on the file descriptor is coming to wake you up again.
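
The usual "read until EAGAIN" loop for an edge-triggered, non-blocking fd looks roughly like the sketch below. The names `drain_fd()` and `drain_demo()` and the 16-byte chunk size are illustrative assumptions, not part of any API; the small chunk just makes the loop run several times.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: read everything currently available on a non-blocking fd.
 * Stops on EAGAIN/EWOULDBLOCK (buffer empty) or on EOF (read returns 0).
 * Returns the total number of bytes consumed. */
size_t drain_fd(int fd) {
    unsigned char chunk[16];   /* deliberately small so the loop iterates */
    size_t total = 0;
    for (;;) {
        ssize_t n = read(fd, chunk, sizeof chunk);
        if (n > 0) {
            total += (size_t)n;  /* a real program would process chunk here */
            continue;
        }
        if (n == 0)
            break;               /* EOF: the writer closed its end */
        if (errno == EINTR)
            continue;            /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;               /* nothing left: safe to go back to epoll_wait */
        perror("read");
        abort();
    }
    return total;
}

/* Hypothetical demo: drain a pipe holding 100 bytes, then drain it again. */
void drain_demo(size_t results[2]) {
    int fds[2];
    if (pipe2(fds, O_NONBLOCK) < 0) { perror("pipe2"); abort(); }
    unsigned char data[100] = {0};
    if (write(fds[1], data, sizeof data) != (ssize_t)sizeof data) { perror("write"); abort(); }
    results[0] = drain_fd(fds[0]);  /* consumes all 100 bytes in 16-byte chunks */
    results[1] = drain_fd(fds[0]);  /* already empty: the first read hits EAGAIN */
    close(fds[0]);
    close(fds[1]);
}
```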

This answer is entirely correct (to my astonishment). I came here after writing my own test program to discover once and for all how epoll REALLY works. I, too, had interpreted the epoll(7) man page to say that (in the case of EPOLLIN) you MUST read until all data has been read (i.e. until EAGAIN, or until you get back less than you asked for in the case of a stream-oriented file such as a pipe or socket) BEFORE the fd is armed again. That turns out not to be true at all. - Carlo Wood, Jul 17 '19 at 21:38
My findings are that whether, and how much, you read(2) is completely irrelevant. epoll_wait() can and will report the same EPOLLIN event again when 1) more data has been received since the event was last reported by epoll_wait(), or 2) any other event is reported for that fd. For example, when EPOLLOUT is returned and there is still unread data in the socket buffer, EPOLLIN is reported again, EVEN if NO more data has been received since last time. - Carlo Wood, Jul 17 '19 at 21:41
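
The second observation can be reproduced, assuming Linux, with a socketpair registered for both EPOLLIN and EPOLLOUT. This sketch shows the mirror image of the example in the comment: incoming data triggers a report, and that report carries EPOLLOUT as well, although writability never changed. As far as I know, once an fd is queued, epoll reports its full current readiness (filtered by the requested events), not only the event that woke it. The helper names `check()` and `mixed_event_masks()` are hypothetical.

```c
#define _GNU_SOURCE
#include <sys/epoll.h>
#include <sys/socket.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: abort with a message if a system call failed. */
static int check(int result, const char *msg) {
    if (result < 0) {
        perror(msg);
        abort();
    }
    return result;
}

/* Hypothetical helper: stores the event mask returned by three zero-timeout
 * epoll_wait calls (0 when no event was returned). */
void mixed_event_masks(uint32_t masks[3]) {
    int sv[2];
    check(socketpair(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0, sv), "socketpair");
    int epfd = check(epoll_create1(0), "epoll_create1");

    struct epoll_event evt = { .events = EPOLLIN | EPOLLOUT | EPOLLET };
    check(epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &evt), "EPOLL_CTL_ADD");

    struct epoll_event out;
    /* Freshly added: writable, no data yet, so EPOLLOUT only. */
    masks[0] = check(epoll_wait(epfd, &out, 1, 0), "epoll_wait 1") ? out.events : 0;
    /* Nothing changed: no event. */
    masks[1] = check(epoll_wait(epfd, &out, 1, 0), "epoll_wait 2") ? out.events : 0;

    /* The peer sends one byte: readability changes, writability does not. */
    char byte = 'x';
    check((int)write(sv[1], &byte, 1), "write");
    masks[2] = check(epoll_wait(epfd, &out, 1, 0), "epoll_wait 3") ? out.events : 0;

    close(sv[0]);
    close(sv[1]);
    close(epfd);
}
```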
