`select` on fds higher than 255 does not check whether the fd is open. Here is my example code:

#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <sys/select.h>

int main()
{
    fd_set set;
    for(int i = 5;i<FD_SETSIZE;i++)
    {
        printf("--> i is %d\n", i);
        FD_ZERO(&set);
        FD_SET(i, &set);
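        close(i);   /* make sure fd i is not open */

        /* NULL timeout: blocks until select fails or fd i becomes readable */
        int retval = select(FD_SETSIZE, &set, NULL, NULL, NULL);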
        if(-1 == retval)
        {
            perror("select");
        }
    }
}

This results in:

--> i is 5
select: Bad file descriptor
...
--> i is 255
select: Bad file descriptor
--> i is 256

Then the application blocks. Why does this not produce an `EBADF` for fds from 256 up to `FD_SETSIZE`?

Requested Information from comments:

The result of prlimit is:

NOFILE     max number of open files                1024   1048576

This is the result of strace ./test_select:

select(1024, [127], NULL, NULL, NULL)   = -1 EBADF (Bad file descriptor)
dup(2)                                  = 3
fcntl(3, F_GETFL)                       = 0x8402 (flags O_RDWR|O_APPEND|O_LARGEFILE)
fstat(3, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
write(3, "select: Bad file descriptor\n", 28select: Bad file descriptor
) = 28
close(3)                                = 0
write(1, "--> i is 128\n", 13--> i is 128
)          = 13
close(128)                              = -1 EBADF (Bad file descriptor)
select(1024, [128], NULL, NULL, NULL

Debunking thoughts from the comments:

#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <sys/select.h>
#include <fcntl.h>

int main()
{
    char filename[80];
    int fd;
    for(int i = 5;i<500;i++)
    {
        snprintf(filename, 80, "/tmp/file%d", i);
        fd = open(filename, O_RDWR | O_APPEND | O_CREAT, 0644);  /* O_CREAT requires a mode argument */
    }
    printf("--> fd is %d, FD_SETSIZE is %d\n", fd, FD_SETSIZE);
    fd_set set;
    FD_ZERO(&set);
    FD_SET(fd, &set);
    int retval = select(FD_SETSIZE, NULL, &set, NULL, NULL);
    if(-1 == retval)
    {
        perror("select");
    }
}

Results in:

$ ./test_select
--> fd is 523, FD_SETSIZE is 1024

Process exits normally, no blocking.

ivan_pozdeev
kuga
  • http://man7.org/linux/man-pages/man2/select.2.html select() can monitor only file descriptors numbers that are less than FD_SETSIZE; – mco Nov 03 '17 at 14:32
  • And FD_SETSIZE is 1023. Else my for loop would not reach 256. – kuga Nov 03 '17 at 14:34
  • *you're closing each file descriptor that you put into the loop* That will create `EBADF`. Additionally you're using uninitialized contents in select, that **very much** generate `EBADF`. – Antti Haapala -- Слава Україні Nov 03 '17 at 14:36
  • I know this. Why does this not create an `EBADF` on 256? Also, `set` is initialized each loop iteration. – kuga Nov 03 '17 at 14:38
  • You should probably use `getrlimit` to find out what the current limit of FDs is when the code runs – Chris Turner Nov 03 '17 at 14:45
  • @kuga Install the `strace` package. Compile your program and run it with the command `strace ./yourprogram`. This will show us which syscalls the program is making, and if it blocks, it will show which syscall it is stuck at (or if your program does not block but exits instead, it will show that too). Post the last handful of lines so others can also see what's going on. – nos Nov 03 '17 at 14:46
  • You are using `select()` wrongly. From [`man select`](http://man7.org/linux/man-pages/man2/select.2.html): "*nfds should be set to the highest-numbered file descriptor in any of the three sets, plus 1.*" So it should be `... = select(i+1, ...` – alk Nov 03 '17 at 15:35
  • @alk This is correct, but it gives the same result. Setting nfds correctly only speeds things up. – kuga Nov 03 '17 at 15:39
  • Why `#include `? This is a C++ header. For C you should be using `stdio.h`. – alk Nov 03 '17 at 15:45
  • @alk Just C++ habits. But again, Same results. Changed the code. – kuga Nov 03 '17 at 15:49
  • As bizarre and shouldn't-happen as this is, I can in fact reproduce the effect on my computer (it actually gets stuck at fd 64 for me). – zwol Nov 03 '17 at 15:49
  • Can reproduce on a vanilla Debian (Linux debian-stable 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u3 (2017-08-06) x86_64 GNU/Linux) – alk Nov 03 '17 at 15:53
  • It is probably the libc-implementation limiting the fd_sets to 256 bits. The higher numbers are just ignored, so the call with fd=256 set is treated like one with an empty set. – joop Nov 03 '17 at 15:54
  • Another data point: on NetBSD 7.1 and FreeBSD 11.1, the program correctly reports select failing with EBADF for all fds from 5 to FD_SETSIZE. (FD_SETSIZE is 256 and 1024 on these systems, respectively.) This is starting to smell like an operating system bug in Linux or glibc. – zwol Nov 03 '17 at 15:58
  • @joop If that were true (and I'm starting to think it is) it would be a bug in the C library, because FD_SET is documented to work for any fd numerically less than FD_SETSIZE and FD_SETSIZE is clearly 1024 on OP's system. – zwol Nov 03 '17 at 15:58
  • @joop looking at `set` after calling `FD_SET(256, &set);` in gdb and it's set it to `{__fds_bits = {0, 0, 0, 0, 1, 0 }}`. Looks like more likely the problem is in `select` – Chris Turner Nov 03 '17 at 16:13
  • @joop If libc were ignoring values above 255, the second example I posted in the question would not work. Something is very shady here! – kuga Nov 03 '17 at 16:19
  • Just to throw further confusion on this... the EBADF occurs for values of i>256 - it's only when it's exactly 256 that it doesn't work as expected – Chris Turner Nov 03 '17 at 16:21
  • @ChrisTurner With kernel 4.13 on x86-64, I see the effect for all values of i from 256 up to FD_SETSIZE... – zwol Nov 03 '17 at 16:24
  • @zwol I can confirm that too...not entirely sure how I managed to get it to EBADF for values more than 256 now – Chris Turner Nov 03 '17 at 16:32
  • @zwol Could also be the syscall-wrapping or the copying from/to userspace inside the kernel (I seriously doubt that, I've maintained servers with select on > 256 *open* fds) – joop Nov 03 '17 at 17:35
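
To illustrate alk's point about `nfds` from the comments above, here is a minimal sketch of how the argument could be derived from a set instead of passing `FD_SETSIZE`. The helper name `highest_fd_plus_one` is made up for this example and is not part of any library. As kuga notes, using the correct `nfds` does not change the behaviour discussed in this question.

```c
#include <sys/select.h>

/* Hypothetical helper: returns the value to pass as nfds for a single set,
 * i.e. the highest fd present in the set plus 1, or 0 if the set is empty. */
static int highest_fd_plus_one(fd_set *set)
{
    for (int fd = FD_SETSIZE - 1; fd >= 0; fd--)
    {
        if (FD_ISSET(fd, set))
            return fd + 1;
    }
    return 0;
}
```

With several sets, `nfds` would be the maximum of this value over the read, write and except sets.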

1 Answer

Something very strange is going on here. You may have found a bug in the Linux kernel.

I modified your test program to make it more precise and also to not get stuck when it hits the problem:

#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/time.h>

int main(void)
{
    fd_set set;
    struct timeval tv;
    int i;

    for(i = 5; i < FD_SETSIZE; i++)
    {
        FD_ZERO(&set);
        FD_SET(i, &set);

        tv.tv_sec = 0;
        tv.tv_usec = 1000;

        close(i);
        int retval = select(FD_SETSIZE, &set, 0, 0, &tv);
        if (retval == -1 && errno == EBADF)
            continue;   /* expected result for a closed fd */

        if (retval > 0)
            printf("fd %d: select returned success (%d)\n", i, retval);
        else if (retval == 0)
            printf("fd %d: select timed out\n", i);
        else
            printf("fd %d: select failed (%d; %s)\n", i, retval, strerror(errno));
        return 1;
    }
    return 0;
}

My understanding of POSIX says that, whatever FD_SETSIZE is, this program should produce no output and exit successfully. And that is what it does on FreeBSD 11.1 and NetBSD 7.1 (both running on x86 processors of some description). But on Linux (x86-64, kernel 4.13), it prints

fd 256: select timed out

and exits unsuccessfully. Even stranger, if I run the same binary under strace, that changes the output:

$ strace -o /dev/null ./a.out
fd 64: select timed out

The same thing happens if I run it under gdb, even if I don't tell gdb to do anything other than just run the program.

Reading symbols from ./a.out...done.
(gdb) r
Starting program: /tmp/a.out 
fd 64: select timed out
[Inferior 1 (process 8209) exited with code 01]

So something is changing just because the process is subject to ptrace monitoring. That can only be caused by the kernel.

I have filed a bug report on the Linux kernel and will report what they say about it.
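
In the meantime, code that cannot rely on `select` reporting `EBADF` could validate the descriptors itself before the call. The following is only a sketch under that assumption: `first_closed_fd` is a hypothetical helper name, and it uses `fcntl(fd, F_GETFD)`, which fails with `EBADF` when `fd` is not an open descriptor.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>

/* Hypothetical helper: returns the lowest fd below nfds that is present in
 * the set but is not an open descriptor, or -1 if every fd in the set is open. */
static int first_closed_fd(fd_set *set, int nfds)
{
    for (int fd = 0; fd < nfds && fd < FD_SETSIZE; fd++)
    {
        /* F_GETFD only reads the descriptor flags; it has no side effects. */
        if (FD_ISSET(fd, set) && fcntl(fd, F_GETFD) == -1 && errno == EBADF)
            return fd;
    }
    return -1;
}
```

A caller would check `first_closed_fd(&set, nfds)` before `select(nfds, &set, ...)` and treat any result other than -1 as a bad descriptor. The check and the `select` call are not atomic, so in a multithreaded program another thread can still close a descriptor in between.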

zwol
  • Thanks for the summary. On my machine (Ubuntu 16.04, kernel 4.4.0-97-generic), the code fails at 256, and at 128 when run under strace. – kuga Nov 03 '17 at 16:33
  • @kuga What does `uname -m` print for you? – zwol Nov 03 '17 at 16:36
  • `x86_64`. Program is compiled with `cc -g -O0 main.cpp -o test_select` – kuga Nov 03 '17 at 16:38
  • Here's the culprit: https://github.com/torvalds/linux/blob/master/fs/select.c#L620 – Petr Skocik Nov 03 '17 at 17:35
  • @PSkocik I don't know enough about the guts of the kernel to understand why that is the culprit - are you saying that `fdt->max_fds` may be lower than FD_SETSIZE (and also, apparently, lower than the current RLIMIT_NOFILE setting) under some circumstances? Where should I look for the code that controls what that value will be? – zwol Nov 03 '17 at 17:58
  • Hmm... the `select(2)` manual page contains this statement in NOTES: "Moreover, POSIX requires `fd` to be a valid file descriptor." Since your `fd` here is not valid, that would seem to be an out bug-wise. I realize that is somewhat in conflict with the EBADF error return. (I believe @PSkocik is right. The issue comes up because there are bits set in the `fd_set` *beyond* the task's current `max_fds` setting -- those bits cannot possibly correspond to valid file descriptors, but they *can* be set. Ideally the kernel would check for any set bits beyond `max_fds` and return EBADF.) – Gil Hamilton Nov 03 '17 at 18:28
  • `max_fds` should be the number of currently allocated filedescriptors (can be less than the resource limit -- the filedescriptors basically start as a static array that gets changed to a dynamic one if the fds fill it; I'm getting 256 if I print it with printk) and the downsizing of `n` to at most `max_fds` means it's as if the caller passed no more than the current `max_fds` -- the filedescriptors beyond the current `max_fds` won't be looked at. – Petr Skocik Nov 03 '17 at 18:34
  • `max_fds` is at least `NR_OPEN_DEFAULT` (== `BITS_PER_LONG`), usually 64. However, it is adjusted during a fork (`dup_fds`). I suspect that the shell forks while holding a high-numbered open file descriptor (which it then closes before the `exec`). That results in the 256. Whereas, `gdb` (or `ptrace`) forks without such a high-numbered open file descriptor so that you have the minimum (64) size. – Gil Hamilton Nov 03 '17 at 18:34
  • Bash uses fd 255 opened as its source script descriptor. Can be the terminal or the script running. Try (under bash) `ls -l /proc/$$/fd/` – A.B Nov 03 '17 at 18:55
  • @GilHamilton I don't have quite enough brain right now to parse standardese, but I _think_ [the POSIX specification for `select`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/pselect.html) requires the kernel to look at all bits up to the `nfds` argument and to return with `EBADF` if any of the set bits correspond to fds that aren't open. – zwol Nov 03 '17 at 19:44
  • *Ad argumentum*, it does have this under the macro descriptions: "The behavior of these macros is undefined if the fd argument is less than 0 or greater than or equal to FD_SETSIZE, or *if fd is not a valid file descriptor*, ..." You could certainly argue that that doesn't apply to the behavior of select/pselect, but OTOH the only specified way to produce an `fd_set` for those is to use the macros. So, by extension, I think you could argue this away as undefined behavior. (That being said, I personally agree that it's a bug. ;) It should work as you expected.) – Gil Hamilton Nov 03 '17 at 20:08
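
The `max_fds` explanation given in the comments above can be probed from user space. The sketch below is not from the thread: the function names are made up for illustration, and it assumes that on Linux allocating a high-numbered descriptor grows the process's descriptor table and that the table is not shrunk again when that descriptor is closed. If the explanation holds, `select` on a closed fd above the original table size should start reporting `EBADF` once the table has been grown.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical probe: closes fd, then calls select() with only that fd in
 * the read set. Returns the errno select() failed with, or 0 if it did not fail. */
static int select_errno_on_closed_fd(int fd)
{
    fd_set set;
    struct timeval tv = { 0, 1000 };   /* 1 ms timeout so the call never blocks */

    close(fd);                         /* make sure fd is not open */
    FD_ZERO(&set);
    FD_SET(fd, &set);
    return select(fd + 1, &set, NULL, NULL, &tv) == -1 ? errno : 0;
}

/* Hypothetical helper: briefly occupies high_fd so the kernel has to make
 * its descriptor table large enough to hold it. Returns 0 on success. */
static int grow_fd_table(int high_fd)
{
    int src = open("/dev/null", O_RDONLY);
    if (src == -1)
        return -1;

    int rc = dup2(src, high_fd);       /* fails if high_fd exceeds RLIMIT_NOFILE */
    close(src);
    if (rc == -1)
        return -1;
    return close(high_fd);
}
```

Calling `select_errno_on_closed_fd(300)` before and after `grow_fd_table(FD_SETSIZE - 1)` and comparing the two results shows whether the table size is what decides the outcome on a given kernel. The first result depends on how the process was started, as the shell and `gdb` observations above suggest.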