How to check if `closefrom` can be used for closing file descriptors at runtime?

Question

I'm looking to write some C code that will close all the currently open file descriptors, suitable to be used as part of a fork/exec to make a new process.

I know from the answer here that there are various platform-specific functions to do this efficiently, such as `closefrom` on Linux, but they're not available in all C libraries (like musl) or on all kernels.

Most solutions I've seen check for the availability of these functions using compile-time macros, but I'd like to do so at runtime. The idea is to have code that can be compiled and linked statically once and then run on various platforms. The requirements are:

- If the underlying kernel supports it, make use of `closefrom` functionality.
- Otherwise, fall back to traditional means like looping through all possible file descriptors or reading from /proc.
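For the "reading from /proc" fallback, a minimal Linux-only sketch might look like the following. The helper name `close_fds_via_proc` is hypothetical. Note that `opendir()` allocates memory, so this is not async-signal-safe and may be unsafe to call between `fork()` and `exec()` when the parent is multithreaded.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: close every descriptor >= lowfd by listing
 * /proc/self/fd (Linux-specific). Returns 0 on success, or -1 if the
 * directory cannot be opened (e.g. /proc is not mounted). */
static int close_fds_via_proc(int lowfd)
{
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    int self_fd = dirfd(dir);   /* descriptor used by the listing itself */
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        char *end;
        long fd = strtol(ent->d_name, &end, 10);
        if (end == ent->d_name || *end != '\0')
            continue;           /* not a number, e.g. "." or ".." */
        if (fd >= lowfd && fd != self_fd)
            close((int)fd);
    }
    closedir(dir);
    return 0;
}
```

Unlike a loop up to the `sysconf(_SC_OPEN_MAX)` limit, this only touches descriptors that are actually open, so its cost does not grow with a huge descriptor limit.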

One idea I had for Linux systems was to invoke the syscall directly by number, but I'd love to hear from C experts whether this is a good idea:

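```c
#define _GNU_SOURCE             /* for syscall() */
#include <unistd.h>

/* close_range's number on x86, x86_64 and arm64; other ABIs differ */
#define CLOSE_RANGE_SYSCALL_NUMBER 436

void my_closefrom(unsigned int lowfd, unsigned int to) {
    /* close_range(first, last, flags) closes lowfd..to inclusive */
    long ret = syscall(CLOSE_RANGE_SYSCALL_NUMBER, lowfd, to, 0U);

    if (ret == 0) {
        // Success!
    } else {
        // Failed (e.g. errno == ENOSYS on kernels older than 5.9),
        // so fall back to another approach
        // ...
    }
}
```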
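A slightly more defensive sketch of the same idea, assuming the build headers may or may not know about `close_range`: it takes the number from `SYS_close_range` in `<sys/syscall.h>` when the headers define it instead of hard-coding 436, treats any failure (typically `ENOSYS` on kernels before 5.9) as a signal to fall back, and falls back to a brute-force loop bounded by `sysconf(_SC_OPEN_MAX)`. The name `close_fds_from` and the 1024 default are hypothetical choices, and the loop can be very slow when that limit is huge.

```c
#define _GNU_SOURCE             /* for syscall() */
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical helper: close every descriptor >= lowfd.
 * Returns 1 if close_range did the work, 0 if the fallback loop ran. */
static int close_fds_from(int lowfd)
{
#ifdef SYS_close_range
    /* ~0U as the upper bound means "through the highest descriptor". */
    if (syscall(SYS_close_range, (unsigned int)lowfd, ~0U, 0U) == 0)
        return 1;
    /* errno is typically ENOSYS here (kernel < 5.9); fall through. */
#endif
    long maxfd = sysconf(_SC_OPEN_MAX);
    if (maxfd < 0)
        maxfd = 1024;           /* assumed default when the limit is unknown */
    for (long fd = lowfd; fd < maxfd; fd++)
        close((int)fd);         /* EBADF for unused slots is harmless */
    return 0;
}
```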

What advantage do you think you stand to gain by performing the check at runtime? — John Bollinger, Aug 04 '22 at 01:45
Under what circumstances do you expect to find a platform other than the one you compile on that is binary-compatible with (has the same ABI as) Linux/glibc but where one provides `closefrom()` and the other doesn't? It seems improbable, somehow. — Jonathan Leffler, Aug 04 '22 at 02:25
@JohnBollinger: the ability to statically link once and then run it on systems different from the one I compiled it on. Jonathan: typically I statically link against musl libc. So I should be able to run this on a Linux system with kernel >= 5.9 and get the performance boost, or run it on an earlier kernel and not. AFAIK the only "ABI" compatibility that would be relevant would be whether the kernel provides the desired syscall. — tom, Aug 04 '22 at 07:49
In other words, I don't want to depend on glibc's `closefrom` at all. (It's just a shallow wrapper around the syscall after all.) I just want to make use of kernel functionality when it's available and I'm interested in advice on whether this is a reasonable idea. — tom, Aug 04 '22 at 07:55

John Bollinger · Answer 1 · 2022-08-04T14:27:40.950

From comments on the question:

[I'm looking for] the ability to statically link once and then run it on systems different from the one I compiled it on.

I should be able to run this on a Linux system with kernel >= 5.9 and get the performance boost, or run it on an earlier kernel and not. AFAIK the only "ABI" compatibility that would be relevant would be whether the kernel provides the desired syscall

Well, no. There is also the question of what the syscall number is and, for some syscalls, what argument format the kernel expects. And since you intend to link statically, you don't get to look only at the `close_range` syscall behind `closefrom`; you need to consider all of those attributes for every syscall your program makes.
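If one does hard-code a number as in the question, a build-time check can at least catch the architecture half of this problem. This sketch uses a C11 `_Static_assert` and assumes the headers define `SYS_close_range`; it refuses to compile on any target whose headers disagree with 436 (for example MIPS, whose syscall numbers carry a per-ABI offset). It does nothing about the kernel the binary eventually runs on, which still has to be probed at runtime.

```c
#include <sys/syscall.h>

/* The value hard-coded in the question. */
#define CLOSE_RANGE_SYSCALL_NUMBER 436

#ifdef SYS_close_range
/* Refuse to build on ABIs whose close_range number differs. */
_Static_assert(SYS_close_range == CLOSE_RANGE_SYSCALL_NUMBER,
               "hard-coded close_range number does not match this ABI");
#endif
```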

In other words, I don't want to depend on glibc's closefrom at all. (It's just a shallow wrapper around the syscall after all.) I just want to make use of kernel functionality when it's available and I'm interested in advice on whether this is a reasonable idea.

No, it is not a reasonable idea.

The purpose of the system call wrapper functions is to abstract kernel details from userspace programs. This protects you from issues such as the Linux system call numbers being different for different architectures, including x86 vs. x86_64, or the occasional change in system call numbers for a given arch. It also gives you a measure of source compatibility with other systems, such as macOS, the BSDs, and Solaris. Overall, the wrapper functions are the stable kernel interface for userspace programs.

I cannot imagine making a direct syscall without being confident that the syscall number I requested was associated with the system function I wanted. That is exactly the kind of thing that might test successfully enough to release, and then fail mysteriously and/or devastatingly in the field, probably a couple of years later, after I've forgotten all about my nasty hack.

Better solutions include:

- Lowest-common-denominator approaches, that is, things that work on all supported machines. Forgoing faster alternatives available on a subset of supported machines is the cost of broad portability. If it's fast enough on machines that don't have (e.g.) `closefrom()`, then why do you need to make it faster on systems that do have that function? And how much speed do you really gain?
- Compile-time selection. You said you want to avoid this, but there's a reason that it is the usual approach for tuning programs to the capabilities of host machines. With static linking you don't need to worry about the runtime host's C standard library, but you do need to worry about the kernel version. A common approach is to provide two (or more) binaries targeting different, possibly overlapping, ranges of kernel versions.
- Working around your need for the feature in the first place. For the particular case of closing files at fork / exec, you could either:
  - set files as close-on-exec when you open them, OR
  - register fork handlers (`pthread_atfork()`) as needed to close the files when the program forks. This should work even if the program's initial thread is its only one.
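The fork-handler option could be sketched as follows. The registry (`register_private_fd`, `MAX_PRIVATE_FDS`, and friends) is hypothetical, is not thread-safe as written, and only covers descriptors the program chose to register. `pthread_atfork()` itself is the standard POSIX call; its third argument runs in the child after `fork()`, though such handlers may not run for `posix_spawn()` or `vfork()`.

```c
#include <pthread.h>
#include <unistd.h>

/* Hypothetical registry of descriptors a child must not inherit. */
#define MAX_PRIVATE_FDS 64
static int private_fds[MAX_PRIVATE_FDS];
static int n_private_fds;

/* Child-side handler: runs in the new child process right after fork(). */
static void close_private_fds_in_child(void)
{
    for (int i = 0; i < n_private_fds; i++)
        close(private_fds[i]);
    n_private_fds = 0;
}

/* Remember fd so that forked children close it. Not thread-safe. */
static int register_private_fd(int fd)
{
    if (n_private_fds == MAX_PRIVATE_FDS)
        return -1;
    private_fds[n_private_fds++] = fd;
    return 0;
}

/* Install the handler once, e.g. at program start-up. */
static int install_fork_handler(void)
{
    return pthread_atfork(NULL, NULL, close_private_fds_in_child);
}
```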

"the occasional change in system call numbers for a given arch" -- can you back this up? As I mentioned in the question, this syscall idea is only for Linux systems, where as far as I understand, there is a strong guarantee of stability and backwards compatibility for the userspace interface, including syscalls. How could a syscall number possibly change? You said "the wrapper functions are the stable kernel interface for userspace programs," and this may be true of other systems, but I don't think it's true of Linux where we have separate glibc, musl, etc. — tom, Aug 04 '22 at 22:29
Re: lowest common denominator approaches -- I wouldn't be asking the question if traditional approaches were "fast enough." The issue is that on some systems the `sysconf` file descriptor limit gets set to something on the order of INT_MAX, and looping through and closing all of these can make an exec take several minutes! — tom, Aug 04 '22 at 22:32
Unfortunately workarounds like `O_CLOEXEC` or fork handlers aren't feasible because this needs to work in a general setting (i.e. in the process library of a programming language). — tom, Aug 04 '22 at 22:35
@tom, if you are looking at this for a *library*, then you are probably taking too much responsibility on yourself. The application using your library should take responsibility for setting close-on-exec on files that it thinks need it. Your library's responsibility is to provide enough documentation of what it does that its users know they should do that. That's not a workaround, it's how things ought to be done. — John Bollinger, Aug 04 '22 at 22:38
It is *completely* routine for process libraries to provide a "close file descriptors" option. Just off the top of my head, Python's `subprocess` and Haskell's `System.Process` provide one. I'm working on improving the behavior of the Haskell one FWIW. I've never heard of such a library passing such a responsibility on to its users; indeed, IIRC some kinds of file descriptors you can obtain don't have an `O_CLOEXEC` option. — tom, Aug 04 '22 at 23:11
@tom, to the best of my knowledge, all file descriptors have a close-on-exec flag, which can be manipulated via `fcntl()`. This is not a driver-specific option, and there's no reason why it should be. Some system interfaces also provide means to turn that flag on when a file descriptor is initially opened, and some don't, but that doesn't mean that the file descriptors obtained via the latter don't support close-on-exec. — John Bollinger, Aug 05 '22 at 00:53
As far as syscalls changing, Linux promises at least two years of stability for interfaces documented as stable, though in fact those interfaces are by and large *much* more stable than that. Other interfaces are less stable and their behavior does change, and a few have been removed or are marked obsolete and slated for removal. Existing syscalls are not renumbered during their lifetime as far as I am aware -- I did not mean to imply that. — John Bollinger, Aug 05 '22 at 01:02
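The point about `fcntl()` can be illustrated with a small helper (the name `set_cloexec` is hypothetical) that turns on `FD_CLOEXEC` for an already-open descriptor. This works regardless of whether the interface that produced the descriptor accepted an `O_CLOEXEC`-style flag.

```c
#include <fcntl.h>

/* Hypothetical helper: mark an already-open descriptor close-on-exec,
 * preserving any other descriptor flags. Returns 0, or -1 with errno set. */
static int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    if (flags & FD_CLOEXEC)
        return 0;               /* already set */
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}
```

Setting the flag after the fact leaves a window in which a concurrent `fork()` and `exec()` in another thread can still inherit the descriptor, which is why atomic flags such as `O_CLOEXEC` exist where the interface offers them.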
