SOCK_CLOEXEC exists to avoid a race condition between getting a new socket from accept and setting the FD_CLOEXEC flag afterwards.
Normally, if you want a file descriptor to be close-on-exec, you first obtain the descriptor in some way and then call fcntl(fd, F_SETFD, FD_CLOEXEC). But in a threaded program there is a race between getting that file descriptor (in this case from accept) and setting the FD_CLOEXEC flag: if another thread calls fork and exec in that window, the child process inherits the descriptor. Therefore Linux has extended most (if not all) system calls that return new file descriptors so that they accept a flag telling the kernel to set close-on-exec atomically, before the descriptor becomes valid. For example, socket accepts SOCK_CLOEXEC in its type argument, accept4 takes it as a flag, and open takes O_CLOEXEC. That way the race condition is closed.
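To make the difference concrete, here is a minimal sketch, assuming Linux with glibc. The function names (accept_then_set_cloexec, accept_cloexec, new_socket_cloexec, has_cloexec, demo) are made up for illustration; only accept, accept4, socket, fcntl and the flag constants come from the system headers.

```c
#define _GNU_SOURCE /* glibc only declares accept4() when this is defined */
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Racy two-step pattern: the descriptor is valid, but not yet
 * close-on-exec, between accept() and fcntl(). */
int accept_then_set_cloexec(int listen_fd)
{
    int fd = accept(listen_fd, NULL, NULL);
    if (fd < 0)
        return -1;
    /* Window: a fork+exec in another thread here leaks fd to the child. */
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Atomic variant: the kernel sets close-on-exec before the
 * descriptor is ever visible to the process. */
int accept_cloexec(int listen_fd)
{
    return accept4(listen_fd, NULL, NULL, SOCK_CLOEXEC);
}

/* Same idea at creation time: OR the flag into the socket type. */
int new_socket_cloexec(int domain)
{
    return socket(domain, SOCK_STREAM | SOCK_CLOEXEC, 0);
}

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if not, -1 on error. */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) != 0;
}

/* Small self-check: a socket created with SOCK_CLOEXEC already
 * carries the flag, with no separate fcntl() call. */
void demo(void)
{
    int fd = new_socket_cloexec(AF_UNIX);
    assert(fd >= 0);
    assert(has_cloexec(fd) == 1);
    close(fd);
}
```

In a server loop you would call accept_cloexec(listen_fd) wherever you previously called accept followed by fcntl. The two-step version is shown only to mark where the window is.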
If you wonder why close-on-exec exists at all: in some cases, especially when a privileged program executes a non-privileged one, you don't want certain file descriptors to leak into that program.