19

I ask this question after doing my best to research how to implement a message queue server. Why do operating systems put limits on the number of open file descriptors a process, and the system as a whole, can have? My current server implementation uses ZeroMQ and opens a subscriber socket for each connected websocket client, so that single process is only going to be able to handle clients up to the limit of the fds.

When I research the topic I find plenty of information on how to raise system limits to levels as high as 64k fds, but it never mentions how that affects system performance, or why the default is 1k or lower to begin with. My current approach is to dispatch messaging to all clients with a coroutine running in its own loop, using a map of all clients and their subscription channels. But I would love to hear a solid answer about file descriptor limits and how they affect applications that try to use them on a per-client level with persistent connections.

Aseem Bansal
jdi
  • OK, so I gather from all these answers that 1) it comes down to an issue of available RAM, and 2) web server applications shouldn't rely on using a large number of dynamically allocated file descriptors if portability is key, because those deploying the server would then have to tune their server's FD limits. – jdi Jul 25 '11 at 19:19

4 Answers

18

It may be because a file descriptor value is an index into a file descriptor table, so the number of possible file descriptors determines the size of the table. Average users would not want half of their RAM used up by a file descriptor table that can handle millions of file descriptors they will never need.
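To see the limit that bounds this table for a given process, the POSIX getrlimit() call reports the current (soft) and maximum (hard) values of RLIMIT_NOFILE; a minimal sketch:

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;

        /* RLIMIT_NOFILE caps the number of open file descriptors per process:
           the soft limit is what currently applies, the hard limit is the
           ceiling an unprivileged process may raise its soft limit to. */
        if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
            perror("getrlimit");
            return 1;
        }

        printf("soft fd limit: %llu\n", (unsigned long long)rl.rlim_cur);
        printf("hard fd limit: %llu\n", (unsigned long long)rl.rlim_max);
        return 0;
    }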

BobTurbo
  • I like this answer because it specifically explains that it comes down to available RAM. So it's a matter of me knowing that my application is meant to consume a very large number of file descriptors and that my server is specifically tuned for that application. Thanks! – jdi Jul 25 '11 at 19:17
4

There are certain operations which slow down when you have lots of potential file descriptors. One example is the operation "close all file descriptors except stdin, stdout, and stderr" -- the only portable* way to do this is to attempt to close every possible file descriptor except those three, which can become a slow operation if you could potentially have millions of file descriptors open.

*: If you're willing to be non-portable, you can look in /proc/self/fd -- but that's beside the point.
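To make the cost concrete, here is a minimal sketch of that portable close-everything loop, assuming only POSIX calls; note that the amount of work is proportional to the descriptor limit, not to the number of descriptors actually open:

    #include <unistd.h>

    /* Close every descriptor above stderr (2). The loop bound comes from
       sysconf(_SC_OPEN_MAX), so raising the fd limit into the millions makes
       this loop proportionally slower even if almost nothing is open. */
    static void close_all_above_stderr(void)
    {
        long maxfd = sysconf(_SC_OPEN_MAX);
        if (maxfd < 0)
            maxfd = 1024;  /* fall back to a common default if unknown */

        for (long fd = 3; fd < maxfd; fd++)
            close(fd);     /* EBADF for never-opened fds is harmless */
    }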

This isn't a particularly good reason, but it is a reason. Another reason is simply to keep a buggy program (i.e., one that "leaks" file descriptors) from consuming too many system resources.

4

For performance purposes, the open file table needs to be statically allocated, so its size needs to be fixed. File descriptors are just offsets into this table, so all the entries need to be contiguous. You can resize the table, but this requires halting all threads in the process and allocating a new block of memory for the file table, then copying all entries from the old table to the new one. It's not something you want to do dynamically, especially when the reason you're doing it is because the old table is full!
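This is also why a server that knows it will need many descriptors typically raises its own soft limit once, at startup, up to the administratively set hard limit, rather than expecting anything to grow on demand later. A minimal sketch using POSIX getrlimit()/setrlimit():

    #include <sys/resource.h>

    /* Raise the soft RLIMIT_NOFILE up to the hard limit, once, at startup.
       This is the time to pay the cost, not when the table is already full. */
    static int raise_fd_limit(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
            return -1;

        rl.rlim_cur = rl.rlim_max;   /* soft limit -> hard limit */
        return setrlimit(RLIMIT_NOFILE, &rl);
    }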

TMN
  • That is correct on Unix platforms. On Windows, file handles are used, and a process can allocate 16 million handles by default. The handle table is dynamically allocated, so you're more likely to run out of memory than handles. But if you do run out of handles, strange things happen. See http://blogs.technet.com/b/markrussinovich/archive/2009/09/29/3283844.aspx – David Roussel Jul 23 '11 at 20:43
  • There are also other things that take up more space (and possibly time) with more FDs - FD masks for select(), for instance. – Nick Johnson Jul 25 '11 at 03:06
  • Thanks. I can see that raising the limits is not a dynamic thing. I can see this answer leading into the suggestion that allocating a larger static table for FDs would mean more dedicated memory? – jdi Jul 25 '11 at 19:26
2

On Unix systems, the fork() and fork()/exec() process-creation idiom requires iterating over all potential process file descriptors and attempting to close each one, typically leaving only a few file descriptors such as stdin, stdout, and stderr untouched or redirected somewhere else.

Since this is the Unix API for launching a process, it has to be done any time a new process is created, including when executing each and every non-built-in command invoked within shell scripts.
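As an illustration, a minimal sketch of that idiom (the spawn() helper here is arbitrary, not part of any standard API) pays the cost of the close loop on every spawned command:

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Spawn a command, closing every inherited descriptor above stderr in
       the child before exec, as the classic fork()/exec() idiom does. */
    static pid_t spawn(char *const argv[])
    {
        pid_t pid = fork();
        if (pid != 0)
            return pid;                  /* parent (or -1 on error) */

        long maxfd = sysconf(_SC_OPEN_MAX);
        if (maxfd < 0)
            maxfd = 1024;
        for (long fd = 3; fd < maxfd; fd++)
            close(fd);                   /* work proportional to the fd limit */

        execvp(argv[0], argv);
        _exit(127);                      /* exec failed */
    }

    int main(void)
    {
        char *args[] = { "echo", "hello", NULL };
        pid_t pid = spawn(args);
        if (pid > 0)
            waitpid(pid, NULL, 0);
        return 0;
    }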

Other factors to consider are that while some software may use sysconf(_SC_OPEN_MAX) to dynamically determine the number of files that may be opened by a process, a lot of software still uses the C library's default FD_SETSIZE, which is typically 1024 descriptors, and such software can never monitor more than that many descriptors with select() regardless of any administratively defined higher limit.
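The FD_SETSIZE ceiling bites because an fd_set is a fixed-size bitmap; a minimal select()-based sketch shows that a descriptor numbered FD_SETSIZE or higher cannot be monitored this way at all, no matter how high the rlimit is:

    #include <stdio.h>
    #include <sys/select.h>

    /* Block until fd becomes readable, using select(). fd_set is a bitmap of
       FD_SETSIZE (typically 1024) bits, so descriptors numbered >= FD_SETSIZE
       cannot be monitored with this API. */
    static int wait_readable(int fd)
    {
        if (fd >= FD_SETSIZE) {
            fprintf(stderr, "fd %d exceeds FD_SETSIZE (%d)\n", fd, FD_SETSIZE);
            return -1;
        }

        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);

        /* select() takes the highest descriptor plus one */
        return select(fd + 1, &readfds, NULL, NULL, NULL);
    }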

Unix has a legacy asynchronous I/O mechanism based on file descriptor sets, which use bit offsets to represent the files to wait on and the files that are ready or in an exception condition. It doesn't scale well to thousands of files, as these descriptor sets need to be set up and cleared on each pass around the run loop. Newer, non-standard APIs have appeared on the major Unix variants, including kqueue() on *BSD and epoll() on Linux, to address the performance shortcomings when dealing with a large number of descriptors.

It is important to note that select()/poll() is still used by a lot of software, as it has long been the POSIX API for asynchronous I/O. The modern POSIX asynchronous I/O approach is the aio_* API, but it is likely not competitive with the kqueue() or epoll() APIs. I haven't used aio in anger, and it likely wouldn't offer the performance and semantics of the native approaches, which can aggregate multiple events for higher performance. kqueue() on *BSD has really good edge-triggered semantics for event notification, allowing it to replace select()/poll() without forcing large structural changes on your application. Linux epoll() follows the lead of *BSD kqueue() and improves upon it, which in turn followed the lead of Sun/Solaris evports.
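For contrast, a minimal Linux epoll() sketch registers a descriptor once and then waits; the per-iteration cost scales with the number of descriptors that are ready rather than with the total number being watched:

    #include <sys/epoll.h>
    #include <unistd.h>

    #define MAX_EVENTS 64

    /* Watch a single descriptor for readability using edge-triggered epoll.
       Registration happens once; epoll_wait() then reports only descriptors
       that became ready, instead of rescanning a bitmap of every fd. */
    static int watch_and_wait(int fd)
    {
        int epfd = epoll_create1(0);
        if (epfd < 0)
            return -1;

        struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.fd = fd };
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) != 0) {
            close(epfd);
            return -1;
        }

        struct epoll_event events[MAX_EVENTS];
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);  /* block until ready */

        close(epfd);
        return n;   /* number of ready descriptors (here, 0 or 1) */
    }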

The upshot is that increasing the number of allowed open files across the system adds both time and space overhead for every process in the system, even for processes that can't make use of those descriptors given the API they are using. There are also aggregate system-wide limits on the total number of open files allowed. This older but interesting tuning summary for 100k-200k simultaneous connections using nginx on FreeBSD provides some insight into the overhead of maintaining open connections, and another one covers a wider range of systems but "only" sees 10K connections as the Mount Everest.

Probably the best reference for Unix systems programming is W. Richard Stevens' Advanced Programming in the UNIX Environment.

Andrew Hacking
  • Hmm. Where is the documentation supporting that fork() is required to iterate over all file descriptors trying to close each one? The fork() manpage says that the child inherits a copy of the parent's file descriptor table. Does this mean that if one *wanted* to close file descriptors on fork(), it *could* mean iterating over a much larger list if more file descriptors are permitted in the parent process? – jdi Sep 09 '14 at 10:26
  • Also, I notice you are making references to more historical/legacy Unix situations. Are these still relevant to modern Linux distros in a way that they still impact the default limits? For instance, on my OSX box, the default file handle ulimit is extremely low (256) yet it is BSD and has supported kqueue for a long time. – jdi Sep 09 '14 at 10:27
  • I guess I'll never know the reason for this. This is for Unix only and without any references. – Aseem Bansal Sep 09 '14 at 21:05
  • @jdi fork() itself doesn't close file descriptors; it's the Unix idiom of fork()/exec() that results in the need for the newly forked child (still running with the same program text as the parent) to close all file descriptors, leaving just the basic stdin/stdout/stderr open, prior to calling exec() to load and replace the program text. – Andrew Hacking Sep 10 '14 at 07:41
  • @AseemBansal Yes, this is Unix, since the question was specifically referring to file descriptor limits and not Windows file handles or Mach microkernel ports. – Andrew Hacking Sep 10 '14 at 07:45
  • @jdi updated with some more information on historical and current apis, and some guides on overheads when scaling up connections – Andrew Hacking Sep 10 '14 at 10:43
  • I think my question wasn't specific to Unix. It applies just as easily to OSX which has the same concept of limiting file descriptors, to even a lower default than Linux. I'm getting the sense that your answer is saying it used to be more of a problem, but isn't anymore. – jdi Sep 10 '14 at 19:32
  • @jdi OS X **IS** Unix, in fact it is one of the few systems actually certified by The Open Group as real Unix. Linux is not Unix as defined by The Open Group, and nor is *BSD despite its heritage, but practically speaking they're close enough for most people. – Andrew Hacking Sep 11 '14 at 00:15
  • @jdi I am saying anytime you increase limits you increase both memory and processing overhead and it needs to be done very carefully for your particular deployment scenario. – Andrew Hacking Sep 11 '14 at 00:24
  • Thanks for the information. It seems, in summary, you are saying the same thing summarized by BobTurbo and TMN: that it increases memory overhead. – jdi Sep 11 '14 at 04:29
  • @jdi Yes, in part: it increases memory overhead, but processing overheads too, fork()/exec() being one such example. I've also provided some references around scalability, as the question specifically asked how increasing files for persistent connections impacts performance. Those references give you a deeper understanding of what the kernel has to maintain to support a connection as you scale up. – Andrew Hacking Sep 12 '14 at 00:10