
For a few days now we have been encountering a problem with our ArangoDB installation. A few minutes to an hour after startup, all connections to the database are refused. The arango log file says that there are "Too many open files". An "lsof | grep arango | wc -l" shows that the database has around 50,000 open file handles, which is well below the maximum allowed by the Linux system (around 3 million). Does anyone have an idea where this error comes from?

We are using Ubuntu Linux with a 3.13 kernel, 30 GB RAM and three cores. The database is still very small, with around 1.5 million entries and a size of 50 GB.

Thx, secana

EDIT: "netstat -anpt | fgrep 2480" shows:

root@syssec-graphdb-001-test:~# netstat -anpt | fgrep 2480
tcp        0      0 10.215.17.193:2480      0.0.0.0:*               LISTEN               7741/arangod
tcp        0      0 10.215.17.193:2480      10.215.50.30:53453      ESTABLISHED          7741/arangod
tcp        0      0 10.215.17.193:2480      10.215.50.31:49299      ESTABLISHED          7741/arangod
tcp        0      0 10.215.17.193:2480      10.215.50.30:53155      ESTABLISHED          7741/arangod

"ulimit -n" has a result of 1024, so I think that the ~50,000 are all arango processes together.

Last lines in log file before the database died:

2015-05-26T12:20:43Z [9672] ERROR cannot open datafile '/data/arangodb/databases/database-235999516/collection-28464454696/datafile-18806474509149.db': 'Too many open files'
2015-05-26T12:20:43Z [9672] ERROR cannot open datafile '/data/arangodb/databases/database-235999516/collection-28464454696/datafile-18806474509149.db': Too many open files
2015-05-26T12:20:43Z [9672] DEBUG [arangod/VocBase/collection.cpp:1632] cannot open '/data/arangodb/databases/database-235999516/collection-28464454696', check failed
2015-05-26T12:20:43Z [9672] ERROR cannot open document collection from path '/data/arangodb/databases/database-235999516/collection-28464454696'
  • Can you also do a "netstat -anpt | fgrep 8529", where 8529 is the server port? – fceller May 26 '15 at 11:49
  • What driver are you using to connect to ArangoDB? – stj May 26 '15 at 11:51
  • What is the result of `ulimit -n` when run in the same environment as the arangod process (i.e. same shell and user)? It's probably a lot less than the maximum *possible* value. It may help increasing the maximum # of open files via `ulimit` for the arangod process. – stj May 26 '15 at 11:55

2 Answers

2

It looks like it will make sense to increase the maximum number of open files a process is allowed to manage. Given the stated database size of around 50 GB, the (presumably default) value of 1024 seems too low.

arangod will require one file descriptor for each parallel client connection. That may not be many, but with HTTP keep-alive connections this can already account for a fair number of file descriptors.
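To get a rough idea how many descriptors client connections are currently using, you can count the established connections on the server port (2480 in the question; a default install uses 8529), for example:

# established client connections held by arangod
netstat -anpt | grep ':2480' | grep -c ESTABLISHED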

Additionally, each datafile of an active collection will need to be memory-mapped and cost one file descriptor as well. With the default datafile size of 32 MB, a database size of 50 GB (on disk) will already consume 1,600 file descriptors:

50 GB database size / 32 MB per datafile ≈ 1,600 datafiles
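You can verify this on disk by counting the datafiles (journal and compaction files need a descriptor each as well); the path below is taken from the log excerpt in the question, so adjust it to your databases directory:

# datafiles, journals and compaction files currently on disk
find /data/arangodb/databases -name '*.db' | wc -l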

Increasing the ulimit -n value for the arangod user and environment therefore makes sense. You can confirm that arangod can actually use the configured number of file descriptors by starting it with the option --server.descriptors-minimum <value>, e.g.

--server.descriptors-minimum 32768 

for that many file descriptors. If arangod cannot actually use the specified number of file descriptors, it will fail at startup with a fatal error. Of course that option can also be put into the arangod.conf file.
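To make a higher limit persist across restarts, the usual place is /etc/security/limits.conf or the service's start script. The following is only a sketch and assumes arangod runs as a user named "arangodb", so adjust the user name to your setup:

# /etc/security/limits.conf
arangodb   soft   nofile   32768
arangodb   hard   nofile   32768

The corresponding arangod.conf entry would then look like this, assuming the usual mapping of a --section.option flag to a [section] block in the config file:

[server]
descriptors-minimum = 32768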

Additionally, the default size for (new) datafiles can be increased via the journalSize parameter for collections. That won't help right now, but it will lower the number of file descriptors required for data saved in the future.
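For an existing collection the journalSize can be raised from arangosh, for example like this (a sketch only: "mycollection" is a placeholder, the value is 64 MB in bytes, and it assumes arangosh's --javascript.execute-string option; only newly created datafiles are affected):

arangosh --javascript.execute-string 'db.mycollection.properties({ journalSize: 64 * 1024 * 1024 });'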

stj
  • Increasing the ulimit helped, thx. For the long run I increased the journal size, too. – secana May 27 '15 at 08:31
  • Is the datafile size in version 2.5.7 32 MB or more? I have an installation where datafiles are growing larger than 32 MB. One of the collections has 3 large files: 182 MB for compaction, 190 MB for datafile1 and 182 MB for datafile2. arangod.conf is default and the journal file size is set to 32 MB – Deepak Agarwal Aug 19 '15 at 16:09
  • This might happen in two situations: First, if documents bigger than 32 MB are used, they won't fit into a 32 MB datafile, so datafiles will dynamically extend in size for these bigger documents. Second, when the compaction processes full datafiles, it may concatenate the remaining contents of a few datafiles together. – stj Aug 20 '15 at 07:29
  • By the way, ArangoDB 2.7 will have adjusted (read: raised) limit values in its start/stop scripts, so there will be less need to adjust the limits manually. – stj Aug 20 '15 at 07:35
2

For emergencies when you can't restart the database, as in my case, you will find this blog post very useful; it explains how you can change the ulimit of a running process.

If your distribution has util-linux 2.21 or newer, you can use the "prlimit" tool; otherwise you can compile the small example C program from the blog post, which worked great for me.
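For example, to raise the open-files limit of the running arangod process with prlimit (a sketch; <PID> is the arangod process id and 32768:32768 is just an example soft:hard value):

prlimit --pid <PID> --nofile=32768:32768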

To check the actual limits of a process you can use:

cat /proc/<PID>/limits

Good luck!

Razvan Grigore