Note: see my edit at the end of the post.

I have a node.js (Express) server which is serving approximately 15-30 requests/second. I am serving a bunch of simple JADE templates and a Durandal SPA application, with the bulk of the requests being for the simple JADE templates. Everything goes fine for a couple of minutes; after a while, however, the server starts getting EMFILE errors and eventually crashes. After troubleshooting a bit, I found out that after a while the output of lsof -i -n -P | grep node contains a huge number of rows like these:

node    8800 my_user   13u  IPv4 906628      0t0  TCP 172.x.x.x:3000->x.x.x.x:44654 (ESTABLISHED)
node    8800 my_user   14u  IPv4 908407      0t0  TCP 172.x.x.x:3000->x.x.x.x:13432 (ESTABLISHED)
node    8800 my_user   15u  IPv4 908409      0t0  TCP 172.x.x.x:3000->x.x.x.x:38814 (ESTABLISHED)
node    8800 my_user   19u  IPv4 906622      0t0  TCP 172.x.x.x:3000->x.x.x.x:56743 (ESTABLISHED)
node    8800 my_user   20u  IPv4 907221      0t0  TCP 172.x.x.x:3000->x.x.x.x:46897 (ESTABLISHED)
...

I am a beginner with node.js, but it looks like the server is unable to release already-completed connections, which eventually leads to EMFILE errors and crashes.

I already tried the following:

  • ulimit -n 2048: this is obviously a temporary workaround; it delays the EMFILE errors but doesn't solve the issue
  • lowering the response timeout (which is 2 minutes by default, if I recall correctly) to something closer to 5-10 seconds

With both of those adjustments in place, the server takes MUCH longer to crash, but still does so eventually. Even without any load, it seems unable to dispose of the "stuck" ESTABLISHED TCP connections, and when requests start arriving again, the number of open file descriptors keeps growing until the process eventually crashes.
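
As an aside, res.setTimeout only covers the response it is called on; the same idea can also be applied once at the server level instead. A minimal sketch of that variant (assuming a Node version where setTimeout is available on the HTTPS server object; I have not verified this on my setup):

# sketch: one idle timeout for every socket the server accepts,
# instead of calling res.setTimeout per response
server = https.createServer options, app
server.setTimeout 10000 # milliseconds
server.listen config.server.port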

My node.js server (in CoffeeScript) looks like this (I'm using mimosa to start the server, but I don't think that should make any difference):

express = require 'express'
engines = require 'consolidate'

fs      = require 'fs'
http    = require 'http'
https   = require 'https'

# SSL credentials for the HTTPS server
options =
    ca: fs.readFileSync __dirname + '/ssl/ca.pem'
    key: fs.readFileSync __dirname + '/ssl/key.pem'
    cert: fs.readFileSync __dirname + '/ssl/cert.pem'

exports.startServer = (config, callback) ->

    app = express()

    app.configure ->
        app.set 'port', config.server.port
        app.set 'views', config.server.views.path
        app.engine config.server.views.extension, engines[config.server.views.compileWith]
        app.set 'view engine', config.server.views.extension
        app.use express.logger({ format: ":date :method :remote-addr :url :response-time" })
        app.use express.favicon __dirname + '/public/favicon.ico'
        app.use express.bodyParser()
        app.use express.methodOverride()
        app.use express.compress()
        app.use express.static(config.watch.compiledDir)
        app.use config.server.base, app.router

    app.configure 'development', ->
        app.use express.errorHandler()

    app.get '/my/route/n1', (req, res) ->
        res.render "./my/template/n1"
    app.get '/my/route/n2', (req, res) -> # route getting the bulk of requests
        res.setTimeout(10000) # timeout introduced in an attempt to fix the problem
        res.render "./my/template/n2"
    app.get '/my/route/n3', (req, res) ->
        res.render "./my/template/n3"
    app.get '*/?', (req, res) -> res.render 'index'

    server = https.createServer options, app
    server.listen config.server.port, ->
        console.log "Express server listening on port %d in %s mode", server.address().port, app.settings.env

    callback server

I think node.js shouldn't have any problem serving this volume of requests, so I suspect a misconfiguration on my part or something along those lines. What am I doing wrong? Thank you!

P.S.: I edited out a bunch of stuff from the code and the lsof output, both for privacy reasons and because it should be irrelevant to the issue; however, if any other info is needed, I'll update the question to provide it as soon as possible.

EDIT: I think I found the source of my issue. The connection that Express uses to serve ./my/template/n2 is indeed timing out after 10 seconds; however, the connections used by express.static to serve images, CSS and other static resources are not (well, they are, but they take 2-5 minutes to release their file descriptor...). I guess my question then reduces to: how do I set the response timeout for the files served by express.static? I tried using app.use express.timeout(10000) before every other middleware, but it only seems to work for the main JADE file and not for images or CSS.

I am using Express 3. Thank you again in advance.
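
For clarity, this is roughly how the express.timeout attempt mentioned in the edit was mounted (a sketch from memory of my configure block; as noted, it only seemed to cover the rendered JADE pages):

app.configure ->
    # attempted fix: mount the timeout middleware before everything else;
    # in practice it did not seem to apply to express.static responses
    app.use express.timeout(10000)
    app.use express.logger({ format: ":date :method :remote-addr :url :response-time" })
    # ...rest of the middleware stack, including express.static, unchanged...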

  • Are you closing all outgoing responses? – user568109 Jun 17 '14 at 06:57
  • Do I need to do so? In the examples in the Express API reference there isn't anything that looks like a response close: http://expressjs.com/4x/api.html#app.VERB – pmarchezz Jun 17 '14 at 07:00
  • @pmarchezz Perhaps that's not an error at all. Maybe these connections are still valid due to a *keep-alive* strategy. Try setting the `Connection` header to `close` (I think it is `keep-alive` by default). Also, do you have anything in front of Node.js, like nginx? – freakish Jun 17 '14 at 07:04
  • I thought about doing so, but wouldn't it slow down page loads for the clients? They would need a new connection for each CSS file/image instead of reusing the same one. The `res.setTimeout()` line was added exactly to prevent `keep-alive` behavior from keeping too many connections open, which in my understanding is a better solution than just setting the header to `close`. – pmarchezz Jun 17 '14 at 07:09
  • Or your file limit is too low for the number of clients you are getting. Try increasing the file-max limit. The rule of thumb is to check for a descriptor leak before you increase the limits. – user568109 Jun 17 '14 at 07:12
  • Yes, I could do that, but the number of file descriptors seems to keep growing no matter what; with a timeout of 10 seconds and ~30 req/sec, I'd expect it to saturate somewhere near 300 connections, which is about ten times smaller than the 2048 limit I put in place. I don't think it would solve my issue, although I'll give it a try anyway. – pmarchezz Jun 17 '14 at 07:27
  • ulimit sets the total limit for the current shell. Are you running anything else? Also, try running the server without modifying it. – user568109 Jun 17 '14 at 08:18
  • The server is running inside a `screen`, so in the shell as well (yeah, I know, it should be a service, I'll do that as soon as I have a bit of time). I'm sure `ulimit` worked, since `lsof -i -n -P | grep node | wc -l` reaches 2000+ rows before EMFILE starts to be an issue. As for running without modifying `ulimit`, well... I was doing that before noticing the crashes, so I don't think it will fix anything. – pmarchezz Jun 17 '14 at 08:28
  • @pmarchezz I think you are worrying too much about performance. And you didn't even check how it behaves with `Connection: close`. I doubt you'll get any noticeable slowdown (especially since `keep-alive` helps with many small requests, not with big requests like images). And one more time: are you using anything in front of Node.js? A correctly configured nginx may solve all your problems. And don't run stuff in screen; use proper daemonization, for example [forever](http://blog.nodejitsu.com/keep-a-nodejs-server-up-with-forever/). – freakish Jun 17 '14 at 11:43
  • @freakish thanks for your advice, I will definitely turn the server into a daemon. On using nginx: no, I am not using anything in front of node; while it may be useful for solving my issue (I actually doubt it, but who knows), I think introducing an additional layer that doesn't actually do much besides proxying shouldn't be necessary, especially since I'm not facing 1k+ reqs/sec or anything like that. Also, I don't know how nginx could help node.js get rid of those lingering connections (after two days of zero load, it still has ~500 file descriptors open). – pmarchezz Jun 18 '14 at 17:30
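
For reference, the Connection: close approach freakish suggests in the comments above would, as far as I understand, look roughly like this as middleware (a sketch only; I have not measured its impact on page load times):

# sketch: disable keep-alive entirely by forcing every response to
# close its connection; frees descriptors at the cost of reconnects
app.use (req, res, next) ->
    res.setHeader 'Connection', 'close'
    next()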

1 Answer

The issue appears to be solved after adding this middleware before every other app.use call:

        app.use (req, res, next) ->
            # give every response, including static assets, a 10-second
            # socket timeout so lingering keep-alive connections get closed
            res.setTimeout(10000)
            next()

I doubt it's the most elegant way to solve the issue, but it's working fine right now.
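
For context, in the setup from the question this means mounting the middleware at the top of the configure block, ahead of express.static, so static assets are covered as well (a sketch based on the question's code):

app.configure ->
    # mounted first, so everything below it, including express.static,
    # inherits the 10-second socket timeout
    app.use (req, res, next) ->
        res.setTimeout(10000)
        next()
    app.use express.logger({ format: ":date :method :remote-addr :url :response-time" })
    # ...rest of the middleware stack from the question, unchanged...
    app.use express.static(config.watch.compiledDir)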
