I am trying to measure the throughput of a simple Node.js program with a CouchDB backend using cradle as the DB driver. When I put load against the program I get the following error within 30 seconds:

EADDRINUSE, Address already in use

Here is my program:

var http = require ('http'),
    url = require('url'),
    cradle = require('cradle'),
    c = new(cradle.Connection)('127.0.0.1',5984,{cache: false, raw: false}),
    db = c.database('testdb'),
    port=8081;

http.createServer(function(req,res) {
    var id = url.parse(req.url).pathname.substring(1);  
    db.get(id,function(err, doc) {
      res.writeHead(200,{'Content-Type': 'application/json'});
      res.write(JSON.stringify(doc));
      res.end();
    });
}).listen(port);

console.log("Server listening on port "+port);

I am using a JMeter script with 50 concurrent users. The average response time is 120 ms, and the average size of the returned document is 3 KB.

As you can see, I set Cradle's caching to false. To investigate, I watched the number of waiting sockets (netstat | grep WAIT | wc -l): it climbs to about 4000, at which point the program crashes.

To test other options I set the caching to true. In this case the program doesn't crash, but the number of waiting sockets increases to almost 10000 over time.

I also wrote the same program (sans the asynchronous part) as a Java Servlet, and it runs fine without the number of waiting sockets increasing much beyond 20.

My question is: Why do I get the 'EADDRINUSE, Address already in use' error? Why is the number of waiting sockets so high?

P.S.: This is a snippet from the output of netstat|grep WAIT:

tcp4       0      0  localhost.5984         localhost.58926        TIME_WAIT
tcp4       0      0  localhost.5984         localhost.58925        TIME_WAIT
tcp4       0      0  localhost.58924        localhost.5984         TIME_WAIT
tcp4       0      0  localhost.58922        localhost.5984         TIME_WAIT
tcp4       0      0  localhost.5984         localhost.58923        TIME_WAIT
MarcFasel
  • I am still not even sure if TIME_WAIT is a major clue, or a red herring, or something in-between. Looking forward to your update with a `nano` or `request` couch client. – JasonSmith Sep 12 '11 at 21:07
  • I tested the program using nano as the driver, and after a short time the EADDRINUSE came back in the err object. I modified the code to just report an HTTP 500 error back to the client, and now it runs fine with nano. It turns out I only get about 10 EADDRINUSE errors per 100000 requests, so this is negligible. – MarcFasel Sep 13 '11 at 00:17
  • Cradle behaved differently than Nano: Cradle throws an exception that needs to be handled with a top-level uncaughtException event handler. If I do that the application keeps running, but after 5 (!) of those exceptions the whole application stops responding. – MarcFasel Sep 13 '11 at 06:50
  • Would you mind pasting the exception to gist or pastebin and sending the link to `@_jhs` on Twitter or to `JasonSmith` on Freenode IRC? I believe throwing the exception is a bug and I would like to try to get it fixed. Thanks! – JasonSmith Sep 13 '11 at 16:43
  • I just updated to the latest version of Cradle 0.5.6 and it seems like the bug is fixed. – MarcFasel Sep 14 '11 at 23:29
  • Wow! Cool. I updated my answer in case anybody else gets this problem. – JasonSmith Sep 15 '11 at 03:26

2 Answers

Upgrade to Cradle 0.5.6. It does not have the problem.

Speculation about the problem

The waiting sockets are probably in the CLOSE_WAIT state. (There are other states that would match your grep, such as TIME_WAIT. Can you confirm that it is CLOSE_WAIT and not anything else?)
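Since `grep WAIT` matches several states at once, a per-state tally can help tell them apart. The following is only a sketch: `countTcpStates` is a hypothetical helper, it assumes the state is the last column of each `tcp` row (as in the netstat snippet in the question), and the CLOSE_WAIT row in the sample is made up for illustration.

```javascript
// Tally TCP connection states from `netstat` text output.
// Assumption: the state is the last whitespace-separated column of each tcp row.
function countTcpStates(netstatOutput) {
  const counts = {};
  for (const line of netstatOutput.split('\n')) {
    const cols = line.trim().split(/\s+/);
    if (!/^tcp/.test(cols[0])) continue; // skip headers, blank lines, non-TCP rows
    const state = cols[cols.length - 1];
    counts[state] = (counts[state] || 0) + 1;
  }
  return counts;
}

const sample = [
  'tcp4       0      0  localhost.5984         localhost.58926        TIME_WAIT',
  'tcp4       0      0  localhost.58924        localhost.5984         TIME_WAIT',
  // Made-up row, included only to show a second state being counted:
  'tcp4       0      0  localhost.5984         localhost.58930        CLOSE_WAIT',
].join('\n');

console.log(countTcpStates(sample));
```

In practice the input would be the captured output of `netstat` rather than a hard-coded sample.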

The linked post has a helpful quote:

RFC 793 says CLOSE_WAIT is the TCP/IP stack waiting for the local application to release the socket. So, it hangs because it has received the information that the remote host has initiated a disconnection and is closing its socket, upon what the local application did not close its own side.

So maybe the solution consists in finding a bug fix for your application...

Indeed. In your case, there are two connections per query: one from JMeter to Node, and another from Node to CouchDB. Either JMeter (older, more mature software) is not closing the connection properly, or Cradle (newer, less mature software) is not. Cradle is the more likely of the two to have the bug. (Perhaps it is Node.js's HTTP library itself, but Cradle seems like the first place to check.)

I do not have a complete answer, but hopefully these will be helpful clues. I think the address-in-use error occurs because there are no source address and port combinations left to make an "outgoing" connection (even to 127.0.0.1). But I am so far unsure why the CLOSE_WAIT count is different in each trial. (Perhaps it is fluctuating heavily as entire connection pools are closed.)
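If the outgoing side really is running out of source ports, one thing worth trying is reusing connections to CouchDB rather than opening a new one per request. The sketch below is not Cradle's or nano's API. It talks to CouchDB directly through Node's `http` module and assumes a current Node.js release, where `http.Agent` accepts `keepAlive` and `maxSockets`. `couchAgent` and `getDoc` are hypothetical names and the pool size is illustrative.

```javascript
const http = require('http');

// Assumption: a current Node.js release. A keep-alive agent pools sockets,
// so repeated requests can share a small number of connections.
const couchAgent = new http.Agent({ keepAlive: true, maxSockets: 20 });

// Hypothetical helper: fetch one document from the 'testdb' database.
function getDoc(id, callback) {
  const options = {
    host: '127.0.0.1',
    port: 5984,
    path: '/testdb/' + encodeURIComponent(id),
    agent: couchAgent, // reuse pooled sockets instead of a fresh one per call
  };
  http.get(options, function (res) {
    let body = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
      let doc;
      try {
        doc = JSON.parse(body);
      } catch (e) {
        return callback(e); // response was not valid JSON
      }
      callback(null, doc);
    });
  }).on('error', callback); // connection-level errors such as EADDRINUSE land here
}
```

Comparing the TIME_WAIT count with and without the shared agent would show whether connection churn is the cause.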

To gain more information, perhaps try an alternative CouchDB client library such as request or nano and compare the results.

Please let us know what you find because it would be great to identify and close this potential Cradle bug (or bug somewhere at least!). Thanks.

JasonSmith
  • Thanks for the quick reply. They are all in TIME_WAIT state. I added a snippet from the netstat | grep WAIT output to the question above. – MarcFasel Sep 12 '11 at 01:25
  • I think the idea of using a DB driver other than cradle sounds most promising. I will try that. – MarcFasel Sep 12 '11 at 01:32
  • Are you on Linux? If so, see if you can set the config `[httpd] bind_address = 0.0.0.0` and then **rotate through** 127.0.0.1, 127.0.0.2, 127.0.0.3, etc. Does it change the results? I wonder if you are exhausting all of the possible destination address+port combinations. On Linux, `127.*.*.*` will all go to the same target (localhost) and the source and dest address will be the same. – JasonSmith Sep 13 '11 at 16:51
  • Uh oh. From your netstat output, you (just like me) are using the "dumb blonde" Unix: Mac OS X. :) Looks like you can set up aliases to get 127.0.0.2, etc. working by using this procedure: http://www.artin.org/geekblog/2011/02/mac-os-x-adding-a-loopback-alias/ but unfortunately the *source* address on my mac is still 127.0.0.1 which won't help much. I am unable to determine at this time how to tell Node how to bind to a different source address. Still thinking... – JasonSmith Sep 13 '11 at 17:07

Are you sure you don't have a zombie process on 8081?

    ps aux | grep node

might help

I also wrote an article to help people get started with Node and CouchDB. If you are interested, you can check it out at http://writings.nunojob.com/2011/09/getting-started-with-nodejs-and-couchdb.html

dscape
  • No zombie process, just needed to add a little exception handling to my code. I tried Nano and it handles errors slightly better than Cradle, as it passes the EADDRINUSE error in the err object. Cradle throws an exception that needs to be handled with a top-level uncaughtException event handler. – MarcFasel Sep 13 '11 at 00:27
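The exception handling described in these comments might look roughly like the sketch below, which returns an HTTP 500 whenever the driver passes an error to the callback. `sendResult` and `createServer` are hypothetical names, and `db` is assumed to be any handle exposing `get(id, callback)` with an `(err, doc)` callback. The sketch does not cover a driver that throws instead of passing `err`.

```javascript
const http = require('http');

// Write either the document (200) or an error body (500).
function sendResult(res, err, doc) {
  if (err) {
    res.writeHead(500, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ error: err.code || String(err) }));
    return;
  }
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify(doc));
}

// Assumption: `db.get(id, callback)` calls back with (err, doc).
function createServer(db) {
  return http.createServer(function (req, res) {
    // The base URL is only needed to parse the relative request path.
    const id = new URL(req.url, 'http://localhost').pathname.substring(1);
    db.get(id, function (err, doc) {
      sendResult(res, err, doc);
    });
  });
}
```

Calling `createServer(db).listen(8081)` would then serve documents the same way as the program in the question, with failed lookups reported to the client instead of crashing the process.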