
I am load testing different options for load balancing, and am getting poor results from Nginx, haproxy, and varnish. I have one 4GB load balancer at Rackspace, hitting 4x1GB app servers.

I'm hitting a url called "/slow" which deliberately waits 500ms before responding. If I hit an application server directly it can handle a connection rate of 1600-1800 per second.

If I hit the Nginx load balancer, it can only handle about 2000 connections per second. I was hoping for something closer to 4x1600 = 6400. Below is the command I'm using to test it. This is run in parallel on 40 256MB instances. I'm deliberately setting --num-call to 1, because I want to see connection performance. Any higher than this and I start to get a lot of errors.

httperf --server 50.56.80.227 --port 1555 --uri /slow --rate 50 --num-call 1 --num-conn 100 --timeout 5
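As a rough sanity check on the numbers above, the arithmetic can be sketched in a few lines of Python. The figures come from the description of the test; the two-sockets-per-connection estimate is an assumption (it presumes the balancer opens a fresh upstream connection for every client connection), not something measured here.

```python
# Figures taken from the test description above.
CLIENTS = 40             # httperf instances running in parallel
RATE_PER_CLIENT = 50     # --rate value on each instance
APP_SERVERS = 4
PER_SERVER_RATE = 1600   # low end of the measured 1600-1800 conn/s
RESPONSE_TIME_S = 0.5    # /slow waits 500 ms before responding


def offered_load(clients, rate_per_client):
    """Total connections per second the clients try to open."""
    return clients * rate_per_client


def backend_capacity(servers, per_server_rate):
    """Connections per second the app servers could absorb together."""
    return servers * per_server_rate


def connections_in_flight(rate, response_time_s):
    """Little's law: concurrent connections = arrival rate * time in system."""
    return rate * response_time_s


offered = offered_load(CLIENTS, RATE_PER_CLIENT)
capacity = backend_capacity(APP_SERVERS, PER_SERVER_RATE)
in_flight = connections_in_flight(offered, RESPONSE_TIME_S)

print(offered)        # 2000
print(capacity)       # 6400
print(in_flight)      # 1000.0
# ASSUMPTION: one client-side socket plus one upstream socket per connection.
print(in_flight * 2)  # 2000.0
```

With these settings the clients offer 2000 connections per second in total, so going beyond that means raising --rate or adding instances. At that rate roughly 1000 connections are open at any moment, which is worth comparing against any per-process socket or file descriptor limit on the balancer.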

Here is my nginx config: https://gist.github.com/1299501

Here's the weird thing: no matter whether I use nginx, haproxy, or varnish, I get roughly the same results. However, I tested Rackspace's new cloud balancers, and they get much better performance (doing fine at 7000/s). Since nginx and the others are all running on an instance I set up, and the Rackspace balancer isn't, I'm guessing something is wrong with the system itself. I'd rather use a balancer I control, so I can add caching, gzip, SSL, and other features to it.

How can I find out what the bottleneck is? Is there anything I should tweak on the system to improve performance? Do I need more than 4GB of RAM? (RAM usage is not high during the test.) Any other ideas?

Update: I just resized the balancer to 8GB and it is performing a lot better, up to 6000-7000 connections per second, which is comparable to the Rackspace balancers. This doesn't make any sense, because it wasn't running out of RAM before.

Update: Here is an example of output from httperf when I overload the balancer (on the 8GB version, so higher than I was able to go before, but the errors are similar): https://gist.github.com/1299628

Sean Clark Hess
  • Please show us the actual output of your command. By the way, parallel 40 instances * 50 calls per second (your config values) == 2000 requests per second. Maybe that's your problem? – Jeff Ferland Oct 19 '11 at 20:38
  • 2000 is what I intended. What do you mean? I can get ~1700 connections per app server, so I should get roughly 4x that with 4 app servers, unless the balancer is the bottleneck. It handles 2000, but can't handle much more than that. – Sean Clark Hess Oct 19 '11 at 20:54
  • The conn reset sure sounds like an ephemeral ports/timewait problem. Are you trying to test this from a single IP/host? – polynomial Oct 20 '11 at 03:52
  • No, I distribute the tests between 40 256MB servers. Is there something I can check to see if that is the problem? – Sean Clark Hess Oct 20 '11 at 16:28
  • What is your system maximum number of open file descriptors for nginx? Try bash's "ulimit -n" under nginx user. Or under regular user to see the default. – Sergey Oct 26 '11 at 18:58
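The two checks suggested in the comments (TIME_WAIT buildup and the open file descriptor limit) could be scripted roughly as follows. This is a minimal sketch, assuming a Linux balancer where /proc/net/tcp is readable; the function names are made up for illustration, and the script has to run as the nginx user to see that user's limit.

```python
import resource
from collections import Counter

# TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        state = fields[3].upper()
        counts[TCP_STATES.get(state, state)] += 1
    return counts


def fd_limits():
    """Return (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


# Example use on the balancer while the test is running:
#   with open("/proc/net/tcp") as f:
#       print(count_tcp_states(f.read()))
#   print(fd_limits())
```

A TIME_WAIT count that approaches the size of the range in /proc/sys/net/ipv4/ip_local_port_range would be consistent with the ephemeral port theory, and a soft limit well below the number of sockets the balancer needs would point at file descriptors instead.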

1 Answer


I am also on Rackspace Cloud and I have a very similar issue. I believe the problem is this:

Rackspace Cloud Server FAQ

It would appear from what you are describing that you are simply maxing out the pitiful amount of bandwidth Rackspace gives us, which almost entirely negates the performance gains that things like Varnish/Nginx provide.

To confirm, re-run some of your benchmarks with iftop open, and watch the traffic cap out at the limit Rackspace provides for each server size.
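If you would rather log the numbers than watch iftop, something like the following could work. It is only a sketch, assuming a Linux box that exposes interface byte counters in /proc/net/dev; the helper names and the interface name eth0 are assumptions for illustration.

```python
def parse_net_dev(text):
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def mbit_per_s(before, after, interval_s):
    """Convert two (rx_bytes, tx_bytes) samples into (rx, tx) Mbit/s."""
    rx = (after[0] - before[0]) * 8 / interval_s / 1e6
    tx = (after[1] - before[1]) * 8 / interval_s / 1e6
    return rx, tx


# Example use during a benchmark run (eth0 is an assumed interface name):
#   import time
#   with open("/proc/net/dev") as f:
#       first = parse_net_dev(f.read())["eth0"]
#   time.sleep(5)
#   with open("/proc/net/dev") as f:
#       second = parse_net_dev(f.read())["eth0"]
#   print(mbit_per_s(first, second, 5))
```

If the reported rate flattens out at the same value no matter how much load you add, that would support the bandwidth cap explanation.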

WerkkreW