I am load testing different options for load balancing, and I'm getting poor results from nginx, HAProxy, and Varnish. I have one 4 GB load balancer at Rackspace hitting 4x 1 GB app servers.
I'm hitting a URL called "/slow" which deliberately waits 500 ms before responding. If I hit an application server directly, it can handle a connection rate of 1600-1800 per second.
If I hit the nginx load balancer, it can only handle about 2000 connections per second. I was hoping for something closer to 4 x 1600 = 6000. Below is the command I'm using to test it; it's run in parallel from 40 256 MB instances. I'm deliberately setting --num-call to 1 because I want to see connection performance. Any higher than that and I start to get a lot of errors.
httperf --server 50.56.80.227 --port 1555 --uri /slow --rate 50 --num-call 1 --num-conn 100 --timeout 5
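For completeness, the parallel runs are kicked off roughly like this (just a sketch; the client hostnames and ssh setup are placeholders):

# fan the same httperf run out to the 40 client instances (hostnames are made up)
for client in client-{01..40}; do
  ssh "$client" 'httperf --server 50.56.80.227 --port 1555 --uri /slow --rate 50 --num-call 1 --num-conn 100 --timeout 5' &
done
wait   # all 40 clients hit the balancer at the same time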
Here is my nginx config: https://gist.github.com/1299501
So here's the weird thing: no matter whether I use nginx, HAProxy, or Varnish, I get roughly the same results. However, I tested Rackspace's new cloud load balancers, and they get much better performance (doing fine at 7000/s). Since nginx and the others are all running on an instance I set up, and the Rackspace balancer isn't, I'm guessing there's something wrong at the system level. I'd rather use a balancer I control, so I can add caching, gzip, SSL, and other features to it.
How can I find out what the bottleneck is? Is there anything I should tweak on the system to get better performance? Do I need more than 4 GB of RAM? (RAM usage is not high during the test.) Any other random ideas?
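I assume the usual suspects are things like file descriptor limits, the ephemeral port range, and listen backlogs; this is the kind of check I mean (a sketch only, and the values are examples rather than recommendations):

# open file limit of the shell and of the running nginx master process
ulimit -n
grep "open files" /proc/$(pgrep -o nginx)/limits

# kernel-side connection limits that often bite a load balancer
sysctl net.core.somaxconn net.ipv4.ip_local_port_range net.ipv4.tcp_tw_reuse fs.file-max

# bump them temporarily to rule them out (example values)
ulimit -n 65536
sysctl -w net.core.somaxconn=8192
sysctl -w net.ipv4.ip_local_port_range="1024 65000"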
Update: I just resized the balancer to 8 GB and it's performing a lot better, up to 6000-7000/s, comparable to the Rackspace balancers. This doesn't make any sense to me, because it wasn't running out of RAM before (maybe the bigger flavor also gets a larger share of CPU and network?).
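To sanity-check whether the smaller flavor was actually CPU- or interrupt-bound rather than short on RAM, watching something like this on the balancer during a run should show it (sketch only; mpstat comes from the sysstat package):

# overall CPU, run queue and context-switch pressure, once a second
vmstat 1
# per-core view; a single core pegged in %irq/%soft would explain a low ceiling
mpstat -P ALL 1
# rough count of sockets in each state (a huge pile of TIME_WAIT is a hint too)
ss -s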
Update: Here is an example of httperf output when I overload the balancer (on the 8 GB version, so at a higher rate than I could reach before, but the errors are similar): https://gist.github.com/1299628