I'm testing various Ruby on Rails hosting setups, including nginx, Apache, a couple of ISPs, cloud computing platforms, etc.
I'm noticing that when only one or two requests are being handled simultaneously, the average response time for those requests is often tiny (<10ms). However, I can only handle so much traffic that way. When I try to maximize the number of requests per second, the average response time grows quite quickly. For instance, on one server, I found that requests/second peaked at around 16 simultaneous requests in flight at any one moment. At that point, however, the average response time was over 200ms.
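For what it's worth, here is a rough sketch of how these numbers relate, assuming Little's Law (concurrency = throughput × average response time) holds for a steady load. The helper name `throughput_rps` is made up for illustration, and the result is an estimate implied by the formula, not a measurement.

```ruby
# Little's Law: concurrency = throughput * average response time.
# Rearranged: throughput (req/s) = concurrency / response time (s).
# `throughput_rps` is a hypothetical helper, not part of Rails or any server.
def throughput_rps(concurrency, avg_response_ms)
  raise ArgumentError, "avg_response_ms must be positive" unless avg_response_ms.positive?

  # Multiply by 1000 to convert the millisecond response time to seconds.
  concurrency * 1000.0 / avg_response_ms
end

# Figures from above: about 16 requests in flight, roughly 200ms each.
puts throughput_rps(16, 200.0) # => 80.0
```

Since the response time was actually over 200ms, that would put the peak at no more than about 80 requests/second, if the assumption holds.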
What tips and tricks do you web server gurus have for balancing response time against requests per second?