
I have an Amazon Web Services setup with an Apache instance behind Nginx, where Nginx handles SSL and serves everything except the .php pages. In my ApacheBench tests I'm seeing this for my most expensive API call (which is cached via Memcached):

100 concurrent calls to API call (http): 115ms (median) 260ms (max)
100 concurrent calls to API call (https): 6.1s (median) 11.9s (max)

I've done a bit of research, disabled the most expensive SSL ciphers, and enabled SSL session caching (I know it doesn't help in this particular test). Can you tell me why my SSL is taking so long? I've set up a large EC2 server with 8 CPUs, and even applying consistent load only brings it up to 50% total CPU. I have 8 Nginx workers configured and a number of Apache workers. Currently this whole setup is on one EC2 box, but I plan to split it up and load balance it. There have been a few questions on this topic, but none of the suggested fixes (disabling expensive ciphers, caching SSL sessions) seem to do anything. Sample results below:

$ ab -k -n 100 -c 100 https://URL 
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking URL.com (be patient).....done


Server Software:        nginx/1.0.15
Server Hostname:        URL.com
Server Port:            443
SSL/TLS Protocol:       TLSv1/SSLv3,AES256-SHA,2048,256

Document Path:          /PATH
Document Length:        73142 bytes

Concurrency Level:      100
Time taken for tests:   12.204 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Keep-Alive requests:    0
Total transferred:      7351097 bytes
HTML transferred:       7314200 bytes
Requests per second:    8.19 [#/sec] (mean)
Time per request:       12203.589 [ms] (mean)
Time per request:       122.036 [ms] (mean, across all concurrent requests)
Transfer rate:          588.25 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       65  168  64.1    162     268
Processing:   385 6096 3438.6   6199   11928
Waiting:      379 6091 3438.5   6194   11923
Total:        449 6264 3476.4   6323   12196

Percentage of the requests served within a certain time (ms)
  50%   6323
  66%   8244
  75%   9321
  80%   9919
  90%  11119
  95%  11720
  98%  12076
  99%  12196
 100%  12196 (longest request)
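
For reference, the cipher and session-cache tuning I mention above amounts to nginx directives along these lines (a sketch — the cipher string and cache size here are example values, not necessarily my exact config):

# SSL tuning sketch (example values, not the exact config in use)
ssl_session_cache    shared:SSL:10m;       # reuse sessions so repeat handshakes skip the full key exchange
ssl_session_timeout  10m;
ssl_ciphers          RC4:HIGH:!aNULL:!MD5; # prefer cheaper symmetric ciphers over e.g. AES256
ssl_prefer_server_ciphers  on;
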
  • How's the entropy looking? `cat /proc/sys/kernel/random/entropy_avail` – Shane Madden Sep 21 '12 at 21:35
  • Sorry, this is a bit over my head. Every time I check entropy_avail I get values ranging from 128 to 180, and many of the readings repeat. For those who want to understand what Shane is talking about: "Cryptographic applications such as SSL/TLS block when /dev/random is out of random numbers thus interrupting traffic between the server and its clients." – Mauvis Ledford Sep 21 '12 at 22:09
  • Is that info gathered during the load testing, or while idle? You'll want to check it during the load test - sorry that I didn't mention that initially! – Shane Madden Sep 22 '12 at 07:01
  • Thanks Shane. I ran `ab -k -n 1000 -c 100` which takes a good 2 minutes to run while running `cat /proc/sys/kernel/random/entropy_avail` a bunch of times and got numbers between 129-175. I assume 0 would signify it is out of numbers? – Mauvis Ledford Sep 22 '12 at 07:35
  • Yeah, it ought to be getting closer to 0 if entropy were the problem. Not sure if this will help, but let's try a less expensive algorithm - try putting `RC4+RSA` first in your SSL cipher list, so that it will be preferred? – Shane Madden Sep 22 '12 at 15:22
  • Hi Shane, a little improvement - it made 100 requests slightly faster (59 seconds instead of 59.3 seconds total time) but still generally slow. I took a screenshot for you of the diff between the two AB responses here: https://img.skitch.com/20120922-n8buhp7ji286jepsw1x2fq5y5m.jpg I ran a test and see Facebook and Github are also using RC4+RSA so I'll def keep using it! Any other ideas come to mind? You've been incredibly helpful and I can't thank you enough. – Mauvis Ledford Sep 22 '12 at 17:44
  • Interesting... let's rule out anything other than "slow SSL" - create a static document of around the same size as the response to your API calls to benchmark against? – Shane Madden Sep 22 '12 at 18:50
  • I did what you said and learned that neither Nginx nor Apache bottlenecked with straight JSON files, which led me to the problem: rate limiting. I had previously set up an Nginx `limit_req_zone` for the API with `rate=500r/m;` and a burst of 100. I don't think I ever got near this in my testing, but disabling it solved the bottleneck. All those other optimizations above, including using a less expensive cipher, were still very helpful. Thanks so much and I owe you a few beers next time you're around San Francisco. Followed you on Twitter @krunkosaurus – Mauvis Ledford Sep 22 '12 at 22:44
  • Nice, good find! I'll have to take you up on that next time I'm out there! – Shane Madden Sep 23 '12 at 00:57
  • @MauvisLedford If you use `limit_req` without the `nodelay` flag it perfectly explains the result, as it will try to delay requests to match the rate set, see [docs](http://nginx.org/r/limit_req). – Maxim Dounin Sep 23 '12 at 19:31

1 Answer


The issue ended up being that I had set up Nginx rate limiting early on and it was affecting my load tests. Foolish, I know. One interesting thing about the rate limiting: Nginx doesn't immediately return 503 errors past the burst threshold; it just delays the responses. Only when too many requests are queued does it start sending 503s.
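
For reference, the configuration that caused this was along these lines (a sketch — the zone name, zone size, and backend address are placeholders; the rate=500r/m and burst of 100 match what I had set):

# nginx rate-limiting sketch (zone name and backend address are placeholders)
limit_req_zone $binary_remote_addr zone=api:10m rate=500r/m;

server {
    listen 443 ssl;
    location / {
        # Without "nodelay", nginx delays over-rate requests to pace them at
        # 500r/m instead of rejecting them outright, which is what stretched
        # the benchmark times; 503s only appear once the burst queue (100) fills.
        limit_req zone=api burst=100;       # add "nodelay" to serve bursts immediately
        proxy_pass http://127.0.0.1:8080;   # Apache backend (address assumed)
    }
}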