3

I am trying to debug why my server has 3 seconds delay on each request.

enter image description here

These are my nginx and unicorn config files: https://gist.github.com/regedarek/de7f2e5cd1918b6224ac

My server info:

  Welcome to Ubuntu 12.04.1 LTS (GNU/Linux 3.2.0-37-generic x86_64)

  System load:  0.02              Processes:           71
  Usage of /:   38.9% of 6.89GB   Users logged in:     0
  Memory usage: 28%               IP address for eth0: 149.156.119.151
  Swap usage:   0%

server:~$ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 2
model name      : QEMU Virtual CPU version 1.0
stepping        : 3
microcode       : 0x1000065
cpu MHz         : 2300.026
cache size      : 512 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 4
wp              : yes
flags           : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm up nopl pni cx16 popcnt hypervisor lahf_lm svm abm sse4a
bogomips        : 4600.05
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

This is output from tcpdump -qtln port 80 -> https://gist.github.com/c21d3a38e1d75fb2c4f1

tomekfranek
  • 143
  • 6
  • 1
    This could use some more information. Is it a consistent delay with every new HTTP request, or it only with the first request from a new host? Does the delay length vary much? Also: If you can, run `tcpdump -qtln port 80` and capture a couple of requests. Add the results to your OP. It'll help show where the delay is at the network level. – SmallClanger Feb 14 '13 at 12:30
  • This delay occurs with every request and takes between ``1.5-3`` sec. I added output from ``tcpdump`` – tomekfranek Feb 14 '13 at 12:56
  • The tcpdump output does not contain any timing information, could you rather try to either remove the t option or set it to tt or ttttt (5xt) `tcpdump -qttln port 80` or `tcpdump -qtttttln port 80`? – Huygens Feb 14 '13 at 13:46
  • Sure. Just updated Gist. – tomekfranek Feb 14 '13 at 14:10
  • OK so I guess the 2-3 sec delay is visible in the tcpdump after the first handshake, is that correct? Could you try to run in the guest as root `perf top -a`? It is updated in real time, during the initial 2-3 sec delay try to look at the processes that are at the top of the displayed list, they could hint you towards the bottleneck. – Huygens Feb 14 '13 at 14:41

1 Answers1

3

To debug this, I would systematically isolate each component to find the bottleneck. At all times I would tail -f any log files, including nginx access/error logs and the rails logs for errors.

The first question is, is it slow in production only, or also in development mode? The answer will help you diagnose by focusing on what's different between the environments.

Start the unicorn process in standalone mode and connect on port 8080, eliminating nginx. Is it still slow? If yes, it's either a rails or Unicorn issue. I doubt that nginx is the problem because static assets are being served up relatively fast on your site.

Test the app in Webrick to eliminate Unicorn. If it's still slow, start digging into your rails app.

I tested your site with telnet ck.uci.agh.edu.pl 80 doing a GET http://ck.uci.agh.edu.pl/ and the delay was there before any text was printed. That eliminates any javascript or asset loading issues in the browser.

Stuff to try:

  • Test isolated components of your app in rails console by doing model lookups. Go through each component of displaying your main page one at a time until you find the delay. Try running the code that you execute in your index controller action, and anything that your page executes as it is parsed. You're looking for database issues, delays from connections to outside services, etc.

  • Use a trial period of the New Relic service in Pro mode, which can reveal app bottlenecks such as slow database queries or calls to outside services. It shows you everything that executes and how much time it takes.

  • Do a DNS lookup on the command line of your server, such as nslookup somehost.com. Sometimes a bogged-down DNS server can slow up your app if it uses hostnames in any of its communication with other servers such as your database server, geolocation services, etc.

  • The only thing I saw in your config was that you should set your unicorn worker_processes to at least the number of CPUs on your server. Try increasing this number to 4 and see if this helps, but watch for memory usage.

Hope this helps. Good luck!

platforms
  • 1,118
  • 10
  • 23
  • Thanks, It is slow on production not on development, on unicorn too but when I tested app on webrick: ``rails s -e production`` this problem didn't occurs. Now I am trying to figure out why. – tomekfranek Feb 14 '13 at 14:52
  • OK good - so you know it's not merely production mode itself. So focus on each component of your production systems that differ from dev. Then walk through the app components as I describe above, using `rails c` on the production server itself to find the issue. – platforms Feb 14 '13 at 15:03
  • It looks much faster now - well done. How did you fix it? – platforms Feb 14 '13 at 15:28
  • It is faster but there is still 0.8 sec delay between the time the GET is received and acknowledge by the server (at TCP level) and when Nginx reply with the content! There is something wrong still. – Huygens Feb 14 '13 at 15:30
  • Have you done the `rails c` stuff yet on the production server? Try running `app.get '/'` and watch all the selects being run for the main page, for example. If your `index` action does a `cv = Cv.all.where(...)` then run that too, etc. – platforms Feb 14 '13 at 15:48
  • 1
    ``app.get '/'`` returns fast ``200`` and I dont have any ``find,all,where`` in index action. Maybe problem is because of ``localleapp`` I abandon temporary unicorn and I am using nginx + thin server until I fix that issue – tomekfranek Feb 14 '13 at 16:04
  • Yes - localeapp can be a big problem that way. Here is the config I use to ensure it doesn't call home on every hit in production: https://gist.github.com/bashcoder/0b52684b1a924326d060 – platforms Feb 14 '13 at 16:45