I have a site serving roughly 1.5-2 million page views per day, running on 2 servers: one for MySQL, the other for everything else. The MySQL box has a load of 3; the frontend is usually at 0.0-0.1. Both are dual quad-core machines with 8GB RAM and SAS drives in RAID 5. The CPU is idle the majority of the time, and iowait is non-existent.

I'm running nginx and memcache, and the site is built on PHP. Half the time everything runs perfectly, while at other times it lags severely, taking 10-15 seconds for a page to load. Page execution time is always very low, but the request seems to hang, waiting for something before the page actually loads. What's even weirder is that it only happens to 1 file on the site (but it's the most commonly accessed one, the one that actually loads the content on the site). Other pages are fast at all times, even when the actual content takes 15 seconds to load.

I have the nginx_stats plugin installed, and when I monitor it, the lag spikes happen when the write column starts going above 100, which it frequently does... all the way to 500-1000.

It does so at totally random times, not when traffic is heavy. It can do this in the middle of the night, and work perfectly at 5pm when traffic is at its highest.

Any ideas?

1 Answer

Without knowing more, my first inclination is that your database server is the cause of the problem. Check the load, IO wait, memory/swap utilization and the other usual suspects on the DB server to ensure that there is no resource contention on the server.

Try to determine whether a certain query is killing the box. Turn on the slow query log, let it run through a period of slowness, and then look through it for any queries that can be optimized.

You said that the "write column" is high. Is that indicative of writes to the database? Writes are typically more expensive than reads, and you may want to set up a replication slave that you can offload some or all of the reads to.

My guess is that once you turn on the slow query log you'll find the culprit query that is killing the database and slowing everything down.

d34dh0r53
  • That's a negative. The write column is PHP writes. MySQL load is low, and for 90% of page loads there is no MySQL connection at all; it's entirely memcache. –  May 11 '10 at 15:44
  • The write column is actually nginx writes, assuming you are using the nginx stub status module. In that case Reading counts connections where nginx is reading the request, Writing counts connections where it is sending the response to the client, and Waiting counts keep-alive connections that are just idle. Writing being high is most likely an indicator that nginx is waiting on PHP to finish. How do you measure your page load time? – Martin Fjordvald May 11 '10 at 16:19
  • Just a standard execution time timer from the beginning to the end. It's always under 0.1 seconds (usually 0.04 or so). What could be causing it to wait? The wait column also sometimes spikes from the standard ~300-400 to 1500, at random times. –  May 11 '10 at 17:44
  • I would suggest trying something like the XHProf extension from Facebook to determine the real execution time. Also, if you're using php-fpm, turn on the slow log; that will give you a call stack of slow requests. This should determine whether PHP or nginx is the problem. – Martin Fjordvald May 11 '10 at 18:23
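
To line up the spikes with other logs, the counters discussed in the comments can be recorded over time. The following is only a minimal sketch in Python, assuming the counters come from the nginx stub status module and that its page has the usual four-line text layout. The `SAMPLE` text, the function names and the numbers in it are made up for illustration; the threshold of 100 is the value mentioned in the question.

```python
import re

# Illustrative stub_status output (assumed layout, invented numbers).
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text):
    """Return the connection counters from a stub_status page as ints."""
    active = re.search(r"Active connections:\s*(\d+)", text)
    states = re.search(
        r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text
    )
    if not active or not states:
        raise ValueError("unrecognised stub_status output")
    reading, writing, waiting = (int(g) for g in states.groups())
    return {
        "active": int(active.group(1)),
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }


def is_spike(stats, writing_threshold=100):
    """True when Writing exceeds the threshold (100 in the question)."""
    return stats["writing"] > writing_threshold


if __name__ == "__main__":
    stats = parse_stub_status(SAMPLE)
    print(stats, "spike" if is_spike(stats) else "ok")
```

Fetching the status page every few seconds and logging a timestamp whenever `is_spike` returns true would give a list of moments to compare against the php-fpm slow log.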