We recently migrated our site to a load-balanced Apache cluster behind Varnish. Since then, a very small subset of users report that they cannot view any pages. I have narrowed the issue down quite a bit. The issue was not present before the move; the old infrastructure was a single large box.
We are on Rackspace Cloud running 8 Apache 2 instances behind Varnish 3.0, all load balanced using Rackspace Cloud load balancers (Zeus), plus 2 MySQL instances, for a total of 10 servers, all Linux.
What an affected user sees:
- User can view a static HTML file.
- User can view a static asset such as an image.
- User cannot view any PHP file, even a simple one which only contains phpinfo();
- User still cannot view any PHP file when the load balancer is taken out of the picture.
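For anyone who wants to repeat these checks, here is a rough Python sketch of the comparison. It is not what I ran verbatim, and it would have to be run from an affected user's connection to be meaningful, since the problem does not reproduce from our side. The function names (`classify`, `probe`, `matrix`) are made up for illustration, and treating "2xx with an empty body" as the blank-page case is an assumption about what these users receive.

```python
import urllib.error
import urllib.request


def classify(status, body):
    """Label a response as 'error', 'blank' (no error status, empty body) or 'ok'."""
    if status is None or status >= 400:
        return "error"
    if not body.strip():
        return "blank"
    return "ok"


def probe(base_url, path, host=None, timeout=10):
    """Fetch base_url + path, optionally forcing the Host header, and classify it."""
    req = urllib.request.Request(base_url.rstrip("/") + path)
    if host:
        # Lets a request sent straight to a backend IP hit the right virtual host.
        req.add_header("Host", host)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return classify(resp.status, resp.read())
    except urllib.error.HTTPError as exc:
        return classify(exc.code, b"")
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and so on.
        return classify(None, b"")


def matrix(targets, paths, host=None, fetch=probe):
    """Return {(target_name, path): label} for every target/path combination."""
    return {
        (name, path): fetch(base_url, path, host)
        for name, base_url in targets.items()
        for path in paths
    }
```

Calling `matrix({"via-lb": "http://www.example.com", "direct": "http://192.0.2.10"}, ["/index.html", "/info.php"], host="www.example.com")` would then give one label per combination. Both addresses and both paths are placeholders, not our real hosts or files.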
The Apache access logs show requests from these users as typical hits, and the Apache, Varnish, and PHP error logs show nothing of note. PHP error reporting is set to log rather than display; I set it to display for a short time, and the user still got a blank page with no error.
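One way to look for these requests in bulk is to scan the access log for PHP hits that returned a success status with an empty body. The sketch below assumes Apache's common/combined log format and that the URL path ends in `.php`, which will not hold for rewritten Code Igniter URLs. The function name `suspicious_php_hits` and the `max_bytes` threshold are hypothetical. Note also that behind Varnish the logged client address is the proxy's unless the log format records X-Forwarded-For.

```python
import re

# Leading fields of Apache's common/combined log format:
# host ident user [time] "METHOD path protocol" status bytes
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def suspicious_php_hits(lines, max_bytes=0):
    """Yield (ip, path, status, size) for 2xx .php requests with tiny bodies."""
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        # Ignore the query string when checking the file extension.
        path_only = match.group("path").split("?", 1)[0]
        # Apache writes "-" for the size when no body bytes were sent.
        size = 0 if match.group("size") == "-" else int(match.group("size"))
        status = int(match.group("status"))
        if path_only.endswith(".php") and 200 <= status < 300 and size <= max_bytes:
            yield match.group("ip"), match.group("path"), status, size
```

Passing an open log file to `suspicious_php_hits` and grouping the results by address would show whether the blank responses cluster on particular clients or backends.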
Servers are:
- Ubuntu Maverick 10.10
- Apache 2.2.16-1ubuntu3.1 (mpm-worker)
- PHP 5.3.3-1ubuntu9.5 (used in fcgi)
- PHP APC is in use
- Application is on Code Igniter
- Varnish was 2.1.3, now 3.0.0 - issue was present with both versions
- MySQL is the database backend in a master-master setup, but since affected clients cannot even load a file containing only phpinfo(); I am sure the database is not the issue.
Snapshots of some configurations:
- PHP FCGI - http://pastebin.com/6cepWbxp
- Apache Virtual Host - http://pastebin.com/FfxhYwSD
- Varnish VCL - http://pastebin.com/tAcuyfLR
- List of all apache modules running - http://pastebin.com/absHpXm5
I can provide any/all logs needed to debug further, but there is nothing of note in them for the users having this issue: typical access entries from Apache, no errors from PHP.
I have a feeling it might somehow be related to PHP session storage, although I cannot confirm this.
Any insight into the problem is greatly appreciated. Just to reiterate one final time, this issue only affects a very small handful of users. 5-10 have contacted us about it, but I assume the real number is larger, since some affected people will not have bothered to report it. The 5-10 users who have contacted us span various continents, countries, and ISPs.