I have a Debian Squeeze dedicated server which hosts 4 front-end websites, and several back-office tools and APIs (which feed into the main websites).

This has all been up and running since 2008 without issue, but suddenly today we're seeing lots of "110: Connection timed out" errors when trying to connect to the APIs hosted on the box.

The apps are PHP/MySQL based and the web server is Apache.

Each of the domains used for API calls is mapped to 127.0.0.1 in the /etc/hosts file, and testing on the command line with both curl and wget shows that the name lookup itself is resolving fine. Connections succeed maybe 1 in 3 times.

We've checked and increased (as a precaution) Apache's connection limit (MaxClients). Likewise the MySQL connection limit has been increased, but neither limit was anywhere near being reached.

The timed-out test requests don't even show up in the Apache error or access logs; it looks as if Apache is simply not responding to certain requests.

The server load itself never goes above 0.6.

The iptables rules have not changed since yesterday (when this worked) and allow internal connections to and from 127.0.0.1.

To bypass any PHP/rewrite rules etc., I have tried requesting a simple image from a sub-folder on the command line. In tests it is returned roughly 1 in 3 times; every other request fails.
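For anyone wanting to put numbers on that "1 in 3", here is a minimal Python sketch that makes repeated raw TCP connections and tallies how each attempt ends. It assumes the target is 127.0.0.1 port 80 as described above; the function names `classify_connect` and `sample` are made up for illustration. The split matters because a timeout generally means nothing answered the connection attempt at all (packets dropped, or a full listen queue), whereas "refused" generally means the kernel answered but nothing was listening on the port.

```python
import errno
import socket
from collections import Counter


def classify_connect(host, port, timeout=3.0):
    """Try one TCP connection; return 'open', 'timeout', 'refused' or 'error:<name>'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No reply arrived within our own timeout.
        return "timeout"
    except ConnectionRefusedError:
        # The connection attempt was actively rejected.
        return "refused"
    except OSError as exc:
        # The kernel's own connect timeout surfaces as errno 110 (ETIMEDOUT) on Linux.
        if exc.errno == errno.ETIMEDOUT:
            return "timeout"
        return "error:" + errno.errorcode.get(exc.errno, "unknown")


def sample(host, port, attempts=30, timeout=3.0):
    """Run several attempts and count the outcomes."""
    return Counter(classify_connect(host, port, timeout) for _ in range(attempts))
```

Calling `sample("127.0.0.1", 80)` returns a `Counter` of outcomes. Mostly `timeout` would point towards filtering or a stalled listener; mostly `refused` would point towards the listening socket going away.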

Can anyone suggest what else to look at next?

------------------ UPDATE -----------------

It seems the server is closing port 80 intermittently. iptables has no rules to do this...

Any ideas?

steve
  • 5 years is commonly cited as the expected life cycle of hardware, at least the beancounters often use that as the depreciation period. Running memtest might be a good idea. If your uptime is high a reboot might work well enough as a mitigating measure. – HBruijn Feb 07 '14 at 12:51
  • Apologies, I should have been clear: initially it was hosted on a VPS, now on dedicated hardware which is only 18 months old. – steve Feb 07 '14 at 13:43

2 Answers

Bizarrely, we discovered using a port scan tool that port 80 was opening and closing every few seconds, and we couldn't track down a cause. Firewall/iptables were disabled, and Apache was configured to accept connections on port 80...

We attempted to reinstall Apache, but it wouldn't uninstall (the package appeared corrupt). Ultimately we ended up reinstalling the OS to resolve the problem.
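We used an off-the-shelf port scanner, but the same open/closed flapping could be logged with a few lines of Python. This is only a sketch: the host, port, polling interval and the helper names (`port_is_open`, `state_changes`, `watch`) are assumptions for illustration, not what we actually ran.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def state_changes(samples):
    """Collapse [(timestamp, is_open), ...] down to the samples where the state flips."""
    changes = []
    previous = None
    for stamp, is_open in samples:
        if is_open != previous:
            changes.append((stamp, is_open))
            previous = is_open
    return changes


def watch(host, port, duration=60.0, interval=0.5):
    """Poll the port for `duration` seconds and return only the state transitions."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((time.time(), port_is_open(host, port)))
        time.sleep(interval)
    return state_changes(samples)
```

Something like `watch("127.0.0.1", 80)` gives a list of timestamps at which the port went up or down, which can then be lined up against the Apache error log or syslog to see whether anything was restarting at those moments.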

steve

I would enable the Apache status module (mod_status; see the sample at http://www.apache.org/server-status) and make sure connections do not stay in any of the following states for too long: W (sending reply), R (reading request), C (closing connection).

Also, has anything changed in the application or the environment?
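If you want to watch those states over time rather than eyeballing the page, mod_status also has a machine-readable view (`?auto`) with a `Scoreboard:` line containing one character per worker slot. A rough Python sketch for counting them follows; it assumes the status handler is enabled at `/server-status` on 127.0.0.1 and reachable from localhost, and the function names are just illustrative.

```python
from collections import Counter
from urllib.request import urlopen


def parse_scoreboard(auto_text):
    """Return a Counter of worker-state characters from mod_status '?auto' output."""
    for line in auto_text.splitlines():
        if line.startswith("Scoreboard:"):
            return Counter(line.split(":", 1)[1].strip())
    return Counter()


def busy_states(counts, watched="WRC"):
    """Pick out the states worth watching: W (reply), R (read), C (closing)."""
    return {key: counts.get(key, 0) for key in watched}


def fetch_scoreboard(url="http://127.0.0.1/server-status?auto", timeout=5):
    """Fetch and parse the scoreboard; the URL is an assumption about your setup."""
    with urlopen(url, timeout=timeout) as response:
        return parse_scoreboard(response.read().decode("ascii", "replace"))
```

Running `busy_states(fetch_scoreboard())` every few seconds and logging the result should show whether workers pile up in W, R or C around the time requests start timing out.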

antimatter
  • Configuration hasn't been changed. The only automated update has been a MySQL security patch installed by APT, but rolling it back made no difference. The application has been untouched for months now. The server status page shows only 8 active connections and a further 8 idle workers. Nothing seems to be sticking... – steve Feb 07 '14 at 14:08
  • Is a reboot possible? Worth trying. Like the other person suggests, check memory and HDD. How about telnet using the IP and the mapped domain name? Does it always succeed? – antimatter Feb 07 '14 at 14:16
  • Rebooted a few times without success now. Memory, HDD all seem fine and plenty of each available. The server is running very much under capacity really. Telnet fails most of the time, but works occasionally. – steve Feb 07 '14 at 15:26
  • I meant memory and HDD not in terms of free space, but in terms of hardware failure: memtest, chkdsk etc. I wouldn't suspect MySQL if a simple telnet fails. – antimatter Feb 07 '14 at 17:58
  • Hi - disk and RAM check out in the data centre's tests... I think we have got closer to the problem, if not the solution. See updates above - appreciate any input... – steve Feb 07 '14 at 18:15
  • It looks like Apache was tampered with, either by hacking or by auto-patching. Are you absolutely sure there are no configuration or module changes? – antimatter Feb 08 '14 at 01:18