
The servers are accessible normally. The health check requests / (the default page).

When we get some load, the instances respond a little slower than the load balancer likes, and it then takes the instances out of service.

Because the application doesn't fail, I cannot "reboot" the instances via EC2. I can often access the web page or the IP directly myself while the instance is marked "out of service".

This isn't a general failure or a misconfiguration: it can be up for 12-2400 hours, but then randomly fail 3 times in 3 hours under medium-low load.

The health check is set to a 10s response timeout and a 30s interval, with 5 failures to mark an instance unhealthy and 2 successes to mark it healthy again.
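As a rough sketch of what those settings imply, here is the arithmetic in Python. The variable names are illustrative, and it assumes the thresholds count consecutive checks spaced one interval apart; real timing will vary with when the first failure lands.

```python
# Values taken from the health check settings described above.
# The names are illustrative, not ELB API fields.
TIMEOUT_S = 10            # a check slower than this counts as a failure
INTERVAL_S = 30           # time between checks
UNHEALTHY_THRESHOLD = 5   # consecutive failures before "out of service"
HEALTHY_THRESHOLD = 2     # consecutive successes before "in service"


def seconds_to_flip(threshold: int, interval_s: int) -> int:
    """Approximate time for `threshold` consecutive checks, one per interval."""
    return threshold * interval_s


time_to_unhealthy = seconds_to_flip(UNHEALTHY_THRESHOLD, INTERVAL_S)
time_to_healthy = seconds_to_flip(HEALTHY_THRESHOLD, INTERVAL_S)

print(f"roughly {time_to_unhealthy}s of failing checks to go out of service")
print(f"roughly {time_to_healthy}s of passing checks to come back")
```

So under these assumptions the index page would have to be slow or failing for a couple of minutes in a row before the instance is pulled, not just for one check.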

Any ideas?

The health check requests in the logs look normal, and there is nothing in the error log. Here's a sample from the access log:

10.0.100.30 - - [25/Nov/2016:06:49:22 +0000] "GET /index.html HTTP/1.1" 200 11415 "-" "ELB-HealthChecker/1.0"
::1 - - [25/Nov/2016:06:49:26 +0000] "OPTIONS * HTTP/1.0" 200 126 "-" "Apache/2.4.20 (Ubuntu) (internal dummy connection)"
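One way to check lines like these over a longer window is a small script that groups the ELB-HealthChecker requests by source IP and flags missing or non-200 checks. This is only a sketch: it assumes the combined log format shown in the sample, and the function name, the 30s interval default, and the 5s slack are my own choices rather than anything from ELB or Apache.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches the combined log format seen in the sample above.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def health_check_gaps(lines, interval_s=30, slack_s=5):
    """Return (ip, timestamp, reason) for late or non-200 health checks."""
    by_node = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        # Skip anything that is not an ELB health check request.
        if not m or not m.group("agent").startswith("ELB-HealthChecker"):
            continue
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        by_node[m.group("ip")].append((ts, int(m.group("status"))))

    problems = []
    for ip, checks in by_node.items():
        checks.sort()
        # A gap well beyond the interval suggests a check never got logged.
        for (prev_ts, _), (ts, _) in zip(checks, checks[1:]):
            gap = (ts - prev_ts).total_seconds()
            if gap > interval_s + slack_s:
                problems.append((ip, ts, f"gap of {gap:.0f}s"))
        for ts, status in checks:
            if status != 200:
                problems.append((ip, ts, f"status {status}"))
    return problems
```

It could be run with something like `health_check_gaps(open(path))`, where `path` is wherever the access log lives. Note that the default log format does not record how long each response took, so a check that returned 200 after the timeout would still look fine here; adding `%D` to the Apache `LogFormat` would record the response time in microseconds.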

Nick
  • What is the health check configuration on your load balancer? – Mahdi Nov 25 '16 at 08:47
  • Did you read your app server's web server access logs? You should be seeing the ELB checks, and you should be able to account for the health check failures based on what you find there. – Michael - sqlbot Nov 25 '16 at 18:13
  • @Michael-sqlbot Would this appear in the standard Apache access.log? – Nick Nov 25 '16 at 18:32
  • @Michael-sqlbot If so, all I see in access.log is the pile of 200 requests. The health check usually sends 2 requests; both respond with 200, then some ::1's. Updated the question to show the output of the checks. – Nick Nov 25 '16 at 18:39
  • You should be seeing these at predictable intervals, consistent with the ELB health check timing configuration, one from each ELB node (identified by the node's IP address -- typical ELB setups have 2 or 3 nodes, which is something ELB decides automatically, based on traffic). So the question is two-fold: do you see the *expected number of checks at the expected (configured) arrival interval*, as well as *do they all succeed within the allowed timeout* during the time ELB is declaring the instance unhealthy? – Michael - sqlbot Nov 25 '16 at 18:55
  • Yes, they do. The only "problem" is that we have had some slow MySQL response times. Since this is a static HTML page that doesn't touch MySQL, I doubt that is the cause. However, it almost seems that a bottleneck is created somehow. The other issue is that it's completely random: ELB decides to fail the instance when the server is up and fine. We are load balancing medium instances running a simple web app with an RDS 2xl MySQL setup, and somehow I feel we have had more downtime as a result of fault tolerance glitches than we would with a single server with MySQL installed locally. – Nick Nov 26 '16 at 06:03

0 Answers