I am running my app on an AWS setup: Tomcats on EC2 instances behind an Elastic Load Balancer (ELB), plus a memcached-backed ElastiCache cluster with a couple of nodes. I've configured my web application to store session data in ElastiCache so that sessions survive the loss of a Tomcat server (i.e. when a Tomcat shuts down, the user isn't logged out; their requests are simply served by another available Tomcat). This works as expected, except for one interesting case I noticed during testing.
When I shut down the Tomcat that is currently serving the app, another Tomcat picks up the requests after a moment and the user stays logged in. However, when I restart the stopped Tomcat, the app switches back from the Tomcat it is currently on to the previously stopped instance, which is not what I'd expect - I thought the app would keep running on its new Tomcat until that one was stopped, and only then try to switch again.
I've looked around for an explanation of this behaviour, and some sources suggest it could be an ELB configuration problem, but they don't say which configuration option could be causing this "preferential primary" treatment. My ELB is currently configured for sticky sessions using an AppCookieStickinessPolicy with a cookieName of JSESSIONID. So far, all my Tomcats reside in the same availability zone (us-east-1b). Any ideas? Is this stickiness behaviour typical?
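For reference, the stickiness setup is equivalent to something like the following boto3 sketch (the load balancer name, policy name, and listener port are placeholders, not my actual values):

```python
import boto3

# Classic ELB client; my instances and load balancer are in us-east-1.
elb = boto3.client("elb", region_name="us-east-1")

# Application-controlled cookie stickiness keyed on Tomcat's session cookie.
elb.create_app_cookie_stickiness_policy(
    LoadBalancerName="my-elb",
    PolicyName="tomcat-jsessionid-stickiness",
    CookieName="JSESSIONID",
)

# Attach the policy to the HTTP listener so the ELB actually honours it.
elb.set_load_balancer_policies_of_listener(
    LoadBalancerName="my-elb",
    LoadBalancerPort=80,
    PolicyNames=["tomcat-jsessionid-stickiness"],
)
```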
EDIT: Amazon's documentation here: http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/elb-sticky-sessions.html seems to directly contradict the behaviour I've observed:
If an instance fails or becomes unhealthy, the load balancer stops routing requests to that instance, and chooses a new healthy instance based on the existing load balancing algorithm. The load balancer treats the session as now "stuck" to the new healthy instance, and continues routing requests to that instance even if the failed instance comes back.
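For what it's worth, this is roughly how I've been observing which instance serves each request during testing (a sketch only; the /whoami endpoint returning the instance hostname is a test endpoint I'm assuming here, and the ELB DNS name is a placeholder):

```python
import time
import requests

# Placeholder ELB DNS name and a hypothetical test endpoint that echoes
# the hostname of the instance handling the request.
ELB_URL = "http://my-elb-1234567890.us-east-1.elb.amazonaws.com/whoami"

# A single Session reuses JSESSIONID across requests, so stickiness applies.
session = requests.Session()
while True:
    resp = session.get(ELB_URL, timeout=5)
    print(resp.status_code, resp.text.strip(), dict(session.cookies))
    time.sleep(2)
```

Watching this output while stopping and restarting a Tomcat is how I noticed the requests flipping back to the previously stopped instance.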