StackOverflow has been a way of life for me, but this time I am asking a question rather than looking for an answer, as I have probably exhausted all options.
Apologies, as this will be a long description of the issue!
We have a Spring MVC application on Tomcat 7 running on a Windows Server 2012 instance on AWS. It is an analytics application that invokes heavy-duty procedure calls doing statistical calculations in the backend.
With a high-availability requirement, I need to set up a cluster. Since multicast is not available on AWS, I resorted to two other options (I must say this is my first foray into AWS and Tomcat in a production environment):
1. Static Tomcat cluster with DeltaManager for session replication
2. Redis-based session replication (a long shot with a Windows server and sticky sessions)
I started with the static Tomcat cluster, which I set up without much fuss, and then went on to configure Apache httpd mod_proxy as the load balancer.
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster" channelSendOptions="8" channelStartOptions="3">
  <!-- channelStartOptions="3" added to disable multicast; channelSendOptions="8" is for async replication -->
  <Manager className="org.apache.catalina.ha.session.DeltaManager"
           expireSessionsOnShutdown="false"
           notifyListenersOnReplication="true"/>
  <Channel className="org.apache.catalina.tribes.group.GroupChannel">
    <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
              address="auto"
              port="4002"
              autoBind="9"
              selectorTimeout="5000"
              maxThreads="6"/>
    <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
      <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
    </Sender>
    <!-- Added. This interceptor pings other nodes so that all nodes can recognize when other
         nodes have left the cluster. Without this class, the cluster may appear to work fine,
         but session replication can break down when nodes are removed and re-introduced. -->
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpPingInterceptor"/>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor">
      <Member className="org.apache.catalina.tribes.membership.StaticMember"
              port="4000"
              host="localhost"
              domain="delta-static"
              uniqueId="{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,0}"/>
    </Interceptor>
  </Channel>
  <Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter=""/>
  <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve"/>
  <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener"/>
  <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
</Cluster>
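The second node's channel is not shown above. A minimal sketch of what it would look like, assuming it mirrors the first node with the ports swapped (its receiver on 4000, the port named in the Member entry above, and its static member pointing back at 4002) and that both instances run on the same host. The uniqueId value is hypothetical; it only needs to be 16 bytes long and different from the other member's.

```xml
<!-- Sketch of the peer node's Channel; ports and uniqueId are assumptions -->
<Channel className="org.apache.catalina.tribes.group.GroupChannel">
  <!-- This node listens on 4000, the port the first node lists as its static member -->
  <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
            address="auto"
            port="4000"
            autoBind="9"
            selectorTimeout="5000"
            maxThreads="6"/>
  <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
    <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
  </Sender>
  <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpPingInterceptor"/>
  <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
  <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
  <Interceptor className="org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor">
    <!-- Points back at the first node, whose receiver is on 4002 -->
    <Member className="org.apache.catalina.tribes.membership.StaticMember"
            port="4002"
            host="localhost"
            domain="delta-static"
            uniqueId="{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}"/>
  </Interceptor>
</Channel>
```

The rest of the Cluster element (Manager, Valves, ClusterListeners) would be identical on both nodes.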
The mod_proxy configuration in httpd.conf with the AJP connector (relevant modules uncommented):
<Proxy balancer://IOCluster stickysession=JSESSIONID>
    BalancerMember ajp://127.0.0.1:8009 route=tcruntime8009 loadfactor=1
    BalancerMember ajp://127.0.0.1:8012 route=tcruntime8012 loadfactor=1
</Proxy>
ProxyPreserveHost On
ProxyStatus On
ProxyPass "/IO" "balancer://IOCluster/IO"
ProxyPassReverse "/IO" "balancer://IOCluster/IO"
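For sticky sessions to be routed correctly, each Tomcat instance's Engine needs a jvmRoute equal to the route value on its BalancerMember. A minimal sketch of the relevant part of server.xml for the instance behind ajp://127.0.0.1:8009, assuming the default Engine name and default host (neither is copied from my actual file):

```xml
<!-- server.xml of the instance behind ajp://127.0.0.1:8009; name and defaultHost are assumed defaults -->
<Engine name="Catalina" defaultHost="localhost" jvmRoute="tcruntime8009">
  <!-- Cluster, Realm and Host elements go here -->
</Engine>
```

The second instance would use jvmRoute="tcruntime8012" in the same place.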
The mod_proxy configuration in httpd.conf with the HTTP connector (relevant modules uncommented):
ProxyRequests Off
ProxyPass /IO balancer://IOCluster stickysession=JSESSIONID
ProxyPassReverse /IO balancer://IOCluster
BalancerMember http://localhost:8092/IO route=tcruntime8092
BalancerMember http://localhost:8091/IO route=tcruntime8091
The load balancer worked in both cases. The issue was session replication, which wasn't working, and I could see no sign of it in the logs. If I shut down one instance, the balancer would redirect to the other node, but I would see the login page, which proved that the session had not been replicated.
As per question 18835014, I added the <distributable/> tag to the application's web.xml and moved the DeltaManager tag to context.xml:
<Context>
  <Manager className="org.apache.catalina.ha.session.DeltaManager"
           expireSessionsOnShutdown="false"
           notifyListenersOnReplication="true"/>
  <!-- Default set of monitored resources -->
  <WatchedResource>WEB-INF/web.xml</WatchedResource>
  <!--<Context distributable="true"></Context>-->
  <!-- Uncomment this to disable session persistence across Tomcat restarts -->
  <!--
  <Manager pathname="" />
  -->
  <!-- Uncomment this to enable Comet connection tacking (provides events
       on session expiration as well as webapp lifecycle) -->
  <!--
  <Valve className="org.apache.catalina.valves.CometConnectionManagerValve" />
  -->
</Context>
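For reference, the web.xml side of this change is a single empty <distributable/> element. A minimal sketch, assuming a Servlet 3.0 descriptor (the version and the omitted contents are assumptions, not copied from the application):

```xml
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd"
         version="3.0">
  <!-- Marks the application as distributable so the cluster manager replicates its sessions -->
  <distributable/>
  <!-- servlets, filters and listeners omitted -->
</web-app>
```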
After that, I could see session replication active on the console.
The issue now is that when I log into the application, the page becomes unresponsive even though the queries are fired. I can see a 504 (Gateway Timeout) message in the access logs, where all the GET requests return successfully. As soon as the first queries are fired after submitting the login page, the database queries run but the application becomes unresponsive.
If I move the DeltaManager back inside server.xml, the application becomes responsive again, but without session replication.
I also tried some other tweaks in httpd.conf (the prefork module, keepalive, timeout, etc.), after which I see 500 in the access log on the Apache server. Nothing worked. I would really appreciate any help!
<IfModule mpm_prefork_module>
    StartServers 10
    MinSpareServers 10
    MaxSpareServers 20
    MaxClients 50
    ServerLimit 50
    MaxRequestsPerChild 500
</IfModule>
ProxyRequests On
ProxyTimeout 600
<Proxy *>
    AddDefaultCharset Off
    Order deny,allow
    Allow from all
</Proxy>
<Proxy balancer://IOCluster stickysession=JSESSIONID>
    BalancerMember ajp://127.0.0.1:8009 min=10 max=100 route=tcruntime8009 loadfactor=1 keepalive=On timeout=600
    BalancerMember ajp://127.0.0.1:8012 min=10 max=100 route=tcruntime8012 loadfactor=1 keepalive=On timeout=600
</Proxy>
ProxyPreserveHost On
ProxyStatus On
ProxyPass "/IO" "balancer://IOCluster/IO"
ProxyPassReverse "/IO" "balancer://IOCluster/IO"