I use EHCache + JGroups to replicate the cache of my webapps across 3 Tomcat instances:
<!-- Use jgroups (UDP) to replicate cache among the cluster -->
<cacheManagerPeerProviderFactory
class="net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProviderFactory"
properties="channelName=EH_CACHE_STA::connect=UDP(mcast_addr=229.10.10.10;mcast_port=45567;):PING:MERGE2:FD_SOCK:VERIFY_SUSPECT:pbcast.NAKACK:UNICAST:pbcast.STABLE:FRAG:pbcast.GMS"
propertySeparator="::" />
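To make the one-line protocol stack easier to read, here is a minimal Java sketch that splits the `properties` attribute on the `::` separator and then lists the protocols in the `connect` value one per line. This is not Ehcache's own parsing code; the class name `PeerProviderProperties` and its two helper methods are hypothetical, written only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Ehcache or JGroups.
public class PeerProviderProperties {

    // Split "key=value::key=value" into an ordered map, cutting each entry at its first '='.
    static Map<String, String> parse(String properties, String separator) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : properties.split(Pattern.quote(separator))) {
            int eq = entry.indexOf('=');
            if (eq > 0) {
                result.put(entry.substring(0, eq).trim(), entry.substring(eq + 1).trim());
            }
        }
        return result;
    }

    // Split a JGroups stack string on ':' but ignore any ':' inside parentheses.
    static List<String> protocols(String connect) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        for (char c : connect.toCharArray()) {
            if (c == '(') depth++;
            if (c == ')') depth--;
            if (c == ':' && depth == 0) {
                out.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // The properties attribute exactly as it appears in my ehcache.xml.
        String properties = "channelName=EH_CACHE_STA::connect=UDP(mcast_addr=229.10.10.10;"
                + "mcast_port=45567;):PING:MERGE2:FD_SOCK:VERIFY_SUSPECT:pbcast.NAKACK:"
                + "UNICAST:pbcast.STABLE:FRAG:pbcast.GMS";
        Map<String, String> parsed = parse(properties, "::");
        System.out.println("channelName = " + parsed.get("channelName"));
        for (String protocol : protocols(parsed.get("connect"))) {
            System.out.println("  " + protocol);
        }
    }
}
```

Every protocol after `UDP` is listed without parameters, so all of them run with the defaults of whichever JGroups version is on the classpath.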
Sometimes a Tomcat instance fails to restart. In the JGroups logs I can see:
[webapp] WARN 2012-12-14 15:36:55,784 [GMS] : join(tc-fr-sta-tomcat1-32427) sent to b0dc40aa-12aa-4045-01e4-c80b013dbb13 timed out (after 5000 ms), retrying
[webapp] WARN 2012-12-14 15:36:55,785 [UDP] : tc-fr-sta-tomcat1-32427: no physical address for b0dc40aa-12aa-4045-01e4-c80b013dbb13, dropping message
It seems the node is trying to join itself?! We have to restart every Tomcat instance in production to restore the cluster. Can anybody help me resolve this issue?