GemFire cluster suddenly goes down with ClusterConfigurationNotAvailableException: Unable to retrieve cluster configuration from the locator

We have a GemFire cluster with 2 locators and 2 servers. We bootstrap the GemFire cache server from cache.xml and Spring Data GemFire XML, using a Spring Boot initializer.

We have a client Spring Boot service which connects to the cluster.

The GemFire cluster goes down at random with ClusterConfigurationNotAvailableException: Unable to retrieve cluster configuration from the locator. What could be the reason? After a restart it works fine for a day or two, and then the issue comes back. It impacts our high availability. Please help us fix this.

org.apache.geode.GemFireConfigException: cluster configuration service not available
        at org.apache.geode.internal.cache.GemFireCacheImpl.requestSharedConfiguration(GemFireCacheImpl.java:1025)
        at org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1149)
        at org.apache.geode.internal.cache.GemFireCacheImpl.basicCreate(GemFireCacheImpl.java:758)
        at org.apache.geode.internal.cache.GemFireCacheImpl.create(GemFireCacheImpl.java:735)
        at org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2748)
        at org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2518)
        at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:993)
        at org.apache.geode.distributed.internal.DistributionManager$MyListener.membershipFailure(DistributionManager.java:4354)
        at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.uncleanShutdown(GMSMembershipManager.java:1556)
        at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.lambda$forceDisconnect$0(GMSMembershipManager.java:2593)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.geode.internal.config.ClusterConfigurationNotAvailableException: Unable to retrieve cluster configuration from the locator.
        at org.apache.geode.internal.cache.ClusterConfigurationLoader.requestConfigurationFromLocators(ClusterConfigurationLoader.java:259)
        at org.apache.geode.internal.cache.GemFireCacheImpl.requestSharedConfiguration(GemFireCacheImpl.java:988)
        ... 10 more

The expected behavior is high availability of the GemFire cluster.

Sudharsan

1 Answer

By default, whenever a GemFire server starts up (or automatically reconnects to the cluster after an unexpected shutdown), it tries to retrieve the Cluster Configuration from a locator. If that fails, the member shuts itself down, which is what is happening here judging by the attached stack trace (note org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect in the stack). I'd focus the analysis on why the member was disconnected in the first place; the subsequent failure to reconnect is a consequence, not the root cause of the issue.
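As a starting point for that analysis, here is a minimal sketch that scans a server log for lines mentioning the disconnect-related frames seen in the stack trace above (forceDisconnect, membershipFailure, uncleanShutdown). The class name DisconnectFinder and the marker list are assumptions made for illustration, not part of GemFire; the log lines just before the first hit are usually the ones worth reading.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a GemFire API: finds log lines that mention
// the disconnect-related method names from the question's stack trace.
public class DisconnectFinder {

    // Markers copied from the stack trace in the question (an assumption
    // about what is worth searching for; extend as needed).
    static final List<String> MARKERS =
            Arrays.asList("forceDisconnect", "membershipFailure", "uncleanShutdown");

    // Returns "lineNumber: text" for every line containing at least one marker.
    static List<String> findDisconnectLines(List<String> logLines) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < logLines.size(); i++) {
            String line = logLines.get(i);
            for (String marker : MARKERS) {
                if (line.contains(marker)) {
                    hits.add((i + 1) + ": " + line.trim());
                    break; // report each line once
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DisconnectFinder <server-log-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        findDisconnectLines(lines).forEach(System.out::println);
    }
}
```

Run it against the log of the server that went down, then read the entries logged shortly before the earliest reported line number.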

Either way, if you configure your members only through individual XML files and don't want to use the Cluster Configuration Service at all, you can start your locators with the option --enable-cluster-configuration=false (the default is true) and your servers with --use-cluster-configuration=false (the default is also true). This prevents the servers from trying to start up using the cluster configuration from the locators.
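If the members are started programmatically rather than from the command line, the same two settings can be supplied as properties. The following is a minimal sketch, assuming the property names match the option names above (enable-cluster-configuration for locators, use-cluster-configuration for servers); the class ClusterConfigProps and its methods are hypothetical names for illustration only.

```java
import java.util.Properties;

// Hypothetical helper: builds the property sets that turn the
// Cluster Configuration Service off on locators and servers.
public class ClusterConfigProps {

    // Locator side: do not host the cluster configuration service.
    static Properties locatorProperties() {
        Properties props = new Properties();
        props.setProperty("enable-cluster-configuration", "false");
        return props;
    }

    // Server side: do not request cluster configuration from the locators.
    static Properties serverProperties() {
        Properties props = new Properties();
        props.setProperty("use-cluster-configuration", "false");
        return props;
    }

    public static void main(String[] args) {
        System.out.println("locator: " + locatorProperties());
        System.out.println("server:  " + serverProperties());
    }
}
```

How these properties reach the member depends on how it is bootstrapped (for example a gemfire.properties file or the properties handed to the cache at creation time), so treat this only as a statement of which keys to set.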

Hope this helps. Cheers.

Juan Ramos
  • Another thing to keep in mind is that, by default, SDG disables the "use" of _Cluster Configuration_. SDG has a strong notion that configuration should originate from only 1 place, and when using Spring, the config should primarily be expressed with Spring config. This is largely because Spring is not only configuring the GemFire server components (e.g. Regions, and so forth) but also likely configuring server-side application components (callbacks, Functions, etc.). Anyway, you can enable Cluster Configuration in XML by setting the appropriate `cache` element attribute. – John Blum Jul 31 '19 at 14:58
  • In XML, that is ``. However, as *Juan* states, there is possibly another reason why your server is failing to start that should be investigated first. – John Blum Jul 31 '19 at 14:59