I have a 5-node Service Grid running on an Ignite 2.10.0 Cluster. While testing the upgrade, I stop one Server Node (SIGTERM) and wait for it to rejoin, but it does not stay connected to the cluster.
Each node is the primary provider for one microservice and the backup for another (Cluster Singletons). The service that was running on the node that left the cluster is correctly picked up by its backup node. However, the Server Node can never stay connected to the cluster again!
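For context, a cluster singleton in Ignite 2.x is deployed through `IgniteServices`. A minimal sketch, assuming a hypothetical `HeartbeatService` and service name (the real services and how their primary/backup placement is configured are not shown here):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.services.Service;
import org.apache.ignite.services.ServiceContext;

public class SingletonDeployment {
    /** Hypothetical service standing in for one of the real microservices. */
    public static class HeartbeatService implements Service {
        @Override
        public void init(ServiceContext ctx) {
            System.out.println("init " + ctx.name());
        }

        @Override
        public void execute(ServiceContext ctx) throws Exception {
            // Runs until Ignite cancels the service (e.g. the hosting node leaves).
            while (!ctx.isCancelled()) {
                Thread.sleep(1_000);
            }
        }

        @Override
        public void cancel(ServiceContext ctx) {
            System.out.println("cancel " + ctx.name());
        }
    }

    public static void main(String[] args) {
        // Default configuration for brevity; a real node passes its own IgniteConfiguration.
        Ignite ignite = Ignition.start();
        // Ignite keeps one instance of this service in the cluster and redeploys it
        // on another node if the hosting node leaves the topology.
        ignite.services().deployClusterSingleton("heartbeat", new HeartbeatService());
    }
}
```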
Rejoin strategy:
- Let systemd restart Ignite.
- The node rejoins, but then the new Server Node invokes its shutdown hook.
- Go back to the first step.
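The restart loop above could be observed from the other Server Nodes by logging discovery events. A minimal sketch, assuming default settings otherwise; the class name is hypothetical, and the event types are enabled explicitly with `setIncludeEventTypes` because Ignite events are generally disabled by default. On the surviving nodes, `EVT_NODE_LEFT` should indicate a graceful stop, whereas `EVT_NODE_FAILED` indicates the cluster dropped the node.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.DiscoveryEvent;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

public class TopologyLogger {
    /** Discovery events relevant to a node that joins and then disappears. */
    static final int[] TOPOLOGY_EVENTS = {
        EventType.EVT_NODE_JOINED,
        EventType.EVT_NODE_LEFT,
        EventType.EVT_NODE_FAILED,
        EventType.EVT_NODE_SEGMENTED
    };

    static IgniteConfiguration config() {
        IgniteConfiguration cfg = new IgniteConfiguration();
        // Record only the event types we listen to.
        cfg.setIncludeEventTypes(TOPOLOGY_EVENTS);
        return cfg;
    }

    public static void main(String[] args) {
        Ignite ignite = Ignition.start(config());

        IgnitePredicate<DiscoveryEvent> lsnr = evt -> {
            System.out.printf("%s node=%s topVer=%d%n",
                evt.name(), evt.eventNode().consistentId(), evt.topologyVersion());
            return true; // keep listening
        };

        ignite.events().localListen(lsnr, TOPOLOGY_EVENTS);
    }
}
```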
I have no idea why the rejoined node shuts itself down. As far as I can tell, the Coordinator did not kill this youngest Server Node. I am logging at DEBUG level with IGNITE_QUIET set to false, and I still can't find anything in the logs.
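Since the node runs its shutdown hook, something is terminating the JVM (for example a `System.exit` call or a signal such as SIGTERM). A JDK-only sketch, offered as one possible diagnostic rather than anything Ignite provides: it registers an extra hook that dumps every live thread's stack at shutdown, so the thread that initiated the shutdown (which stays blocked while hooks run) should show up in the output. The class name is hypothetical.

```java
import java.util.Map;

public class ShutdownDiagnostics {
    /** Renders every live thread's name and stack trace as one string. */
    static String dumpThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            sb.append("Thread \"").append(e.getKey().getName()).append("\"\n");
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /** Registers a hook that prints the dump to stderr when the JVM begins shutting down. */
    static void install() {
        Runtime.getRuntime().addShutdownHook(new Thread(
            () -> System.err.println("JVM shutdown initiated; live threads:\n" + dumpThreads()),
            "shutdown-diagnostics"));
    }

    public static void main(String[] args) {
        // In a real node, call install() early in startup, before Ignition.start(...).
        install();
        // Demo only: the "main" thread's stack in the dump should include Runtime.exit.
        System.exit(0);
    }
}
```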
I tried increasing the network timeouts, but the newly rejoined node still shuts down.
Any idea what is going on or where to look?
Thanks in advance.
Greg
Environment:
RHEL 7.9, Java 11
Ignite configuration:
- persistence is set to false.
- clientReconnectDisabled is set to true.
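For anyone trying to reproduce this, a minimal Java sketch of a server node with the settings listed above. Only `persistence = false`, `clientReconnectDisabled = true` and `IGNITE_QUIET=false` come from this post; the host names in the static IP finder and the class name are hypothetical:

```java
import java.util.Arrays;
import org.apache.ignite.IgniteSystemProperties;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

public class ServerNodeConfig {
    static IgniteConfiguration build() {
        // In-memory only: persistence disabled on the default data region.
        DataRegionConfiguration region = new DataRegionConfiguration();
        region.setPersistenceEnabled(false);

        DataStorageConfiguration storage = new DataStorageConfiguration();
        storage.setDefaultDataRegionConfiguration(region);

        // Hypothetical host names for the 5 server nodes.
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setAddresses(Arrays.asList(
            "node1:47500..47509", "node2:47500..47509", "node3:47500..47509",
            "node4:47500..47509", "node5:47500..47509"));

        TcpDiscoverySpi discovery = new TcpDiscoverySpi();
        discovery.setIpFinder(ipFinder);
        discovery.setClientReconnectDisabled(true);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDataStorageConfiguration(storage);
        cfg.setDiscoverySpi(discovery);
        return cfg;
    }

    public static void main(String[] args) {
        // Verbose logging (IGNITE_QUIET=false) has to be set before the node starts.
        System.setProperty(IgniteSystemProperties.IGNITE_QUIET, "false");
        Ignition.start(build());
    }
}
```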