I have two Ignite servers with the following discovery SPI configured:
TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
TcpDiscoveryMulticastIpFinder ipFinder = new TcpDiscoveryMulticastIpFinder();
ipFinder.setAddresses(Arrays.asList("127.0.0.1:47500..47509"));
discoSpi.setIpFinder(ipFinder);
Both servers listen for the remote event EVT_NODE_LEFT. When I shut down one of the servers, EVT_NODE_LEFT is triggered and I do some clean-up work in Ignite for that node:
IgniteEvents events = ignite.events();
IgnitePredicate<DiscoveryEvent> filter = evt -> {
    if (evt.eventNode().isClient()) {
        return true;
    }
    System.out.println("remote event: " + evt.name());
    System.out.println("remote event: " + evt.eventNode().consistentId());
    return true;
};
UUID uuid = events.remoteListen(new IgniteBiPredicate<UUID, DiscoveryEvent>() {
    @Override
    public boolean apply(UUID uuid, DiscoveryEvent e) {
        ClusterNode node = e.eventNode();
        if (node.isClient()) {
            return true;
        }
        String consistentId = node.consistentId().toString();
        IgniteCache<String, String> cache = ignite.getOrCreateCache("test");
        // some operation on cache...
        return true; // continue listening
    }
}, filter, EventType.EVT_NODE_LEFT);
When there is data under the key "test", the program runs perfectly. When there is no data under the key "test", the server hangs with the following message:
[2021-08-18 15:16:03,828][ERROR][tcp-disco-msg-worker-[crd]-#2-#42][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=disco-event-worker, threadName=disco-event-worker-#49, blockedFor=18s]
[15:16:03] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=disco-event-worker, igniteInstanceName=null, finished=false, heartbeatTs=1629270945183]]]
After debugging, I found that when ignite.getOrCreateCache("test") is called, the thread is parked in the method GridFutureAdapter.get0(boolean ignoreInterrupts) at the line LockSupport.park();. It seems to be waiting for a response from the offline server. I don't know why this happens only when there is no data, but having no data is a common case in my scenario. How can I fix this issue? Thanks.
Ignite version: 2.9.1
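The log above shows that the blocked thread is disco-event-worker, so one plausible reading is that the listener's blocking call runs on the discovery event thread itself. A possible workaround, sketched here under that assumption and not verified against Ignite 2.9.1, is to hand the clean-up to a separate thread so that the listener returns immediately. The sketch uses only java.util.concurrent. The names OffloadingListener, onNodeLeft and shutdownAndWait are hypothetical, and the cleanup callback stands in for the cache operations.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper, not part of Ignite: it moves blocking clean-up
// off the thread that delivers node-left events.
public class OffloadingListener {
    // One dedicated worker thread, separate from the event-delivery thread.
    private final ExecutorService cleanupPool = Executors.newSingleThreadExecutor();

    // Stand-in for the cache clean-up; receives the consistent ID of the node that left.
    private final Consumer<String> cleanup;

    public OffloadingListener(Consumer<String> cleanup) {
        this.cleanup = cleanup;
    }

    // Stand-in for the body of apply(UUID, DiscoveryEvent):
    // queue the clean-up and return without waiting for it to finish.
    public boolean onNodeLeft(String consistentId) {
        cleanupPool.submit(() -> cleanup.accept(consistentId));
        return true; // continue listening
    }

    // Stop accepting new work and wait for queued clean-up to finish.
    public boolean shutdownAndWait(long timeoutMs) throws InterruptedException {
        cleanupPool.shutdown();
        return cleanupPool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```

In the listener above, apply(...) would take the place of onNodeLeft, and the cleanup callback would contain the ignite.getOrCreateCache("test") call and the cache operations that follow it.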