I have two Ignite servers with the following discovery SPI configured:

TcpDiscoveryMulticastIpFinder ipFinder = new TcpDiscoveryMulticastIpFinder();
ipFinder.setAddresses(Arrays.asList("127.0.0.1:47500..47509"));
discoSpi.setIpFinder(ipFinder);

Both servers listen for the remote event EVT_NODE_LEFT. When I shut down one of the servers, EVT_NODE_LEFT is triggered, and I then run a cleanup job in Ignite for that node:

IgniteEvents events = ignite.events();

IgnitePredicate<DiscoveryEvent> filter = evt -> {
    if (evt.eventNode().isClient()) {
        return true;
    }
    System.out.println("remote event: " + evt.name());
    System.out.println("remote event: " + evt.eventNode().consistentId());
    return true;
};
UUID uuid = events.remoteListen(new IgniteBiPredicate<UUID, DiscoveryEvent>() {

    @Override
    public boolean apply(UUID uuid, DiscoveryEvent e) {
        ClusterNode node = e.eventNode();
        if (node.isClient()) {
            return true;
        }
        String consistentId = node.consistentId().toString();
        IgniteCache<String, String> cache = ignite.getOrCreateCache("test");
        //some operation on cache...
        return true; //continue listening
    }
}, filter, EventType.EVT_NODE_LEFT);

When there is data under the key "test", the program runs perfectly. When there is no data under the key "test", the server hangs with the following message:

[2021-08-18 15:16:03,828][ERROR][tcp-disco-msg-worker-[crd]-#2-#42][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=disco-event-worker, threadName=disco-event-worker-#49, blockedFor=18s]
[15:16:03] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=disco-event-worker, igniteInstanceName=null, finished=false, heartbeatTs=1629270945183]]]

While debugging, I found that when ignite.getOrCreateCache("test") is called, the thread is parked in the method GridFutureAdapter.get0(boolean ignoreInterrupts) at the line LockSupport.park(). It seems to be waiting for a response from the offline server. I don't know why this happens only when there is no data, but having no data is a common case in my scenario. How can I fix this issue? Thanks.

Ignite version: 2.9.1

Henrik

1 Answer

This is expected behavior. You should not perform any blocking operations inside an event listener; otherwise, the thread that invokes it can become blocked.

The callback is invoked on one of Ignite's internal threads and should finish as quickly as possible. Forward the callback logic to a custom thread pool and you should be fine.
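A minimal sketch of that hand-off, using only the JDK so it can be run without a cluster. The class name OffloadingListener, the method onNodeLeft, and the thread name are hypothetical and are not part of Ignite's API; the Consumer stands in for whatever cleanup the listener needs to do (for example, the getOrCreateCache("test") work from the question).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical helper: keeps the event callback short by handing the
// (possibly blocking) cleanup to a dedicated thread.
final class OffloadingListener {
    // Assumption: one worker is enough and cleanups should run in arrival order.
    private final ExecutorService cleanupPool = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "node-left-cleanup");
        t.setDaemon(true); // do not keep the JVM alive because of this pool
        return t;
    });

    // Stand-in for the real cleanup, e.g. the cache operations in the question.
    private final Consumer<String> cleanup;

    OffloadingListener(Consumer<String> cleanup) {
        this.cleanup = cleanup;
    }

    // Intended to be called from the event callback with the consistent ID
    // of the node that left. Only enqueues the work, then returns.
    boolean onNodeLeft(String consistentId) {
        cleanupPool.submit(() -> cleanup.accept(consistentId));
        return true; // continue listening
    }

    void shutdown() {
        cleanupPool.shutdown();
    }
}
```

With something like this in place, the body of apply(UUID, DiscoveryEvent) in the question could shrink to the client check followed by a call such as listener.onNodeLeft(node.consistentId().toString()), so the blocking cache call no longer runs on the thread that delivers the event.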

Alternatively, you might switch to a Continuous Query, which works asynchronously and does not have blocking mechanics.

Alexandr Shapkin
  • Thanks for the reply. One thing I don't understand is why it doesn't block when there is data in the cache. How is a blocking operation defined in this case? – Henrik Aug 23 '21 at 06:34
  • I suppose that's because retrieving a cache proxy is quite lightweight, whereas initializing a new cache is a global procedure that requires cluster-wide locking and synchronization. – Alexandr Shapkin Aug 23 '21 at 13:22