Embedded Hazelcast cluster occasionally breaks for no apparent reason

Question

The Hazelcast cluster runs embedded in an application deployed on Kubernetes. I can't see any trace of partitioning or other problems in the logs. At some point, this exception starts to appear in the logs:

hz.dazzling_morse.partition-operation.thread-1 com.hazelcast.logging.StandardLoggerFactory$StandardLogger: app-name, , , , ,  - [172.30.67.142]:5701 [app-name] [4.1.5] Executor is shut down.
java.util.concurrent.RejectedExecutionException: Executor is shut down.
        at com.hazelcast.scheduledexecutor.impl.operations.AbstractSchedulerOperation.checkNotShutdown(AbstractSchedulerOperation.java:73)
        at com.hazelcast.scheduledexecutor.impl.operations.AbstractSchedulerOperation.getContainer(AbstractSchedulerOperation.java:65)
        at com.hazelcast.scheduledexecutor.impl.operations.SyncBackupStateOperation.run(SyncBackupStateOperation.java:39)
        at com.hazelcast.spi.impl.operationservice.Operation.call(Operation.java:184)
        at com.hazelcast.spi.impl.operationexecutor.OperationRunner.runDirect(OperationRunner.java:150)
        at com.hazelcast.spi.impl.operationservice.impl.operations.Backup.run(Backup.java:174)
        at com.hazelcast.spi.impl.operationservice.Operation.call(Operation.java:184)
        at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.call(OperationRunnerImpl.java:256)
        at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:237)
        at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:452)
        at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:166)
        at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:136)
        at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.executeRun(OperationThread.java:123)
        at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102)

I can't see any particular operation failing before that. I do run some scheduled operations myself, but they execute inside try-catch blocks and never throw.

The consequence is that whenever a node in the cluster restarts, no data is replicated to the new node. This eventually renders the entire cluster useless: all data that is supposed to be cached and replicated among the nodes disappears.

What could be the cause? How can I get more details about why the executor that Hazelcast uses gets shut down?

Are you able to post more logs, with default logging levels for Hazelcast? I would imagine the executor shutdown is more of a consequence of an earlier problem that might be logged. Out of memory, possibly. – Neil Stevenson, Apr 06 '22 at 14:03
I would have posted more logs if I had found anything relevant in them, but I didn't. I can't easily post relevant logs: the error appears out of the blue after the application has run properly for several days, and by that time there are already several megabytes of logs on each node of the cluster, even at INFO logging level. – user625488, Apr 06 '22 at 14:38
It may also happen due to network disruptions, and that can only be confirmed with adequate logs. You don't need to post the whole set; just start with a few hundred lines before this error message. – wildnez, Apr 07 '22 at 05:19

score 0 · Answer 1 · answered Apr 07 '22 at 07:36

Based on other conversations...

Your Runnable / Callable should implement HazelcastInstanceAware.

Don't pass the HazelcastInstance or IExecutorService as a non-transient field of the task, because the instance where the runnable is submitted will be different from the one where it runs.
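To illustrate the two points above without needing a cluster, here is a minimal sketch that uses plain Java serialization only. `NodeHandle`, `NodeAware`, `CacheRefreshTask` and `TransientInjectionSketch` are hypothetical names made up for this example; `NodeHandle` stands in for a node-local object such as the `HazelcastInstance`, and `NodeAware` stands in for the `HazelcastInstanceAware` callback. In a real task you would implement `HazelcastInstanceAware.setHazelcastInstance(HazelcastInstance)` and let Hazelcast make that call on the member that runs the task.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a node-local resource (for example a HazelcastInstance).
// It is deliberately not Serializable: it only makes sense on the node that owns it.
final class NodeHandle {
    final String nodeName;

    NodeHandle(String nodeName) {
        this.nodeName = nodeName;
    }
}

// Hypothetical stand-in for the HazelcastInstanceAware callback.
interface NodeAware {
    void setNode(NodeHandle node);
}

final class CacheRefreshTask implements Runnable, Serializable, NodeAware {
    private static final long serialVersionUID = 1L;

    private final String cacheName;     // plain data, travels with the task
    private transient NodeHandle node;  // node-local, skipped by serialization

    CacheRefreshTask(String cacheName) {
        this.cacheName = cacheName;
    }

    @Override
    public void setNode(NodeHandle node) {
        this.node = node;
    }

    @Override
    public void run() {
        if (node == null) {
            throw new IllegalStateException("node was not injected");
        }
        // Real work against the local node would go here.
    }

    String describe() {
        return cacheName + "@" + (node == null ? "<none>" : node.nodeName);
    }
}

final class TransientInjectionSketch {
    // Simulates shipping the task to another member: serialize, deserialize,
    // then inject the receiving node, as the framework would on arrival.
    static CacheRefreshTask shipTo(CacheRefreshTask task, NodeHandle receiver) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(task);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                CacheRefreshTask copy = (CacheRefreshTask) in.readObject();
                copy.setNode(receiver);
                return copy;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CacheRefreshTask task = new CacheRefreshTask("prices");
        task.setNode(new NodeHandle("member-a"));
        CacheRefreshTask copy = shipTo(task, new NodeHandle("member-b"));
        copy.run();
        System.out.println(copy.describe()); // prints "prices@member-b"
    }
}
```

If the `node` field were not `transient`, `writeObject` would fail with a `NotSerializableException` in this sketch. With a real `HazelcastInstance` the failure mode can differ, but the idea is the same: the task should pick up the instance of the member it runs on instead of carrying the submitter's instance along.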

See this.

Neil Stevenson

I don't use the Hazelcast instance in the runnable bean directly. Instead, the Hazelcast instance is used by one of the Runnable's dependencies. How would injecting the Hazelcast instance into a class that doesn't use it help? What I need is for Hazelcast to inject the autowired dependencies into the objects it deserializes, using the application context. – user625488 Apr 07 '22 at 08:07
