Ignite version v2.8.1-1
I have configured RestartProcessFailureHandler to handle critical system errors such as SYSTEM_WORKER_BLOCKED. However, when the error occurs, the restart never happens, even after hours. Is this expected behavior?
The logs do show that a restart was requested, but it apparently was never executed.
As an alternative, if the failure handler is not suitable for this case, I am thinking of enabling the REST API as a liveness check for the service and restarting the service when the check fails. Please advise.
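To make the liveness-check idea concrete, here is a rough sketch of the probe I have in mind. It is not tested against a real cluster and rests on several assumptions: that the ignite-rest-http module is enabled and listening on the default port 8080, that the `version` REST command is a reasonable thing to poll, and that a healthy reply carries `"successStatus":0` in its JSON body. The class name `IgniteLivenessCheck` and its methods are made up for illustration. I am also not sure a blocked sys-stripe worker would make this request fail at all, so it may not detect the case above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class IgniteLivenessCheck {
    // Assumption: default REST port 8080 and the "version" command.
    static final String DEFAULT_URL = "http://localhost:8080/ignite?cmd=version";

    /** Returns true when the HTTP status is 200 and the JSON body reports successStatus 0. */
    public static boolean isHealthy(int httpStatus, String body) {
        if (httpStatus != 200 || body == null) {
            return false;
        }
        // Drop whitespace so "successStatus":0 and "successStatus" : 0 both match.
        String compact = body.replaceAll("\\s+", "");
        return compact.contains("\"successStatus\":0");
    }

    /** Sends one GET request; any I/O error or timeout counts as "not live". */
    public static boolean probe(String url, int timeoutMs) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(timeoutMs);
            conn.setReadTimeout(timeoutMs);
            conn.setRequestMethod("GET");
            int status = conn.getResponseCode();
            if (status != 200) {
                return false;
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[1024];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                String body = new String(buf.toByteArray(), StandardCharsets.UTF_8);
                return isHealthy(status, body);
            }
        } catch (IOException e) {
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : DEFAULT_URL;
        boolean live = probe(url, 5000);
        System.out.println(live ? "LIVE" : "DEAD");
        // A non-zero exit code lets a supervisor script decide to restart the service.
        System.exit(live ? 0 : 1);
    }
}
```

The idea would be to run this periodically from whatever supervises the service and restart the node when the exit code is non-zero.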
Thanks.
[2022-03-08T02:14:32,561][ERROR][disco-event-worker-#44%ignite-instance%][] Critical system error detected. Will be handled accordingly to configured handler [hnd=RestartProcessFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=sys-stripe-6, igniteInstanceName=ignite-instance, finished=false, heartbeatTs=1646705660633]]]
org.apache.ignite.IgniteException: GridWorker [name=sys-stripe-6, igniteInstanceName=ignite-instance, finished=false, heartbeatTs=1646705660633]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1810) [ignite-core-2.8.1.jar:2.8.1]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1805) [ignite-core-2.8.1.jar:2.8.1]
    at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:234) [ignite-core-2.8.1.jar:2.8.1]
    at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297) [ignite-core-2.8.1.jar:2.8.1]
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body(GridDiscoveryManager.java:2796) [ignite-core-2.8.1.jar:2.8.1]
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.8.1.jar:2.8.1]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_312]
...
[2022-03-08T02:14:32,603][ERROR][node-restarter][] Restarting JVM on Ignite failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=sys-stripe-6, igniteInstanceName=ignite-instance, finished=false, heartbeatTs=1646705660633]]] ....