
I'm trying to change the state of a 3-node GridGain cluster running on Kubernetes using the control.sh script, as documented.

./control.sh --set-state INACTIVE

This usually returns success within a short time, but now it hangs indefinitely and the only way to break out is Ctrl+C. After that, the cluster moves into an unexpected state where

./control.sh --set-state ACTIVE

would fail. Below is the exception extracted from the GridGain log.

[1]

[SEVERE][rest-#70%dev%][GridJobWorker] Failed to execute job [jobId=a1fbeb63c71-ab1511ea-8668-442e-8aea-0d51e23026d6, ses=GridJobSessionImpl [ses=GridTaskSessionImpl [taskName=o.a.i.i.v.misc.VisorChangeGridActiveStateTask, dep=LocalDeployment [super=GridDeployment [ts=1633005984012, depMode=SHARED, clsLdr=jdk.internal.loader.ClassLoaders$AppClassLoader@2c13da15, clsLdrId=c09ddb63c71-ab1511ea-8668-442e-8aea-0d51e23026d6, userVer=0, loc=true, sampleClsName=java.lang.String, pendingUndeploy=false, undeployed=false, usage=0]], taskClsName=o.a.i.i.v.misc.VisorChangeGridActiveStateTask, sesId=91fbeb63c71-ab1511ea-8668-442e-8aea-0d51e23026d6, startTime=1633008816019, endTime=9223372036854775807, taskNodeId=ab1511ea-8668-442e-8aea-0d51e23026d6, clsLdr=jdk.internal.loader.ClassLoaders$AppClassLoader@2c13da15, closed=false, cpSpi=null, failSpi=null, loadSpi=null, usage=1, fullSup=false, internal=true, topPred=ContainsNodeIdsPredicate [], subjId=ab1511ea-8668-442e-8aea-0d51e23026d6, mapFut=IgniteFuture [orig=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=939562377]], execName=null], jobId=a1fbeb63c71-ab1511ea-8668-442e-8aea-0d51e23026d6]]

class org.apache.ignite.IgniteException: Failed to activate cluster, because another state change operation is currently in progress: deactivate cluster
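For context, the current state can also be read through the Java API. This is only a minimal sketch, assuming the ClusterState API is available in GridGain 8.8.8 and that a thick client node can still join the cluster (discovery settings are omitted and would have to match the server configuration):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterState;
import org.apache.ignite.configuration.IgniteConfiguration;

public class CheckClusterState {
    public static void main(String[] args) {
        // Join the cluster as a thick client node; discovery settings are
        // assumed to match the server configuration and are not shown here.
        IgniteConfiguration cfg = new IgniteConfiguration().setClientMode(true);

        try (Ignite ignite = Ignition.start(cfg)) {
            // Read the current state: ACTIVE, INACTIVE or ACTIVE_READ_ONLY.
            ClusterState state = ignite.cluster().state();
            System.out.println("Current cluster state: " + state);
        }
    }
}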

Subsequent attempts to run ./control.sh would immediately throw the exception below.

[2]

Command [SET-STATE] finished with code: 4
Error stack trace:
class org.apache.ignite.internal.client.GridClientException: null
suppressed:

        at org.apache.ignite.internal.client.impl.connection.GridClientNioTcpConnection.handleClientResponse(GridClientNioTcpConnection.java:628)
        at org.apache.ignite.internal.client.impl.connection.GridClientNioTcpConnection.handleResponse(GridClientNioTcpConnection.java:559)
        at org.apache.ignite.internal.client.impl.connection.GridClientConnectionManagerAdapter$NioListener.onMessage(GridClientConnectionManagerAdapter.java:694)
        at org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onMessageReceived(GridNioFilterChain.java:278)
        at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:108)
        at org.apache.ignite.internal.util.nio.GridNioCodecFilter.onMessageReceived(GridNioCodecFilter.java:115)
        at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:108)
        at org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onMessageReceived(GridNioServer.java:3714)
        at org.apache.ignite.internal.util.nio.GridNioFilterChain.onMessageReceived(GridNioFilterChain.java:174)
        at org.apache.ignite.internal.util.nio.GridNioServer$ByteBufferNioClientWorker.processRead(GridNioServer.java:1193)
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2504)
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2269)
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1891)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
        at java.base/java.lang.Thread.run(Thread.java:829)

The following was then observed in the server logs.

[3]

class org.apache.ignite.IgniteCheckedException: Failed to send response to node. Unsupported direct type [message=GridDhtAffinityAssignmentRequest [flags=1, futId=23, topVer=AffinityTopologyVersion [topVer=6, minorTopVer=2], super=GridCacheGroupIdMessage [grpId=-149688677]]]
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processFailedMessage(GridCacheIoManager.java:1139)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:382)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
        at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1726)
        at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1333)
        at org.apache.ignite.internal.managers.communication.GridIoManager.access$4800(GridIoManager.java:157)
        at org.apache.ignite.internal.managers.communication.GridIoManager$8.execute(GridIoManager.java:1218)
        at org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:54)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to find message handler for message: GridDhtAffinityAssignmentRequest [flags=1, futId=23, topVer=AffinityTopologyVersion [topVer=6, minorTopVer=2], super=GridCacheGroupIdMessage [grpId=-149688677]]
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:380)
        ... 11 more

Thanks in advance for your help in resolving this issue. GridGain version: 8.8.8

Nowa Concordia
  • Hello, could you please provide the full cluster log for this case, and the configurations as well? There is no way to get "Failed to find message handler for message: GridDhtAffinityAssignmentRequest", because this handler is always registered unless the node that receives the message is a client node or the cache is started in LOCAL mode. It would be very helpful to be able to reproduce this exception. – antkr Oct 08 '21 at 15:21
  • @antkr I'll try to collect them and share them with you. For the time being, we were able to bring the nodes up by deleting the working directory, since this is a dev environment; that would never be an option in production. As a heads-up, all of this started when we shut the cluster down for the holidays and brought it up later, which raises the suspicion that it might not have shut down gracefully, even though we have set a time buffer in the Kubernetes config. – Nowa Concordia Oct 25 '21 at 05:36

1 Answer


Deactivation might take a while for various reasons, so it's better to check the logs.

As for the commands, you can use the force mode:

control.(sh|bat) --set-state INACTIVE|ACTIVE|ACTIVE_READ_ONLY [--force] [--yes]

./control.sh --set-state INACTIVE --force
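If the script keeps failing, the same transition can also be attempted through the Java API. This is only a minimal sketch, assuming the ClusterState API is available in your GridGain version and that a thick client node can join the cluster; note that it performs a plain state change, without the --force semantics of control.sh:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterState;
import org.apache.ignite.configuration.IgniteConfiguration;

public class DeactivateCluster {
    public static void main(String[] args) {
        // Join as a thick client; discovery settings are assumed to match the servers.
        IgniteConfiguration cfg = new IgniteConfiguration().setClientMode(true);

        try (Ignite ignite = Ignition.start(cfg)) {
            // Request deactivation, then verify the resulting state.
            ignite.cluster().state(ClusterState.INACTIVE);
            System.out.println("Cluster state is now: " + ignite.cluster().state());
        }
    }
}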
Alexandr Shapkin
  • Hi, I tried with --force mode too, but the script still gave log [2]. Additionally, I observed log [3] in the server log. – Nowa Concordia Sep 30 '21 at 15:11