0

We having scaling Apache Ignite grid where client nodes scale up and down based on load. Data nodes are our server nodes where continuous queries run. However this leads to unclean shutdown of some client nodes as we rely on SIGTERM for Ignite node shutdown.

Unclean shutdown of client nodes impacts excution of continuos query which starts giving "Possible starvation in striped pool" warning ultimately leading to Blocked system-critical threads.

We are currently working on ways to prevent striped pool stravation and have noticed 2 key issues around it:

Continuos query thread trying to connect to nodes which have shutdown but are still present in topology: We are planning to reduce the timeout so that client node is discarded early from grid.

Stacktrace:

Thread [name="sys-stripe-1-#2%App%", id=37, state=RUNNABLE, blockCnt=233817, waitCnt=3343945]
        at sun.nio.ch.Net.poll(Native Method)
        at sun.nio.ch.SocketChannelImpl.poll(SocketChannelImpl.java:954)
        at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:110)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3781)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3635)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3375)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3180)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:3013)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2960)
        at o.a.i.i.managers.communication.GridIoManager.send(GridIoManager.java:2100)
        at o.a.i.i.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:2365)
        at o.a.i.i.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1964)
        at o.a.i.i.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1935)
        at o.a.i.i.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1917)
        at o.a.i.i.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:1324)
        at o.a.i.i.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:1261)
        at o.a.i.i.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:1059)
        at o.a.i.i.processors.cache.query.continuous.CacheContinuousQueryHandler.access$600(CacheContinuousQueryHandler.java:90)
        at o.a.i.i.processors.cache.query.continuous.CacheContinuousQueryHandler$2.onEntryUpdated(CacheContinuousQueryHandler.java:459)
        at o.a.i.i.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:447)

Continuos query threads waiting for read lock while trying to update the cache. This is generally comes up after retries for Client node connection are over.

Stacktrace:

Possible starvation in striped pool.
    Thread name: sys-stripe-12-#13%App%
    Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=CacheContinuousQueryBatchAck [routineId=37b43550-d3a5-4518-8745-ece5dc06b1fd, updateCntrs=HashMap {2=7414, 5=8228, 7=7508, 13=7536, 525=7586, 14=7596, 527=7959, 533=7886, 534=7666, 539=9556, 547=7866, 36=8380, 549=8131, 38=7126, 39=7776, 46=7822, 52=7800, 54=8098, 567=7894, 569=7640, 60=7912, 62=8170, 63=7962, 64=8190, 65=7662, 72=7754, 585=7712, 81=8564, 594=8000, 82=7980, 83=7999, 595=7688, 596=7972, 85=7494, 597=7806, 601=7812, 89=7478, 602=7868, 603=7944, 604=7944, 93=7778, 96=8036, 99=7916, 102=7584, 618=7956, 107=7656, 111=7176, 112=8042, 116=7620, 125=7768, 637=7662, 130=7846, 642=7696, 134=11672, 138=7638, 651=7418, 652=7908, 140=7478, 654=9136, 655=8934, 144=8052, 145=7656, 147=7904, 663=7354, 153=7868, 667=8232, 669=7774, 157=7850, 160=8094, 673=8120, 682=7722, 172=7930, 689=7864, 180=8026, 692=7674, 184=7526, 699=7458, 191=8326, 193=7700, 195=7986, 197=8056, 713=7858, 716=7896, 719=7946, 210=7560, 725=7604, 214=7442, 727=7668, 729=7406, 731=7790, 219=7594, 733=7360, 225=7522, 737=7482, 227=7838, 744=8380, 234=7150, 237=7886, 750=7910, 239=8624... and 104 more}]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=CacheContinuousQueryBatchAck [routineId=0e950ae5-1474-4488-9042-80dbddb2f09a, updateCntrs=HashMap {2=7414, 5=8228, 7=7508, 13=7536, 525=7586, 14=7596, 527=7959, 533=7886, 534=7666, 539=9556, 547=7866, 36=8380, 549=8131, 38=7126, 39=7776, 46=7822, 52=7800, 54=8098, 567=7894, 569=7640, 60=7912, 62=8170, 63=7962, 64=8190, 65=7662, 72=7754, 585=7712, 81=8564, 594=8000, 82=7980, 595=7688, 83=7999, 596=7972, 597=7806, 85=7494, 601=7812, 89=7478, 602=7868, 603=7944, 604=7944, 93=7778, 96=8036, 99=7916, 102=7584, 618=7956, 107=7656, 111=7176, 112=8042, 116=7620, 637=7662, 125=7768, 130=7846, 642=7696, 134=11672, 138=7638, 651=7418, 140=7478, 652=7908, 654=9136, 655=8934, 144=8052, 145=7656, 147=7904, 663=7354, 153=7868, 667=8232, 669=7774, 157=7850, 160=8094, 673=8120, 682=7722, 172=7930, 689=7864, 692=7674, 180=8026, 184=7526, 699=7458, 191=8326, 193=7700, 195=7986, 197=8056, 713=7858, 716=7896, 719=7946, 210=7560, 725=7604, 214=7442, 727=7668, 729=7406, 219=7594, 731=7790, 733=7360, 225=7522, 737=7482, 227=7838, 744=8380, 234=7150, 237=7886, 750=7910, 239=8624... and 104 more}]]]]
    Deadlock: false
    Completed: 3316358
Thread [name="sys-stripe-12-#13%App%", id=48, state=WAITING, blockCnt=106311, waitCnt=1659827]
    Lock [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@5f611d9a, ownerName=exchange-worker-#71%App%, ownerId=138]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
        at o.a.i.i.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.readLock(GridDhtPartitionTopologyImpl.java:256)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1837)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1734)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3322)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:141)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:273)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:268)
        at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
        at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
        at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
        at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
        at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1907)
        at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1528)
        at o.a.i.i.managers.communication.GridIoManager.access$5300(GridIoManager.java:241)
        at o.a.i.i.managers.communication.GridIoManager$9.execute(GridIoManager.java:1421)
        at o.a.i.i.managers.communication.TraceRunnable.run(TraceRunnable.java:55)
        at o.a.i.i.util.StripedExecutor$Stripe.body(StripedExecutor.java:565)
        at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:120)
        at java.lang.Thread.run(Thread.java:748)

    

Here we can see that lock is owned by "exchange-worker-#71%App%" which seems to be struck. In few cases we have seen that lock has no owner specific:

Thread [name="sys-stripe-2-#3%App%", id=43, state=WAITING, blockCnt=39097, waitCnt=394328]
    Lock [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@667500d1, ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
        at o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1663)

    

Continuos query runs on server nodes which are our data nodes and we do not expect data nodes to be impacted by client nodes like getting locked. Can someone advice on how we can avoid such locks given that nodes can have unclean shutdowns?

Lokesh
  • 7,810
  • 6
  • 48
  • 78

1 Answers1

1

I believe setting IGNITE_ENABLE_FORCIBLE_NODE_KILL property as true cluster-wide could help with the matter. It streamlines the process of kicking thick client nodes off a cluster, the main case for it is abruptly terminated client nodes.

Vladimir Pligin
  • 1,547
  • 11
  • 18
  • Thanks @Vladimir Pligin , I have also noticed that our client nodes are triggering Partition exchange which i am a bit surprised as my understanding was partition exchange cannot be trigerred by Client nodes. Can it be a config issue ? Here is an example of event trigerring partition exchange on client node : Started exchange init [topVer=AffinityTopologyVersion [topVer=2733, minorTopVer=0], crd=false, evt=NODE_LEFT, evtNode=cc150ead-d458-4093-b0f9-ce6d02110e6e, customEvt=null, allowMerge=true, exchangeFreeSwitch=false] – Lokesh Dec 14 '22 at 14:25
  • There are thick and thin clients. You are using thick clients and they are triggering PME (it's a little bit more lightweight though) as they are fully cluster-aware. Maybe you should have a look at thin clients https://ignite.apache.org/docs/latest/thin-clients/java-thin-client – Vladimir Pligin Dec 14 '22 at 15:16
  • I read on wiki page for PME that if Ignite version is above 2.8 then thick client "Node Join" event will not trigger PME. If this is the case then i believe "Node Left" event for thick client will also not trigger PME as thick client was never part of topology. Is this correct understanding. – Lokesh Feb 27 '23 at 15:46
  • Also if thick client in 2.8 and above is not aware of Partition map then does it mean it will fetch data from Data nodes via router node and will it cause any performance issues? – Lokesh Feb 27 '23 at 16:09