I got an out-of-memory exception and Ignite crashed. Going through the Ignite logs, the last metrics report shows heap and off-heap memory usage of about 171 MB and 70 MB respectively, and roughly 10 seconds later the log shows the exception (`java.lang.OutOfMemoryError: unable to create new native thread`). The other values in the metrics also look OK.
Below is the log snippet:
[01:04:29,690][INFO][grid-timeout-worker-#22][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=8a034404, uptime=39 days, 15:50:23.086]
^-- Cluster [hosts=1, CPUs=4, servers=1, clients=1, topVer=22, minorTopVer=0]
^-- Network [addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.0.1, 172.28.230.222], discoPort=47500, commPort=47100]
^-- CPU [CPUs=4, curLoad=0.07%, avgLoad=0.15%, GC=0%]
^-- Heap [used=171MB, free=95.15%, comm=254MB]
^-- Off-heap memory [used=70MB, free=98.02%, allocated=3377MB]
^-- Page memory [pages=17878]
^-- sysMemPlc region [type=internal, persistence=true, lazyAlloc=false,
... initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.98%, allocRam=100MB, allocTotal=0MB]
^-- default region [type=default, persistence=true, lazyAlloc=true,
... initCfg=256MB, maxCfg=3177MB, usedRam=70MB, freeRam=97.78%, allocRam=3177MB, allocTotal=69MB]
^-- metastoreMemPlc region [type=internal, persistence=true, lazyAlloc=false,
... initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.95%, allocRam=0MB, allocTotal=0MB]
^-- TxLog region [type=internal, persistence=true, lazyAlloc=false,
... initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=100MB, allocTotal=0MB]
^-- volatileDsMemPlc region [type=user, persistence=false, lazyAlloc=true,
... initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=0MB]
^-- Ignite persistence [used=69MB]
^-- Outbound messages queue [size=0]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=7, qSize=0]
^-- Striped thread pool [active=0, idle=8, qSize=0]
[01:04:38,584][INFO][db-checkpoint-thread-#104][Checkpointer] Checkpoint started [checkpointId=41e99f38-7359-4af1-945f-61c92d2a5fb7, startPtr=WALPointer [idx=147, fileOff=11684440, len=381549], checkpointBeforeLockTime=9ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=17ms, checkpointLockHoldTime=19ms, walCpRecordFsyncDuration=2ms, writeCheckpointEntryDuration=2ms, splitAndSortCpPagesDuration=0ms, pages=9, reason='timeout']
[01:04:38,619][SEVERE][db-checkpoint-thread-#104][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.IgniteCheckedException: Compound exception for CountDownFuture.]]
class org.apache.ignite.IgniteCheckedException: Compound exception for CountDownFuture.
at org.apache.ignite.internal.util.future.CountDownFuture.addError(CountDownFuture.java:72)
at org.apache.ignite.internal.util.future.CountDownFuture.onDone(CountDownFuture.java:46)
at org.apache.ignite.internal.util.future.CountDownFuture.onDone(CountDownFuture.java:28)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:478)
at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointPagesWriter.run(CheckpointPagesWriter.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Suppressed: java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.addWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
at sun.nio.ch.SimpleAsynchronousFileChannelImpl.implWrite(Unknown Source)
at sun.nio.ch.AsynchronousFileChannelImpl.write(Unknown Source)
at org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIO.write(AsyncFileIO.java:177)
at org.apache.ignite.internal.processors.cache.persistence.file.AbstractFileIO$5.run(AbstractFileIO.java:117)
at org.apache.ignite.internal.processors.cache.persistence.file.AbstractFileIO.fully(AbstractFileIO.java:53)
at org.apache.ignite.internal.processors.cache.persistence.file.AbstractFileIO.writeFully(AbstractFileIO.java:115)
at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.write(FilePageStore.java:748)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageReadWriteManagerImpl.write(PageReadWriteManagerImpl.java:116)
at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.write(FilePageStoreManager.java:636)
at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointManager.lambda$new$0(CheckpointManager.java:175)
at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointPagesWriter$1.writePage(CheckpointPagesWriter.java:266)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.copyPageForCheckpoint(PageMemoryImpl.java:1343)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.checkpointWritePage(PageMemoryImpl.java:1250)
at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointPagesWriter.writePages(CheckpointPagesWriter.java:207)
at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointPagesWriter.run(CheckpointPagesWriter.java:151)
... 3 more
[01:04:38,620][SEVERE][db-checkpoint-thread-#104][FailureProcessor] No deadlocked threads detected.
[01:04:38,749][SEVERE][db-checkpoint-thread-#104][FailureProcessor] Thread dump at 2022/02/06 01:04:38 CST