I got an out-of-memory exception and Ignite crashed. In the last metrics entry in the Ignite logs, heap and off-heap memory usage were about 171 MB and 70 MB respectively, yet about 10 seconds later the logs show an out-of-memory exception. The other values in the metrics also look OK.

Below is the log snippet

[01:04:29,690][INFO][grid-timeout-worker-#22][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=8a034404, uptime=39 days, 15:50:23.086]
    ^-- Cluster [hosts=1, CPUs=4, servers=1, clients=1, topVer=22, minorTopVer=0]
    ^-- Network [addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.0.1, 172.28.230.222], discoPort=47500, commPort=47100]
    ^-- CPU [CPUs=4, curLoad=0.07%, avgLoad=0.15%, GC=0%]
    ^-- Heap [used=171MB, free=95.15%, comm=254MB]
    ^-- Off-heap memory [used=70MB, free=98.02%, allocated=3377MB]
    ^-- Page memory [pages=17878]
    ^--   sysMemPlc region [type=internal, persistence=true, lazyAlloc=false,
      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.98%, allocRam=100MB, allocTotal=0MB]
    ^--   default region [type=default, persistence=true, lazyAlloc=true,
      ...  initCfg=256MB, maxCfg=3177MB, usedRam=70MB, freeRam=97.78%, allocRam=3177MB, allocTotal=69MB]
    ^--   metastoreMemPlc region [type=internal, persistence=true, lazyAlloc=false,
      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.95%, allocRam=0MB, allocTotal=0MB]
    ^--   TxLog region [type=internal, persistence=true, lazyAlloc=false,
      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=100MB, allocTotal=0MB]
    ^--   volatileDsMemPlc region [type=user, persistence=false, lazyAlloc=true,
      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=0MB]
    ^-- Ignite persistence [used=69MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=7, qSize=0]
    ^-- Striped thread pool [active=0, idle=8, qSize=0]
[01:04:38,584][INFO][db-checkpoint-thread-#104][Checkpointer] Checkpoint started [checkpointId=41e99f38-7359-4af1-945f-61c92d2a5fb7, startPtr=WALPointer [idx=147, fileOff=11684440, len=381549], checkpointBeforeLockTime=9ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=17ms, checkpointLockHoldTime=19ms, walCpRecordFsyncDuration=2ms, writeCheckpointEntryDuration=2ms, splitAndSortCpPagesDuration=0ms, pages=9, reason='timeout']
[01:04:38,619][SEVERE][db-checkpoint-thread-#104][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.IgniteCheckedException: Compound exception for CountDownFuture.]]
class org.apache.ignite.IgniteCheckedException: Compound exception for CountDownFuture.
    at org.apache.ignite.internal.util.future.CountDownFuture.addError(CountDownFuture.java:72)
    at org.apache.ignite.internal.util.future.CountDownFuture.onDone(CountDownFuture.java:46)
    at org.apache.ignite.internal.util.future.CountDownFuture.onDone(CountDownFuture.java:28)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:478)
    at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointPagesWriter.run(CheckpointPagesWriter.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
    Suppressed: java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.addWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
        at sun.nio.ch.SimpleAsynchronousFileChannelImpl.implWrite(Unknown Source)
        at sun.nio.ch.AsynchronousFileChannelImpl.write(Unknown Source)
        at org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIO.write(AsyncFileIO.java:177)
        at org.apache.ignite.internal.processors.cache.persistence.file.AbstractFileIO$5.run(AbstractFileIO.java:117)
        at org.apache.ignite.internal.processors.cache.persistence.file.AbstractFileIO.fully(AbstractFileIO.java:53)
        at org.apache.ignite.internal.processors.cache.persistence.file.AbstractFileIO.writeFully(AbstractFileIO.java:115)
        at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.write(FilePageStore.java:748)
        at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageReadWriteManagerImpl.write(PageReadWriteManagerImpl.java:116)
        at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.write(FilePageStoreManager.java:636)
        at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointManager.lambda$new$0(CheckpointManager.java:175)
        at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointPagesWriter$1.writePage(CheckpointPagesWriter.java:266)
        at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.copyPageForCheckpoint(PageMemoryImpl.java:1343)
        at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.checkpointWritePage(PageMemoryImpl.java:1250)
        at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointPagesWriter.writePages(CheckpointPagesWriter.java:207)
        at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointPagesWriter.run(CheckpointPagesWriter.java:151)
        ... 3 more
[01:04:38,620][SEVERE][db-checkpoint-thread-#104][FailureProcessor] No deadlocked threads detected.
[01:04:38,749][SEVERE][db-checkpoint-thread-#104][FailureProcessor] Thread dump at 2022/02/06 01:04:38 CST
ba6971

1 Answer

unable to create new native thread

This does not look like an Ignite exception; it is most likely caused by your system configuration.

Check your process limits by running the ulimit -a command, in particular the file descriptor limit (-n) and the maximum number of user processes (-u), and increase them if required. On Linux, threads count against the -u limit. The recommended value is 32768 or above. If an adjustment is needed, either run ulimit -n 32768 -u 32768 or modify /etc/security/limits.conf.
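A limit raised with ulimit in an interactive shell applies only to that shell and the processes it starts, so it can help to confirm what the running Ignite JVM actually got. The following is a minimal sketch, assuming Linux with /proc mounted. The helper names process_limits and thread_count are hypothetical, not part of Ignite or any standard tool.

```shell
# Hypothetical helper: print the effective thread and file-descriptor limits
# of a running process. $1 is the PID, for example the Ignite JVM.
process_limits() {
  grep -E '^Max (processes|open files)' "/proc/$1/limits"
}

# Hypothetical helper: count the threads currently alive in that process.
# Each thread appears as one directory entry under /proc/<pid>/task.
thread_count() {
  ls "/proc/$1/task" | wc -l
}

# A persistent change would go into /etc/security/limits.conf, for example
# (the user name "ignite" is an assumption):
#   ignite  soft  nofile  32768
#   ignite  hard  nofile  32768
#   ignite  soft  nproc   32768
#   ignite  hard  nproc   32768
```

Calling process_limits and thread_count with the JVM's PID shows how close the process is to its limits. The max user processes limit is counted per user across all of that user's processes, so a thread count well below the limit in one process does not rule it out on its own.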

Alexandr Shapkin