
We have a 3-node Kafka cluster (Confluent 5.2.1, Apache Kafka 2.2.0) in our environment. For some time we have been observing an exception that happens intermittently whenever we push data from a test producer. The exception is as follows:

[Log partition=debug-topic-1, dir=/tmp/kafka-logs] Found deletable segments with base offsets [4] due to retention time 604800000ms breach (kafka.log.Log:66)
[2020-04-20 22:42:39,303] INFO [ProducerStateManager partition=debug-topic-1] Writing producer snapshot at offset 5 (kafka.log.ProducerStateManager:66)
[2020-04-20 22:42:39,304] INFO [Log partition=debug-topic-1, dir=/tmp/kafka-logs] Rolled new log segment at offset 5 in 1 ms. (kafka.log.Log:66)
[2020-04-20 22:42:39,304] INFO [Log partition=debug-topic-1, dir=/tmp/kafka-logs] Scheduling log segment [baseOffset 4, size 84] for deletion. (kafka.log.Log:66)
[2020-04-20 22:42:39,310] ERROR Error while deleting segments for debug-topic-1 in dir /tmp/kafka-logs (kafka.server.LogDirFailureChannel:76)
java.nio.file.NoSuchFileException: /tmp/kafka-logs/debug-topic-1/00000000000000000004.log
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
    at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
    at java.nio.file.Files.move(Files.java:1395)
    at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:805)
    at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:224)
    at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488)
    at kafka.log.Log.asyncDeleteSegment(Log.scala:1924)
    at kafka.log.Log.deleteSegment(Log.scala:1909)
    at kafka.log.Log.$anonfun$deleteSegments$3(Log.scala:1455)
    at kafka.log.Log.$anonfun$deleteSegments$3$adapted(Log.scala:1455)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1455)
    at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
    at kafka.log.Log.maybeHandleIOException(Log.scala:2013)
    at kafka.log.Log.deleteSegments(Log.scala:1446)
    at kafka.log.Log.deleteOldSegments(Log.scala:1441)
    at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1519)
    at kafka.log.Log.deleteOldSegments(Log.scala:1509)
    at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:913)
    at kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:910)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at kafka.log.LogManager.cleanupLogs(LogManager.scala:910)
    at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:395)
    at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
    at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: java.nio.file.NoSuchFileException: /tmp/kafka-logs/debug-topic-1/00000000000000000004.log -> /tmp/kafka-logs/debug-topic-1/00000000000000000004.log.deleted
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
        at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
        at java.nio.file.Files.move(Files.java:1395)
        at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:802)
        ... 30 more
[2020-04-20 22:42:39,311] ERROR Uncaught exception in scheduled task 'kafka-log-retention' (kafka.utils.KafkaScheduler:76)
org.apache.kafka.common.errors.KafkaStorageException: Error while deleting segments for debug-topic-1 in dir /tmp/kafka-logs
Caused by: java.nio.file.NoSuchFileException: /tmp/kafka-logs/debug-topic-1/00000000000000000004.log
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
    at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
    at java.nio.file.Files.move(Files.java:1395)
    at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:805)
    at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:224)
    at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488)
    at kafka.log.Log.asyncDeleteSegment(Log.scala:1924)
    at kafka.log.Log.deleteSegment(Log.scala:1909)
    at kafka.log.Log.$anonfun$deleteSegments$3(Log.scala:1455)
    at kafka.log.Log.$anonfun$deleteSegments$3$adapted(Log.scala:1455)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1455)
    at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
    at kafka.log.Log.maybeHandleIOException(Log.scala:2013)
    at kafka.log.Log.deleteSegments(Log.scala:1446)
    at kafka.log.Log.deleteOldSegments(Log.scala:1441)
    at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1519)
    at kafka.log.Log.deleteOldSegments(Log.scala:1509)
    at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:913)
    at kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:910)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at kafka.log.LogManager.cleanupLogs(LogManager.scala:910)
    at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:395)
    at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
    at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: java.nio.file.NoSuchFileException: /tmp/kafka-logs/debug-topic-1/00000000000000000004.log -> /tmp/kafka-logs/debug-topic-1/00000000000000000004.log.deleted
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
        at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
        at java.nio.file.Files.move(Files.java:1395)
        at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:802)
        ... 30 more

We have other producers that push data continuously to different topics in the cluster, and this issue never happens for them. I have deleted and recreated the topic debug-topic-1 several times to ensure that no corrupt or faulty state is present in ZooKeeper or in the Kafka logs, but the problem still reoccurs after some time. If anyone has encountered a similar problem and managed to get past it, please let me know.
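For reference, the delete/recreate cycle was done with the standard topic tooling, roughly like this (the ZooKeeper host, partition count, and replication factor below are placeholders, not our real values):

    # delete the topic (brokers must have delete.topic.enable=true)
    kafka-topics --zookeeper zk-host:2181 --delete --topic debug-topic-1

    # recreate it
    kafka-topics --zookeeper zk-host:2181 --create --topic debug-topic-1 \
        --partitions 3 --replication-factor 3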

user1184527
  • Can you check whether the file actually exists at the moment the exception happens? – thuyein Apr 21 '20 at 01:30
  • When I checked last time, the file did not exist, but I only checked 2-3 minutes after the fact. It is difficult for me to reproduce the scenario since it happens randomly; I will share the details the next time the exception occurs. From the logs it feels like, since 00000000000000000004.log was already due for retention, the broker rolled to a new segment 00000000000000000005.log, but when it then tried to rename the 04 segment for deletion it couldn't find it because it might have already been removed. Just a guess! – user1184527 Apr 21 '20 at 01:39

1 Answer


It seems like your machine rebooted, or /tmp was cleared in some other way (many Linux distributions clean /tmp periodically, e.g. via systemd-tmpfiles), which deletes segment files out from under the broker.

You must change Kafka's log.dirs (and ZooKeeper's dataDir) to a directory outside /tmp.
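For example, something like the following (the exact paths are illustrative; any persistent location outside /tmp works):

    # server.properties (Kafka broker)
    log.dirs=/var/lib/kafka/data

    # zookeeper.properties (ZooKeeper)
    dataDir=/var/lib/zookeeper

After changing these, restart the services; note that data already written under /tmp is not migrated automatically.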

OneCricketeer