0

I have a bulk Python script uploading data to a Cassandra database. In a nutshell, it loops through a list of data rows and uploads each one with its timestamp to Cassandra.

What I notice is that every so often the script will crash with an error similar to the following:

Failed to write to cassandra table: ('Unable to complete the operation against any hosts',{})

At that point, nodetool statusbinary returns not running and all I need to do is run nodetool enablebinary to get the script working again - until the next crash, anyway.

I don't see anything turning the native transport off in the logs - what may be the cause of the binary turning itself off 'on its own'?

Version numbers: Cassandra: 3.0.9 Java: 1.8.0_162 CentOS: 6.9

Edit: reading the log, I noticed the following:

INFO [RMI TCP Connection(5893)-127.0.0.1] 2018-05-08 10:58:17,318 Server.java:160 - Starting listening for CQL clients on /XX.XXX.X.XX:9042 (unencrypted)... ERROR [CompactionExecutor:29] 2018-05-08 10:58:48,700 CassandraDaemon.java:205 - Exception in thread Thread[CompactionExecutor:29,1,main] org.apache.cassandra.io.FSReadError: java.io.IOException: Channel not open for writing - cannot extend file to required size

Longer bit:

WARN [CompactionExecutor:4] 2018-05-09 12:10:11,094 CLibrary.java:304 - fsync(315) failed, errno (22) {} com.sun.jna.LastErrorException: [22] h at org.apache.cassandra.utils.CLibrary.fsync(Native Method) ~[apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.utils.CLibrary.trySync(CLibrary.java:293) ~[apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.utils.SyncUtil.trySync(SyncUtil.java:179) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.utils.SyncUtil.trySyncDir(SyncUtil.java:190) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:117) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.util.SequentialWriter.<init>(SequentialWriter.java:135) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.compress.CompressedSequentialWriter.<init>(CompressedSequentialWriter.java:69) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.util.SequentialWriter.open(SequentialWriter.java:163) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.sstable.format.big.BigTableWriter.<init>(BigTableWriter.java:73) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.sstable.format.big.BigFormat$WriterFactory.open(BigFormat.java:93) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.sstable.format.SSTableWriter.create(SSTableWriter.java:96) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.switchCompactionLocation(DefaultCompactionWriter.java:64) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.maybeSwitchWriter(CompactionAwareWriter.java:128) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:108) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:183) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:78) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:263) [apache-cassandra-3.0.9.jar:3.0.9] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_162] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_162] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_162] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_162] at java.lang.Thread.run(Thread.java:748) [na:1.8.0_162] WARN [CompactionExecutor:4] 2018-05-09 12:10:11,096 CLibrary.java:304 - fsync(316) failed, errno (22) {} com.sun.jna.LastErrorException: [22] i at org.apache.cassandra.utils.CLibrary.fsync(Native Method) ~[apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.utils.CLibrary.trySync(CLibrary.java:293) ~[apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.utils.SyncUtil.trySync(SyncUtil.java:179) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.utils.SyncUtil.trySyncDir(SyncUtil.java:190) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:117) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.util.SequentialWriter.<init>(SequentialWriter.java:135) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.util.SequentialWriter.open(SequentialWriter.java:150) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.sstable.format.big.BigTableWriter$IndexWriter.<init>(BigTableWriter.java:375) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.sstable.format.big.BigTableWriter.<init>(BigTableWriter.java:84) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.sstable.format.big.BigFormat$WriterFactory.open(BigFormat.java:93) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.sstable.format.SSTableWriter.create(SSTableWriter.java:96) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.switchCompactionLocation(DefaultCompactionWriter.java:64) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.maybeSwitchWriter(CompactionAwareWriter.java:128) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:108) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:183) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:78) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) [apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:263) [apache-cassandra-3.0.9.jar:3.0.9] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_162] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_162] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_162] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_162] at java.lang.Thread.run(Thread.java:748) [na:1.8.0_162] ERROR [Reference-Reaper:1] 2018-05-09 12:10:14,363 Ref.java:223 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@4a962079) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$Tidy@2019607399:[Memory@[0..4e0), Memory@[0..30c0)] was not released before the reference was garbage collected ERROR [CompactionExecutor:4] 2018-05-09 12:14:00,394 CassandraDaemon.java:205 - Exception in thread Thread[CompactionExecutor:4,1,main] org.apache.cassandra.io.FSReadError: java.io.IOException: Channel not open for writing - cannot extend file to required size at org.apache.cassandra.io.util.ChannelProxy.map(ChannelProxy.java:156) ~[apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.util.MmappedRegions$State.add(MmappedRegions.java:280) ~[apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.util.MmappedRegions$State.access$400(MmappedRegions.java:216) ~[apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.util.MmappedRegions.updateState(MmappedRegions.java:142) ~[apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.util.MmappedRegions.extend(MmappedRegions.java:131) ~[apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.updateRegions(MmappedSegmentedFile.java:130) ~[apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.complete(MmappedSegmentedFile.java:111) ~[apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:177) ~[apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.util.SegmentedFile$Builder.buildIndex(SegmentedFile.java:198) ~[apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.sstable.format.big.BigTableWriter.openEarly(BigTableWriter.java:244) ~[apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.sstable.SSTableRewriter.maybeReopenEarly(SSTableRewriter.java:172) ~[apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:124) ~[apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.realAppend(DefaultCompactionWriter.java:57) ~[apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:109) ~[apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:183) ~[apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:78) ~[apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) ~[apache-cassandra-3.0.9.jar:3.0.9] at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:263) ~[apache-cassandra-3.0.9.jar:3.0.9] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_162] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_162] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_162] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_162] at java.lang.Thread.run(Thread.java:748) [na:1.8.0_162] Caused by: java.io.IOException: Channel not open for writing - cannot extend file to required size at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:901) ~[na:1.8.0_162] at org.apache.cassandra.io.util.ChannelProxy.map(ChannelProxy.java:152) ~[apache-cassandra-3.0.9.jar:3.0.9] ... 23 common frames omitted ERROR [CompactionExecutor:4] 2018-05-09 12:14:00,394 StorageService.java:399 - Stopping native transport INFO [CompactionExecutor:4] 2018-05-09 12:14:00,394 Server.java:180 - Stop listening for CQL clients

dmesg outputs a lot of:

CIFS VFS: Send error in SETFSUnixInfo = -11 CIFS VFS: Send error in Close = -11 CIFS VFS: Unexpected lookup error -512

Apparently the leak is a known issue of Cassandra 3.0.9 which ships with Netty 4.0.23 - but even if it was a memory leak issue, I've restarted Cassandra and it still doesn't work...

user988066
  • 83
  • 1
  • 1
  • 5
  • it looks like that your node could be crashing. What do you see in the `system.log`? – Alex Ott Apr 09 '18 at 16:13
  • @AlexOtt edited the post with that information. – user988066 May 08 '18 at 10:08
  • SSTables are immutable, and compaction is triggered only when they are already written. I would suggest to check permissions for Cassandra data. – Alex Ott May 08 '18 at 13:20
  • @AlexOtt Cassandra is ran as sudo and the data directories have complete read/write permissions for all users... – user988066 May 09 '18 at 08:29
  • (not related to actual problem) that's not very good setup - it's better to run Cassandra as dedicated user – Alex Ott May 09 '18 at 09:56
  • Can you post more lines from error log - after the exception. This problem looks like this: https://groups.google.com/forum/#!topic/nosql-databases/SThJas_Yvbg, so you may have problem with memory mapping - maybe you don't have enough memory for it – Alex Ott May 09 '18 at 09:59
  • @AlexOtt I've tried restarting Cassandra to clear memory, to no avail. Oddly it's been working fine for a while, but now it literally crashes with the above error all the time - a possible culprit is that I can usually find a 40+GB compaction job being run near the error happening... There are many imperfections about the current Cassandra setup, many of which are beyond my control, unfortunately - they are also the causes behind the multiple fsync errors you see in the logs. – user988066 May 09 '18 at 11:26
  • Yes, if you see it regularly, then it's possible culprit. Thanks for extended logs. I see that `fsync` is failing - this could be a problem with disk system for example. Do you see any messages in the dmesg? – Alex Ott May 09 '18 at 11:32
  • A lot, actually, mainly: CIFS VFS: Send error in SETFSUnixInfo = -11 CIFS VFS: Send error in Close = -11 CIFS VFS: Unexpected lookup error -512 The error codes seem to be -5, -11, -512. I'll add that to main body of the post. – user988066 May 09 '18 at 12:39
  • Hmmm, do you have Cassandra data on Samba/CIFS? If yes, then move to local disk... – Alex Ott May 09 '18 at 12:40

0 Answers0