I have a strange problem where a Spark job kills the Cassandra server. For this question I simulated the situation so that the Cassandra 3.9 server dies without Spark being involved at all. Please check whether this is expected behaviour or a bug.
Here is the Cassandra table:
CREATE TABLE test.data (
pk int,
ck text,
data text,
PRIMARY KEY (pk, ck)
)
The table is populated with 10M random rows, where the data column is 2K characters of random text. When the data column is shorter, everything is fine.
Here is the bash script that populates the table:
(for ((i=0; i<10000000; i=i+1)); do
    pk=`cat /dev/urandom | tr -dc '0-9' | fold -w 5 | head -n 1`
    ck=`cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 10 | head -n 1`
    data=`cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 2048 | head -n 1`
    echo "$pk,$ck,$data"
done) | cqlsh -e "copy test.data from stdin"
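For reference, the same data can also be generated programmatically; below is a minimal sketch using the DataStax Python driver (cassandra-driver) that writes equivalent rows (5-digit pk, 10-character ck, 2048-character data). The contact point, chunk size, and concurrency are assumptions, not part of my original setup.

# Sketch only: populate test.data with equivalent random rows via the
# DataStax Python driver. Host, chunk size and concurrency are assumed.
import random
import string

from cassandra.cluster import Cluster
from cassandra.concurrent import execute_concurrent_with_args

def rand_text(n):
    chars = string.ascii_letters + string.digits
    return ''.join(random.choice(chars) for _ in range(n))

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('test')
insert = session.prepare("INSERT INTO data (pk, ck, data) VALUES (?, ?, ?)")

TOTAL = 10000000   # 10M rows, as in the cqlsh COPY script above
CHUNK = 1000       # rows handed to the driver per call

for start in range(0, TOTAL, CHUNK):
    args = [(random.randint(0, 99999), rand_text(10), rand_text(2048))
            for _ in range(CHUNK)]
    execute_concurrent_with_args(session, insert, args, concurrency=50)

cluster.shutdown()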
If I run 30 concurrent queries, each loading ~4M rows, the Cassandra server dies on my Linux machine (16 GB RAM, quad-core i7). Here is the bash script that starts these queries:
for ((i=1; i<30; i++)); do (cqlsh -e "select * from test.data where token(pk)>=-7417407713535771950 and token(pk)<= 0" --request-timeout=600 | wc -l)& done;
A single query fetches 4,031,926 rows.
The log file can be downloaded from here. The operating system is Ubuntu 16.10, the JVM is OpenJDK 1.8.0_111, and the Cassandra version is 3.9.
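To reproduce this without cqlsh, here is a minimal sketch that issues the same token-range query from 30 concurrent clients using the Python driver. The contact point and fetch_size are assumptions; it only mimics what cqlsh does by paging through the result set and counting rows.

# Sketch only: run the same token-range scan from 30 concurrent clients.
# Each thread opens its own connection to mimic separate cqlsh processes.
import threading

from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

QUERY = ("SELECT * FROM test.data "
         "WHERE token(pk) >= -7417407713535771950 AND token(pk) <= 0")

def scan():
    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect()
    stmt = SimpleStatement(QUERY, fetch_size=5000)   # results are paged
    rows = session.execute(stmt, timeout=600)        # client-side timeout, seconds
    print(sum(1 for _ in rows))                      # number of rows fetched
    cluster.shutdown()

threads = [threading.Thread(target=scan) for _ in range(30)]
for t in threads:
    t.start()
for t in threads:
    t.join()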