I'm running neo4j 2.3.0-RC1
embedded, using the Java API. It keeps crashing without warning, and I'm trying to figure out why.
I was previously using this code just fine with 1.9.8
. Upgrading to 2.0+ has required adding transactions, altering some cypher syntax, the boot-time Spring config, and a small finite number of other changes.
The vast majority of the code remains the same, and is functionally correct as confirmed by unit and integration tests.
When the engine is booted, it is adding new nodes on a fairly constant basis. The output below shows a mysterious crash after 290 minutes.
This seems to happen always. Sometimes after 2 hours, sometimes after 5. It never happened at all with 1.9.8
.
The JVM is run with ./start-engine.sh > console.out 2>&1 &
.
The operative line of start-engine.sh
is
$JAVA_HOME/bin/java -server $JAVA_OPTIONS $JPROFILER_OPTIONS -cp '.:lib/*' package.engine.Main $*
Below is the last few lines of console.out
.
17437.902: RevokeBias [ 112 6 5 ] [ 20 6 27 43 26 ] 1
17438.020: RevokeBias [ 112 3 9 ] [ 5 0 5 0 0 ] 3
17438.338: GenCollectForAllocation [ 113 2 2 ] [ 1 0 11 4 32 ] 2
17438.857: BulkRevokeBias [ 112 3 13 ] [ 0 0 28 6 2 ] 3
./start-engine.sh: line 17: 19647 Killed $JAVA_HOME/bin/java -server $JAVA_OPTIONS $JPROFILER_OPTIONS -cp '.:lib/*' package.engine.Main $*
There is no stacktrace, and no other error output.
These are the last few lines of messages.log
from within /mnt/engine-data
2015-10-30 18:07:44.457+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [845664646]: Starting check pointing...
2015-10-30 18:07:44.458+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [845664646]: Starting store flush...
2015-10-30 18:07:44.564+0000 INFO [o.n.k.i.s.c.CountsTracker] About to rotate counts store at transaction 845664650 to [/mnt/engine-data/neostore.counts.db.b], from [/mnt/engine-data/neostore.counts.db.a].
2015-10-30 18:07:44.565+0000 INFO [o.n.k.i.s.c.CountsTracker] Successfully rotated counts store at transaction 845664650 to [/mnt/engine-data/neostore.counts.db.b], from [/mnt/engine-data/neostore.counts.db.a].
2015-10-30 18:07:44.834+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [845664646]: Store flush completed
2015-10-30 18:07:44.835+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [845664646]: Starting appending check point entry into the tx log...
2015-10-30 18:07:44.836+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [845664646]: Appending check point entry into the tx log completed
2015-10-30 18:07:44.836+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [845664646]: Check pointing completed
2015-10-30 18:07:44.836+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [35826]: Starting log pruning.
2015-10-30 18:07:44.844+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [35826]: Log pruning complete.
So everything looks fine up until the moment of crash, and the crash comes as a complete surprise.
There is plenty of other data in messages.log
, but I don't know what I'm looking for.
$ java -version
java version "1.7.0_65"
Java(TM) SE Runtime Environment (build 1.7.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
$uname -a
Linux 3.13.0-65-generic #106-Ubuntu SMP Fri Oct 2 22:08:27 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux