I'm having a strange issue in Neo4j. I have been recently working with the GraphAware TimeTree and populating it was going well until yesterday. I had to rebuild the tree because of an error I had made so I left a script running overnight (nohup).
Coming back today, I found that my script only ran for 3 minutes!
$ ps aux | grep timetreepop
root 21840 0.0 0.0 195952 2816 ? S Jul28 0:00 sudo nohup python timetreepop.py
root 21841 0.0 0.2 381416 75016 ? S Jul28 0:03 python timetreepop.py
I noticed this behavior while I was working, but I figured leaving it overnight while I'm not active would be help. I also turned off my other java server process in case of possible contention. At this point my server would only be running a python tornado server in the background which isn't very hefty, and doesn't get much traffic (couple hits a day).
All in all there is plenty of available RAM in my system, the CPUs are not being utilized elsewhere, and there are no other processes running heavy IO on my machine. Using top
/ atop
show a healthy system with available resources.
Here is a high level of what my script is doing:
neo = neopop.handler()
for i, meta_id in enumerate(meta_gen(ship='KAOU')):
neo.populate_timetree(record=meta_id)
My handler creates the driver and session in the __init__
constructor:
self.driver = graphdb.driver(config.NEO4J_HTTP_LINK)
self.session = self.driver.session()
My generator provides meta_id
values which are unique property values in the nodes in my graph.
The populate_timetree()
function creates the following statement:
MATCH (r:record {meta:"KAQP_20120101v20001_0001"}) WITH r
CALL ga.timetree.events.attach({node: r, time: r.time, relationshipType: "observedOn", resolution:"Minute"})
YIELD node RETURN node.meta;
Everything was working fine my first go around. After messing up my time values, I deleted the database, restarted, and tried again. Only this time, my program freezes when I make the call to close the session:
neo.session.close()
Note: I actually call this in my __del__
deconstructor, (which I know is probably considered bad practice but it has been working for me so far and suits my needs.)
I have double checked all my code for rogue readline statements / anything that could cause it to be pausing. Also recompiled my package with all this code in it. I know for a fact it is getting stuck at this session.close()
statement.
So I tried playing with the Neo4j-shell tool to see if anything was different. First a quick responsiveness check:
$ neoc 'schema'
Indexes
ON :record(meta) ONLINE (for uniqueness constraint)
Constraints
ON (record:record) ASSERT record.meta IS UNIQUE
Ok all good. Now I try the timetree call for a single value:
$ ./neo4j-shell -c '
> MATCH (r:record {meta:"KAOU_20110613v20001_0000"}) WITH r
> CALL ga.timetree.events.attach({node: r, time: r.time, relationshipType: "observedOn", resolution:"Minute"})
> YIELD node RETURN node.meta;
> '
BOOM. Neo4j is stuck! To be clear, I know the MATCH statement is not taking forever here, because I only put in about 2 million nodes into my db this time, and calling the match statement alone runs absolutely fine. I even have an index set up for this unique property (see schema call above).
So what is going on here? I figure the initial creation of the tree shouldn't be too much of an issue if I am only inserting a single node. My first try seem to run flawlessly. I'm not sure what I have done different this time except populate my database with 2 million out of the 58 million records I had. (So it should be way faster).
I actually left this command running for a few hours, and it also had only run for a few minutes on my system. I am so confused as to what is happening here. Does anyone have any ideas? Is there anyway I can see what neo4j is actively doing about this command? (Keep in mind I am using community edition)
I am running version 3.0.3 of neo4j and, 3.0.3.39 timetree / graphaware on and CentOS 7 server.
The only idea I have, is that I have been calling cypher statements and cancelling them before they commit so many times, whether it be through python or the cmd line shell tool. Is it possible I am messing with the transaction manager too much by cancelling big transactions before they finish?
For example:
/neo4j-shell -c 'MATCH (r:record) CALL ga.timetree.events.attach(....) ....'
And then hit control-C
after it runs for about 2 hours without finishing.
UPDATE:
Ok so I investigated my log files and found some issues. It seems my threads were being blocked due to not having enough memory. I guess my max heap size was being artifically limited and not using the available resources on my machine. (Maybe??). So I manually set dbms.memory.heap.max_size=16000
.
The problem seems to go away now! I guess I was confused, since I would expect a Java OOM to appear in the response of the neo4j-shell
tool instead of idling as if making progress.