
I'm having a strange issue in Neo4j. I have recently been working with the GraphAware TimeTree, and populating it was going well until yesterday. I had to rebuild the tree because of an error I had made, so I left a script running overnight (with nohup).

Coming back today, I found that my script had only run for 3 minutes!

$ ps aux | grep timetreepop
root     21840  0.0  0.0 195952  2816 ?        S    Jul28   0:00 sudo nohup python timetreepop.py
root     21841  0.0  0.2 381416 75016 ?        S    Jul28   0:03 python timetreepop.py

I noticed this behavior while I was working, but I figured leaving it overnight while I wasn't active would help. I also shut down my other Java server process to rule out contention. At that point my server was only running a Python Tornado server in the background, which isn't very hefty and doesn't get much traffic (a couple of hits a day).

All in all, there is plenty of available RAM on my system, the CPUs are not being utilized elsewhere, and there are no other processes doing heavy I/O on my machine. top / atop show a healthy system with available resources.

Here is a high level of what my script is doing:

neo = neopop.handler()
for i, meta_id in enumerate(meta_gen(ship='KAOU')):
    neo.populate_timetree(record=meta_id)

My handler creates the driver and session in the __init__ constructor:

self.driver = graphdb.driver(config.NEO4J_HTTP_LINK)
self.session = self.driver.session()

My generator provides meta_id values which are unique property values in the nodes in my graph.

The populate_timetree() function creates the following statement:

MATCH (r:record {meta:"KAQP_20120101v20001_0001"}) WITH r 
CALL ga.timetree.events.attach({node: r, time: r.time, relationshipType: "observedOn", resolution:"Minute"})
YIELD node RETURN node.meta;
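Since the meta value changes on every call, a parameterized version of this statement lets Neo4j cache the query plan across the whole loop instead of re-planning each interpolated string. A minimal sketch, not the original populate_timetree (the function and session shape here are illustrative; the {param} placeholder syntax is the pre-3.1 form that 3.0.3 accepts):

```python
# Hypothetical sketch: the same attach call, with the meta value passed as a
# Cypher parameter instead of being baked into the query string.
ATTACH_QUERY = (
    'MATCH (r:record {meta: {meta_id}}) WITH r '
    'CALL ga.timetree.events.attach('
    '{node: r, time: r.time, relationshipType: "observedOn", resolution: "Minute"}) '
    'YIELD node RETURN node.meta'
)

def populate_timetree(session, meta_id):
    """Attach a single record node to the time tree by its unique meta id."""
    result = session.run(ATTACH_QUERY, {"meta_id": meta_id})
    # Collect the returned meta values (0 or 1 rows, since meta is unique).
    return [record["node.meta"] for record in result]
```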

Everything worked fine on my first go-around. After messing up my time values, I deleted the database, restarted, and tried again. Only this time, my program freezes when I make the call to close the session:

neo.session.close()

Note: I actually call this in my __del__ destructor (which I know is probably considered bad practice, but it has been working for me so far and suits my needs).
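For what it's worth, a context manager gives the same convenience with deterministic cleanup: with __del__, the timing of the close is left to the garbage collector, and exceptions raised during interpreter shutdown are easy to miss. A hypothetical sketch (the class and names are illustrative, not the actual neopop.handler):

```python
class Handler(object):
    """Illustrative wrapper that closes the session/driver deterministically.

    Using 'with' guarantees close() runs exactly once, even if the loop body
    raises, unlike __del__, whose timing is up to the garbage collector.
    """

    def __init__(self, driver):
        self.driver = driver
        self.session = driver.session()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.session.close()
        self.driver.close()
        return False  # don't swallow exceptions from the with-block
```

Usage would then be `with Handler(graphdb.driver(config.NEO4J_HTTP_LINK)) as neo: ...`, and the session is closed at the end of the block rather than whenever the object happens to be collected.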

I have double-checked all my code for rogue readline statements or anything else that could cause it to pause, and recompiled my package with all of this code in it. I know for a fact it is getting stuck at this session.close() statement.

So I tried playing with the Neo4j-shell tool to see if anything was different. First a quick responsiveness check:

$ neoc 'schema'
Indexes
  ON :record(meta) ONLINE (for uniqueness constraint) 

Constraints
  ON (record:record) ASSERT record.meta IS UNIQUE

OK, all good. Now I try the timetree call for a single value:

$ ./neo4j-shell -c '
> MATCH (r:record {meta:"KAOU_20110613v20001_0000"}) WITH r 
> CALL ga.timetree.events.attach({node: r, time: r.time, relationshipType: "observedOn", resolution:"Minute"})
> YIELD node RETURN node.meta;
> '

BOOM. Neo4j is stuck! To be clear, I know the MATCH statement is not what's taking forever here, because I only put about 2 million nodes into my db this time, and running the MATCH statement alone is absolutely fine. I even have an index set up for this unique property (see the schema call above).

So what is going on here? I figure the initial creation of the tree shouldn't be much of an issue if I am only inserting a single node. My first try seemed to run flawlessly. I'm not sure what I have done differently this time, except that I populated my database with only 2 million of the 58 million records I had (so it should be way faster).

I actually left this command running for a few hours, and it likewise had only accumulated a few minutes of run time on my system. I am so confused as to what is happening here. Does anyone have any ideas? Is there any way I can see what Neo4j is actively doing with this command? (Keep in mind I am using Community Edition.)

I am running Neo4j 3.0.3 and GraphAware TimeTree 3.0.3.39 on a CentOS 7 server.

The only idea I have is that I have repeatedly been running Cypher statements and cancelling them before they commit, both through Python and the command-line shell tool. Is it possible I am stressing the transaction manager by cancelling big transactions before they finish?

For example:

$ ./neo4j-shell -c 'MATCH (r:record) CALL ga.timetree.events.attach(....) ....'

And then hitting Ctrl-C after it has run for about 2 hours without finishing.


UPDATE:

OK, so I investigated my log files and found some issues. It seems my threads were being blocked because there wasn't enough memory. I guess my max heap size was being artificially limited and not using the available resources on my machine (maybe?). So I manually set dbms.memory.heap.max_size=16000.

The problem seems to be gone now! I was confused, though, since I would have expected a Java OOM error in the response from the neo4j-shell tool instead of it idling as if it were making progress.
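For reference, on a stock 3.0.x install the heap settings live in conf/neo4j-wrapper.conf, and bare values are interpreted as megabytes. A sketch of the relevant lines (the 16000 here mirrors this question's value, not a general recommendation):

```
# conf/neo4j-wrapper.conf (Neo4j 3.0.x) -- bare values are in MB
dbms.memory.heap.initial_size=16000
dbms.memory.heap.max_size=16000
```

Setting the initial and max sizes to the same value avoids heap-resizing pauses during a long load like this one.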

spanishgum

1 Answer

$ ./neo4j-shell -c '
> MATCH (r:record {meta:"KAOU_20110613v20001_0000"}) WITH r 
> CALL ga.timetree.events.attach({node: r, time: r.time, relationshipType: "observedOn", resolution:"Minute"})
> YIELD node RETURN node.meta;
> '

How many nodes is this query expected to attach to the timetree?

This procedure doesn't use batching, so it is limited to what your memory can handle in a single transaction (adding batching is now on our backlog).

That said, it generally doesn't make sense to attach events to the tree manually. This procedure is there for convenience, but we recommend using automatic event attachment:

https://github.com/graphaware/neo4j-timetree#automatic-event-attachment

That way, re-attaching the complete db (or only the nodes configured for attachment) is just a matter of restarting your database, and the process will run with batched transactions, which will significantly reduce the time taken.

Christophe Willemsen
  • This matches exactly 1 or 0 nodes. In this specific case it matches 1 because I entered the node with this `meta_id` into the db. I was making this call programmatically 1 by 1 because trying to match my entire set of records was taking way too long, and I am only testing right now. I did come across the automatic attachment, but haven't looked at it yet. I suppose that would be a much better alternative. Also - I think I just fixed my problem - see above. – spanishgum Jul 29 '16 at 18:02
  • ok. Are you able to share your db with us? Any logs you can share from the db? – Christophe Willemsen Jul 29 '16 at 18:03
  • Cool, glad to hear! – Christophe Willemsen Jul 29 '16 at 18:11
  • Absolutely. thanks for the response anyway! As a side note, if I did want to share the database / log files, what is the best way to do that on here (Given that they are large)? – spanishgum Jul 29 '16 at 18:13
  • Zip and put it somewhere on dropbox / s3 then share via private link – Christophe Willemsen Jul 29 '16 at 18:31