
I have a data model like this:

  • Person node
  • Email node
  • OWNS relationship
  • LISTS relationship
  • KNOWS relationship

Each Person OWNS one Email and LISTS multiple Emails (like a contact list; 200 contacts are assumed per Person).

The query I am trying to perform finds all the Persons that OWN an Email listed by a given Person, and creates a KNOWS relationship from that Person to each of them.

MATCH (n:Person {uid:'123'})-[r1:LISTS]->(m:Email)<-[r2:OWNS]-(l:Person) CREATE UNIQUE (n)-[:KNOWS]->(l)
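To make the intended result concrete, here is a plain-Python model of what the query computes (no Neo4j involved). The dictionaries `lists` and `owner_of` and the function `knows_pairs` are hypothetical stand-ins for the LISTS and OWNS relationships. The point it illustrates: starting from one Person, the work is proportional to that Person's listed emails (about 200), not to the size of the whole graph.

```python
from typing import Dict, Set, Tuple

# Hypothetical in-memory stand-ins for the graph (not a Neo4j API):
#   lists[person_uid] -> the emails that person LISTS (their contact list)
#   owner_of[email]   -> the uid of the Person who OWNS that email
Lists = Dict[str, Set[str]]
Owners = Dict[str, str]


def knows_pairs(uid: str, lists: Lists, owner_of: Owners) -> Set[Tuple[str, str]]:
    """Return the (n, l) pairs that the MATCH ... CREATE UNIQUE query would connect with KNOWS."""
    pairs: Set[Tuple[str, str]] = set()
    for email in lists.get(uid, set()):   # (n)-[:LISTS]->(m:Email)
        owner = owner_of.get(email)       # (m)<-[:OWNS]-(l:Person)
        if owner is not None:
            pairs.add((uid, owner))       # a set keeps each pair once, like CREATE UNIQUE / MERGE
    return pairs


if __name__ == "__main__":
    lists = {"123": {"a@x.org", "b@x.org", "c@x.org"}}
    owner_of = {"a@x.org": "200", "b@x.org": "300"}  # c@x.org has no owner in the graph
    print(sorted(knows_pairs("123", lists, owner_of)))
```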

The counts in my current database are as follows:

  • Number of Person nodes: 10948
  • Number of Email nodes: 1951481
  • Number of OWNS rels: 21882
  • Number of LISTS rels: 4376340 (Each Person has 200 unique LISTS rels)

My problem is that running this query on the current database takes between 4.3 and 4.8 seconds, which is unacceptable for my needs. I want to know whether this timing is normal for my data model, or whether I am doing something wrong in the query (or even in the model).

Any help would be much appreciated. Also, if this is normal for Neo4j, please feel free to suggest other graph databases that can handle this kind of model better.

Thank you very much in advance

UPDATE:

My query is: profile match (n {uid: '4692'}) -[:LISTS]-> (:Email) <-[:OWNS]- (l) create unique (n)-[r:KNOWS]->(l)

The PROFILE command on my query returns this:

Cypher version: CYPHER 2.2, planner: RULE. 3919222 total db hits in 2713 ms.

[Image: PROFILE execution plan]
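A rough back-of-the-envelope check on these numbers (my own reading, not something the PROFILE output states). If the query started from the indexed Person, it would touch roughly one Person, about 200 LISTS relationships and about 200 OWNS relationships. The reported db hits are instead almost exactly twice the number of Email nodes, which is consistent with (though it does not prove) a plan that visits every Email node rather than starting from `n`. The "one hit per node or relationship visited" estimate is a simplifying assumption, not Neo4j's exact db-hit accounting.

```python
# Figures copied from the question. The "expected" estimate assumes roughly
# one db hit per node/relationship visited, which is a simplification.
EMAIL_NODES = 1_951_481
REPORTED_DB_HITS = 3_919_222
LISTS_PER_PERSON = 200


def hits_per_email_node(db_hits: int, email_nodes: int) -> float:
    """How many db hits the query spent per Email node in the whole database."""
    return db_hits / email_nodes


def rough_expected_hits(lists_per_person: int) -> int:
    """One index lookup, plus one LISTS hop and one OWNS hop per listed email."""
    return 1 + 2 * lists_per_person


if __name__ == "__main__":
    print(f"db hits per Email node: {hits_per_email_node(REPORTED_DB_HITS, EMAIL_NODES):.3f}")
    print(f"rough expected hits from an index start: {rough_expected_hits(LISTS_PER_PERSON)}")
```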

Mepla
  • Do you have an index on the uid property for the Person nodes? `CREATE INDEX ON :Person(uid)`. Also, perhaps you could profile the query and add the results to your question (see [1](http://neo4j.com/docs/stable/how-do-i-profile-a-query.html), [2](http://stackoverflow.com/questions/20628663/is-there-a-way-to-show-cypher-execution-plan/31534005#31534005)). – jjaderberg Sep 09 '15 at 08:30
  • Also, you will want to use `MERGE` instead of `CREATE UNIQUE`. – jjaderberg Sep 09 '15 at 08:31
  • py2neo uses REST interface of neo4j, you should try with https://github.com/neo4j-contrib/python-embedded, embedded Neo4j is way faster. – Supamiu Sep 09 '15 at 08:46
  • @Supamiu Well I also have the same result in the web interface, but that also sends HTTP requests as far as I know (a POST to transaction collection and then a commit to the specific transaction) right? – Mepla Sep 09 '15 at 08:55
  • Yes, the web interface is using HTTP too. Did you try using MERGE instead of CREATE UNIQUE? I'm curious to see the result – Supamiu Sep 09 '15 at 08:56
  • @jjaderberg Yes, I have an index on uid property of Person and on email property of Email nodes. Also I just tried MERGE instead of create unique but it doesn't seem to have much effect on the time of execution (still ~4.5s), should I do anything else in conjunction with MERGE? (I usually use MERGE to create unique nodes) – Mepla Sep 09 '15 at 08:59
  • @Supamiu `match (n:Person {uid: '4692'}) -[:LISTS]-> (m:Email) <-[:OWNS]- (l:Person) create unique (n)-[r:KNOWS]->(l)` -> Returned 0 rows in 4405 ms. – Mepla Sep 09 '15 at 09:01
  • Can you update your question with the execution plan that you get with `PROFILE`, either a screenshot of the visual plan or profile the query in `neo4j-shell` for the plan as an ascii table. – jjaderberg Sep 09 '15 at 10:18
  • @jjaderberg I did, ty for your patience :) – Mepla Sep 09 '15 at 10:34
  • You are missing the `:Person` label in the query you profiled. Is that missing from the query or was that a typo when updating your query? Without the label the query planner cannot use your index. – jjaderberg Sep 09 '15 at 10:48
  • @jjaderberg Yes, and oddly, when I use `n:Contact {uid: '4692'}` the running time increases (4393 ms), but when I run it without the Contact label it runs faster (~2900 ms) (here: http://i.imgur.com/70ec0BF.png) – Mepla Sep 09 '15 at 11:15
  • I'm afraid you've lost me, where does `:Contact` label come from? – jjaderberg Sep 09 '15 at 11:36
  • @jjaderberg Oh, sorry, Contact is equivalent to Person, I had 2 series of naming. `profile match (n:Person {uid: '4692'}) -[:LISTS]-> (:Email) <-[:OWNS]- (l) create unique (n)-[r:KNOWS]->(l)` takes longer than `profile match (n {uid: '4692'}) -[:LISTS]-> (:Email) <-[:OWNS]- (l) create unique (n)-[r:KNOWS]->(l)`. That should not be the case right? since Matching `n:Person {uid: '4692'}` alone is much faster than `n {uid: '4692'}` – Mepla Sep 09 '15 at 11:42
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/89165/discussion-between-jjaderberg-and-mepla). – jjaderberg Sep 09 '15 at 11:52

1 Answer


Yes, 4.5 seconds is slow for matching one person via an index, expanding to its listed email addresses (about 200 in your model), and merging a relationship from that person to the single owner of each email.

The first thing is to make sure you have an index on the uid property for nodes with the :Person label. Check your indexes with the SCHEMA command and, if the index is missing, create it with CREATE INDEX ON :Person(uid).
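Why the index matters can be sketched with a toy cost model in Python. `label_scan`, `build_index` and `index_seek` are hypothetical helpers, and counting one "hit" per node inspected is a simplification of Neo4j's db-hit accounting. Without a usable index, finding `uid: '4692'` means inspecting candidate nodes one by one; with an index on `:Person(uid)` the lookup goes straight to the node. Note that the index can only be used when the pattern names the `:Person` label.

```python
from typing import Dict, List, Optional, Tuple

Node = Dict[str, str]


def label_scan(nodes: List[Node], uid: str) -> Tuple[Optional[Node], int]:
    """Inspect nodes one by one; return the match and how many nodes were touched."""
    touched = 0
    for node in nodes:
        touched += 1
        if node.get("uid") == uid:
            return node, touched
    return None, touched


def build_index(nodes: List[Node]) -> Dict[str, Node]:
    """Hypothetical stand-in for CREATE INDEX ON :Person(uid): maps uid -> node."""
    return {node["uid"]: node for node in nodes if "uid" in node}


def index_seek(index: Dict[str, Node], uid: str) -> Tuple[Optional[Node], int]:
    """A single lookup, regardless of how many Person nodes exist."""
    return index.get(uid), 1


if __name__ == "__main__":
    persons = [{"uid": str(i)} for i in range(10_948)]  # Person count from the question
    index = build_index(persons)
    print("scan touched:", label_scan(persons, "4692")[1])
    print("index touched:", index_seek(index, "4692")[1])
```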

Secondly, CREATE UNIQUE may well do the job, but you will want to use MERGE instead. CREATE UNIQUE is deprecated, and although the two are sometimes equivalent, the operation you want should be expressed with MERGE.
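One practical consequence for this query: if a Person lists two emails that are both owned by the same Person `l`, the MATCH produces two rows for the same `(n, l)` pair. A plain CREATE would make two KNOWS relationships, whereas MERGE (like CREATE UNIQUE) finds the one created for the first row and leaves a single relationship; rerunning the query creates nothing new. The toy Python model below imitates that row-by-row behaviour; `create_all` and `merge_all` are hypothetical names, not Neo4j APIs.

```python
from typing import List, Tuple

Row = Tuple[str, str]  # (n uid, l uid), one row per matched path


def create_all(rows: List[Row]) -> List[Row]:
    """CREATE: one new KNOWS relationship per row, duplicates included."""
    return list(rows)


def merge_all(rows: List[Row], existing: List[Row]) -> List[Row]:
    """MERGE: per row, reuse a matching KNOWS relationship or create it if absent."""
    rels = list(existing)
    for n, l in rows:
        if (n, l) not in rels:  # later rows see relationships created by earlier rows
            rels.append((n, l))
    return rels


if __name__ == "__main__":
    # Person 123 lists two emails that are both owned by Person 200.
    rows = [("123", "200"), ("123", "200"), ("123", "300")]
    print("CREATE:", len(create_all(rows)))
    print("MERGE:", len(merge_all(rows, [])))
```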

Thirdly, to find out why the query is slow you can profile it:

PROFILE
MATCH (n:Person {uid:'123'})-[:LISTS]->(m:Email)<-[:OWNS]-(l:Person)
MERGE (n)-[:KNOWS]->(l)

See 1, 2 for details. You may also want to profile the query while forcing either the cost-based or the rule-based query planner, and compare the two plans.

CYPHER planner=cost
PROFILE
MATCH (n:Person {uid:'123'})-[:LISTS]->(m:Email)<-[:OWNS]-(l:Person)
MERGE (n)-[:KNOWS]->(l)

With these you can hopefully find and correct the problem, or update your question with the information to help others help you find it.

jjaderberg
  • Thank you very much, the PROFILE command actually helped me a lot in understanding what is happening behind the scenes (execution plans). Indexing uid plus tweaking my query based on what the execution plans showed reduced my query time to <100ms. – Mepla Sep 10 '15 at 06:57