
So I'm generating a rather large Neo4j instance, and I'm loading it using LOAD CSV. I've read as many blogs as I could find about optimising queries to remove Eager operations from the query plan, but I've come up against an example that I can't explain.

Say we have two types of node, A & B. Each has a unique property, name, with a constraint on it:

CREATE CONSTRAINT ON (a:A) ASSERT a.name IS UNIQUE
CREATE CONSTRAINT ON (b:B) ASSERT b.name IS UNIQUE

Now, if we wish to create a relationship between nodes of either type, like so:

MERGE (a:A {name:1})
MERGE (b:B {name:2})
MERGE (a)-[:REL]->(b)

Our query plan comes back with no Eager operators at all.

However, if we want to create a relationship between 2 nodes of the same type:

MERGE (a:A {name:1})
MERGE (b:A {name:2})
MERGE (a)-[:REL]->(b)

the profile comes back with an eager read in it!

We may get rid of the eagerness by changing both of the node merges to matches, but this opens us up to the possibility of not creating the relationship we want to create!
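For reference, the MATCH-based variant described above would look roughly like this. It is only a sketch of the rewrite: if either node is missing, the MATCH returns no rows and the relationship MERGE never runs.

```cypher
// Both endpoints must already exist; nothing is created for a missing node.
MATCH (a:A {name:1})
MATCH (b:A {name:2})
// Only reached when both MATCH clauses found a node.
MERGE (a)-[:REL]->(b)
```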

My question is why does this happen for this specific case of creating a relationship between the two nodes of the same label?

I discovered this on Neo4j 2.3.2.

greg_data

2 Answers


This is expected behavior, as MERGE is MATCH or CREATE.

So if you are merging on the same label, the first MERGE may create nodes that the second MERGE needs to see in order to work correctly. That's why the first one is turned into an eager operation.

In general, eagerness happens when a preceding operation modifies the graph in a way that affects subsequent MATCH operations.

It is totally unrelated to your relationship creation, i.e. it would also happen without the relationship MERGE.
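One way to check this would be to look at the plan for the two node merges on their own. The sketch below assumes a hypothetical file URL and the column names `column1` and `column2`; according to the explanation above, the plan should still contain an Eager operator even though no relationship is merged.

```cypher
// EXPLAIN shows the plan without executing the query.
// 'file:///rels.csv', column1 and column2 are placeholders, not from the question.
EXPLAIN
LOAD CSV WITH HEADERS FROM 'file:///rels.csv' AS line
MERGE (a:A {name: line.column1})
MERGE (b:A {name: line.column2})
```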

Usually it is helpful anyway to split up these operations into multiple passes.

Then you can also minimize the number of rows that have to be considered for each operation, and so the number of index lookups.

WITH DISTINCT row.column AS col
MERGE (:Label {id: col}) ...
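Expanded into a complete first pass, the fragment above might look like the following. This is a sketch under assumptions: the file URL and the `column1` header are placeholders, and the label and property are taken from the question rather than from this answer.

```cypher
// Pass over the file once, reducing it to the distinct values of one column
// so that each node is merged (and its index looked up) only once.
LOAD CSV WITH HEADERS FROM 'file:///rels.csv' AS row
WITH DISTINCT row.column1 AS name
MERGE (:A {name: name})
```

The same pass would then be repeated for the other column before a final pass creates the relationships.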
Michael Hunger
  • Cheers for your answer Michael. This clarifies things. What I was unclear about was why it only seemed to occur when the labels of the two were the same. My concern was that using Matches might result in edges not being created because one of the nodes might not have been present. I've sorted this by enforcing a particular order to the csv uploads. Thanks again! – greg_data Mar 17 '16 at 11:31

A potential workaround: process the CSV file three times:

  1. LOAD CSV WITH HEADERS FROM <whateverurl> AS line MERGE (a:A {name:line.column1})
  2. LOAD CSV WITH HEADERS FROM <whateverurl> AS line MERGE (a:A {name:line.column2})
  3. LOAD CSV WITH HEADERS FROM <whateverurl> AS line MATCH (a:A {name:line.column1}) MATCH (b:A {name:line.column2}) MERGE (a)-[:REL]->(b)

This one should be "eager free".

Stefan Armbruster