
I'm currently importing some relationships into my graph using the bolt driver in .NET. I wanted to try the LOAD CSV command for this case (the source is a CSV) and compare performance, but the query is only applied to the first row. I tested with a SKIP n LIMIT 1 and only managed to make it run row by row.

I'm thus wondering if there are any restrictions on "complex" queries in a LOAD CSV loop?

Here is the query:

using periodic commit
LOAD CSV  FROM "file:///path/to/my/file.csv" AS row fieldterminator ';' 
with row
MATCH (n:Source {id:row[0]})
MATCH p=(o:Target {num:row[1]})-[:Version*..]->() 
WHERE row[2] in labels(o)
  WITH n, p ORDER BY LENGTH(p) DESC LIMIT 1    
  WITH n, last(nodes(p)) as m
MERGE (n)-[r:Rel]->(m);

Thanks!

Edit:

My CSV is just a regular 3-column CSV following this pattern:

IDTEXT0000000001;V150;LabelOne
IDTEXT0000000002;M245;LabelOne
IDTEXT0000000003;D666;Labeltwo
etc.

By "row by row" I mean that I first tested with a LIMIT 50 after the WITH row, and as it did not work (nothing was added) I then tried LIMIT 1, SKIP 1 LIMIT 1, SKIP 2 LIMIT 2, etc. The "row by row" method works, but you'll admit that it's not really what you want to do.

Final code:

using periodic commit
LOAD CSV  FROM "file:///path/to/my/file.csv" AS row fieldterminator ';' 
with row
MATCH (n:Source {id:row[0]})
MATCH p=(o:Target {num:row[1]})-[:Version*..]->() 
WHERE row[2] in labels(o)
WITH n, p ORDER BY LENGTH(p) DESC    
WITH n, last(nodes(collect(p)[0])) as m
MERGE (n)-[r:Rel]->(m);
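
An equivalent way to write the per-row trick, assuming the same schema, is with head() instead of indexing, relying on collect() preserving the ORDER BY ordering (just a sketch, not tested against the poster's data):

```cypher
using periodic commit
LOAD CSV FROM "file:///path/to/my/file.csv" AS row fieldterminator ';'
MATCH (n:Source {id: row[0]})
MATCH p = (o:Target {num: row[1]})-[:Version*..]->()
WHERE row[2] IN labels(o)
WITH n, p ORDER BY length(p) DESC
// collect(p) is grouped by n, so head() keeps the longest path per n
WITH n, last(nodes(head(collect(p)))) AS m
MERGE (n)-[r:Rel]->(m);
```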

And with APOC (slightly faster):

using periodic commit
LOAD CSV  FROM "file:///path/to/my/file.csv" AS row fieldterminator ';' 
with row
MATCH (n:Source {id:row[0]})
call apoc.cypher.run('MATCH p=(o:Article {num:$num})-[:VersionChristopher*0..]->() WHERE $label in labels(o) WITH p ORDER BY LENGTH(p) DESC LIMIT 1 return last(nodes(p)) as m', {num:row[1], label:row[2]})
yield value
with n, value.m as m
MERGE (n)-[r:Rel]->(m);

But using bolt allows me to build a query without the label test, and it is still 3 to 4 times faster than with LOAD CSV. Thanks for helping :)
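
For reference, over bolt each row is sent as its own statement, so a plain LIMIT 1 is effectively per-row. The parameterized query would look something like this (parameter names are hypothetical):

```cypher
MATCH (n:Source {id: $id})
MATCH p = (o:Target {num: $num})-[:Version*..]->()
// LIMIT 1 is safe here: this statement only ever handles one CSV row
WITH n, p ORDER BY length(p) DESC LIMIT 1
WITH n, last(nodes(p)) AS m
MERGE (n)-[r:Rel]->(m)
```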

Pierre Jeannet

1 Answer


The problem is in your use of LIMIT within the query:

WITH n, p ORDER BY LENGTH(p) DESC LIMIT 1    

This doesn't limit on a per-row basis; LIMIT applies to ALL rows. Where you had multiple rows for each n (from your CSV) and multiple p paths, after this limit is applied you only have a single row (one n, one p) and, subsequently, a single MERGE operation.

You should read up on how to limit results per row; once you fix that, your query should be fine.
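
As a sketch of the usual aggregation-based pattern (variable names taken from your query): collect() aggregates grouped by the other variables in the WITH, so taking the first element of the collection keeps the top-ranked path for each n separately:

```cypher
// Global: LIMIT keeps one row for the entire result set
WITH n, p ORDER BY length(p) DESC LIMIT 1

// Per-row: collect(p) is grouped by n, so [0] takes the
// longest path for each n independently
WITH n, p ORDER BY length(p) DESC
WITH n, collect(p)[0] AS longest
```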

InverseFalcon
  • Ok, nice catch, I'll change this part. I was using it to take the last node of the longest path. Makes sense that it was working fine when firing queries one by one with bolt but not in the CSV import. I'll mark it as the answer tomorrow after confirming it. – Pierre Jeannet Jul 31 '17 at 21:43