Using Unwind and Dumping Data in neo4j - Query Optimization

Question

I am doing batch insertion to insert data in neo4j but my transaction is taking huge time as my database is increasing continuously also.

In my project, For only one case ,I am having more then 18,000 records which are meant to be stored in db and will have relationships with a Target node. Each record will be stored as Friend Node

Relationships are like

Target_Node-[r:followed_by]->Friend_Node

Target_Node-[r:Friends_with]->Friend_Node

Target_Node-[r:Performs_Activity]->Friend_Node

My query executes for all the cases separately and the chances are very likely that there maybe all three relations between a Target and Friend Node.

I am sending 20 records per thread for a single insertion which unwinds over the array of records and checks if the records is already exists in Friend_Node or Target_Node, if not then create it as a Friend_Node and then assign relation to it; If the node already have relationship and a new relation is passed to the query then a new relation will also be added between the two nodes.

Also I do check in my query if a Record do have a Location property then I do create a Location Node and assign the relation with that also.

Note: create_rel variable can be Friends_with,Followed_by or Activity_p

My query is as follows

 """UNWIND [{id: "1235" , uid : "0"}] as user

    UNWIND """+ l +""" as c

    OPTIONAL MATCH (n:Target {id : c.id , uid : "0"})

    OPTIONAL MATCH (m:Friend {id : c.id , screen_name:c.screen_name, uid : "0"})

    WITH coalesce(n, m) as node,user,c // returns first non-null value

    CALL apoc.do.when(node is null, "MERGE (n:Friend {id:c.id, name:c.name, profile: c.profile, location:c.location, uid : user.uid}) RETURN n", '', {c:c,user:user}) YIELD value

    with coalesce(node, value.n) as y,user,c

    MERGE (u:Target {id: user.id , uid : user.uid})

    """+create_rel+"""

    foreach (sc in c.cityn | merge(cn:Location {location:sc.location, loc_lower : sc.loc_lower}) merge (y)-[:`located_at`]-(cn))

    """

Db sometimes gives TransientError error also.

Feedback is appreciated as I am a learner and will appreciate valuable suggestions.

Thanks in advance

score 0 · Answer 1 · answered Jul 09 '20 at 17:17

I think your main problem lies in how you merge and match the nodes. Ideally, you always want to have a unique identifier for nodes. I can see that Friend node has a property id, which I will assume is unique for every Friend and Target.

First, you want to create a unique constraint on that property:

CREATE CONSTRAINT ON (f:Friend) ASSERT f.id IS UNIQUE;
CREATE CONSTRAINT ON (f:Target) ASSERT f.id IS UNIQUE;

You want something similar for Location nodes as well. seems like you store both location value and the lowercase value of location, so any of them should be unique for each node.

CREATE CONSTRAINT ON (l:Location) ASSERT l.id IS UNIQUE;

Now you can optimize your query like this:

"""UNWIND [{id: "1235" , uid : "0"}] as user

    UNWIND """+ l +""" as c

    OPTIONAL MATCH (n:Target {id : c.id})

    OPTIONAL MATCH (m:Friend {id : c.id})

    WITH coalesce(n, m) as node,user,c // returns first non-null value

    CALL apoc.do.when(node is null,
    "MERGE (n:Friend {id:c.id}) 
     ON CREATE SET n+= {name:c.name, profile: c.profile, 
           location:c.location, uid : user.uid} 
     RETURN n", '', {c:c,user:user})
    YIELD value
    with coalesce(node, value.n) as y,user,c
    MERGE (u:Target {id: user.id , uid : user.uid})
    """+create_rel+"""

    foreach (sc in c.cityn | 
        merge(cn:Location {location:sc.location}) 
        ON CREATE SET cn.loc_lower = sc.loc_lower 
        merge (y)-[:`located_at`]-(cn))

    """

I am creating indexes on the basis of id but unique constraint is not possible in my scenario. The reason is that my software may have a Friend node exists in system and later on user want to change this to be a target so constraints check fails as I am replicating the node in this scenario and changing only user id in this case(if I need it to be else no need) .. Any other suggestion? — Mubeen Mubarik, Jul 13 '20 at 08:56

score 0 · Answer 2 · answered Jul 09 '20 at 19:20

You should avoid running multiple write queries (that can touch the same nodes and relationships) concurrently, as that could cause intermittent TransientErrors, as you have seen. (However, queries that cause transient errors can be retried.)
You should be passing user and l to your query as parameters, so that the Cypher planner will only need to compile the query once, and to make the query less prone to Cypher-injection attacks. (Also, there is no need to UNWIND a list that will always have just a single map -- you could have directly used the map via WITH {id: "1235" , uid : "0"} AS user. But, as I mentioned, you should just pass the user map as a parameter so you can efficiently change the user without forcing a recompilation.)
To avoid recompilation, you also need to need to make the create_rel string a constant string (so, it might as well be directly in your main query string). Again, you should also pass any variables needed by that as parameters.
You should create indexes (or uniqueness constraints) on :Target(id) and :Friend(id), to speed up your MATCH and MERGE clauses.
(a) MERGE (u:Target {id: user.id , uid : user.uid}) only needs to be executed once, not per c value. So, it should be executed before the UNWIND.

(b) Also, it is not strictly necessary for this query to create u, since nothing in the query uses it. So, instead of running this identical MERGE clause once per thread, you should consider taking it out and running it a separate standalone query.

Here is a query that combines suggestions #2 and #5a (but you will have to take care of the others yourself), along with some refactoring using pattern comprehension to avoid unnecessary DB hits:

MERGE (u:Target {id: $user.id, uid: $user.uid})
WITH u
UNWIND $l as c
WITH u, c, [(n:Target {id : c.id})-[*0]-()|n] AS nodeList
WITH u, c, CASE WHEN SIZE(nodeList) = 0 THEN [(n:Friend {id : c.id})-[*0]-()|n] ELSE nodeList END AS nodeList
CALL apoc.do.when(SIZE(nodeList) = 0, 'MERGE (n:Friend {id: c.id, name: c.name, profile: c.profile, location: c.location, uid: user.uid}) RETURN n', 'RETURN nodeList[0] AS n', {c:c,user:$user,nodeList:nodeList}) YIELD value
WITH u, c, value.n AS node
FOREACH (sc IN c.cityn | MERGE (cn:Location {location: sc.location, loc_lower: sc.loc_lower}) MERGE (node)-[:located_at]-(cn))

// Put your parameterized create_rel code here

thanks for very nice suggestions; its really helpful. could you please add create_rel parameterized code also in query as I am bit confused with assigning the relationship at this point. — Mubeen Mubarik, Jul 13 '20 at 11:38
I cannot add the appropriate parameterized `create_rel` code, because your question never showed the original `create_rel` code. I am just saying that you should try to NOT insert a separate `create_rel` string into this query -- you should just try to include whatever is in that string directly in this query. Without knowing what is in that string, I cannot tell you how to do that. And you may need to create a separate question if you need help with that eventually. This question is already way too complicated. — cybersam, Jul 15 '20 at 16:39

Using Unwind and Dumping Data in neo4j - Query Optimization

2 Answers2