
I am currently having a performance issue with a Neo4j query.

Here is the problem: I need to find users in the database that match entries in a large list. A user matches if the uniqCode matches, OR if both the name and the location (zip) match.

Then I want to connect each matched user to a new node that I create with MERGE.

The query below works, but it takes 20 to 30 seconds for a list of 30 users, and in the real use case I will need to pass a list of 5,000 to 10,000 users.

Note that I have indexed the uniqCode and name properties of the User nodes.

UNWIND $users as row
    MATCH (u:User)
    WHERE u.uniqCode = row.uniqCode
    OR (
        apoc.text.clean(u.name) = row.name
        AND EXISTS ((u)-[:IS]->(:Zip {name:row.zip}))
    )
    MERGE (u)<-[:IS]-(a:ParallelUser {id:row.uuid, name: u.name, uniqCode: row.uniqCode})
    RETURN {name: a.name, uniqCode: a.uniqCode, id: a.id} AS ParallelUser

The parameters look like this:

[{uniqCode: "1234", name: "John Doe", zip: "1234", uuid: "1234"}, ...]

Thank you in advance for your help...

1 Answer


Ideally, the MATCH clause in your query should be able to use an index. You can check whether the query planner is using any indexes by running the query with PROFILE prepended to it. Feel free to post the results here for a more detailed discussion.
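As a minimal sketch, the matching part of the question's query can be profiled like this. The single-element list literal stands in for $users (it is just the example row from the question), and the MERGE is replaced by a plain RETURN so that profiling does not write anything. In the resulting plan, look at the operators that read :User nodes: an index seek (for example NodeIndexSeek) is what you want, while NodeByLabelScan or AllNodesScan means every User node is being checked for each row.

```cypher
// PROFILE executes the query and reports the plan with rows and db hits per operator.
// Read-only variant: RETURN instead of MERGE, so nothing is written while profiling.
PROFILE
UNWIND [{uniqCode: "1234", name: "John Doe", zip: "1234", uuid: "1234"}] AS row
MATCH (u:User)
WHERE u.uniqCode = row.uniqCode
   OR (
     apoc.text.clean(u.name) = row.name
     AND EXISTS ((u)-[:IS]->(:Zip {name: row.zip}))
   )
RETURN u.uniqCode, u.name
```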

The query tuning documentation might be helpful to you.

I found this free course enlightening.

You won't be able to use an index on u.name if you have to wrap it in the apoc.text.clean() function. Can you run that function on the value before you store it, or else create a new cleanName property? Then you could create an index on that property.
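A possible sketch of the cleanName approach, assuming Neo4j 4.1 or later for the CREATE INDEX ... IF NOT EXISTS syntax. The property name cleanName comes from the suggestion above; the index name user_clean_name is hypothetical.

```cypher
// One-off backfill: store the cleaned name so it can be indexed.
MATCH (u:User)
SET u.cleanName = apoc.text.clean(u.name);

// Index the new property (the index name is hypothetical).
CREATE INDEX user_clean_name IF NOT EXISTS FOR (u:User) ON (u.cleanName);
```

On a large graph, the backfill may need to be batched (for example with apoc.periodic.iterate) to keep transactions small. New or updated User nodes would also need cleanName set whenever name is written.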

On the MERGE portion of your query: are all three properties of ParallelUser required to uniquely identify the node? If id alone is sufficient, you can rewrite the MERGE portion this way:

MERGE (u)<-[:IS]-(a:ParallelUser {id:row.uuid})
SET a.name = u.name, a.uniqCode = row.uniqCode
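Putting both suggestions together, the full query might look like the sketch below. It assumes the cleanName property and its index exist, and that id alone identifies a ParallelUser. It also applies apoc.text.clean() to row.name, on the assumption that the incoming names are not already cleaned; drop that call if they are. Whether the planner actually uses both indexes for the OR is worth confirming with PROFILE.

```cypher
UNWIND $users AS row
MATCH (u:User)
WHERE u.uniqCode = row.uniqCode
   OR (
     // Compare against the stored, indexable property instead of
     // calling apoc.text.clean() on u.name for every candidate node.
     u.cleanName = apoc.text.clean(row.name)
     AND EXISTS ((u)-[:IS]->(:Zip {name: row.zip}))
   )
// MERGE on id only; the other properties are set afterwards.
MERGE (u)<-[:IS]-(a:ParallelUser {id: row.uuid})
SET a.name = u.name, a.uniqCode = row.uniqCode
RETURN {name: a.name, uniqCode: a.uniqCode, id: a.id} AS ParallelUser
```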

Nathan Smith