
I have a Neo4j database with millions of nodes labelled Person, and I want to remove a specific property from all of them. I tried to do it with a MATCH query, but it scans all nodes.

This is the query I tried:

```cypher
MATCH (p:Person)
REMOVE p.fatherName
```

Is there a faster alternative to this query?

Asked by Muhammad Adnan
  • This query should not scan all nodes (`AllNodesScan`), it should only get nodes with that label (`NodeByLabelScan`). If it is indeed scanning all nodes, please share the results of `EXPLAIN` or `PROFILE` (see http://stackoverflow.com/questions/20628663/is-there-a-way-to-show-cypher-execution-plan/31534005#31534005). – jjaderberg Sep 09 '15 at 08:24
  • Yes, you are correct about NodeByLabelScan, but I only have Person nodes and all of them have this label. That's why I am saying it is scanning every node. – Muhammad Adnan Sep 09 '15 at 09:04
  • Well, there is no simpler way to declare that operation in Cypher. Why is speed a problem in your case? Do you need to perform this operation repeatedly, or is it a one-off maintenance operation while modelling? You would get better performance with the Java API, either by embedding the database or with a server extension, but you may also be fine just paging the query and running it repeatedly. – jjaderberg Sep 09 '15 at 10:27
  • It is also taking a lot of resources: it consumes all the RAM on my system (about 8 GB), and after about an hour it fails with a GC out-of-memory exception. That's why I was trying to find a workable solution. – Muhammad Adnan Sep 09 '15 at 13:12
  • In that case paging the query should help, MicTech added how to do that to his answer. Try a limit between 10k and 100k to find one that works, then run the query until it shows no properties removed. – jjaderberg Sep 09 '15 at 13:28

1 Answer


There is no way to improve the performance of that query through Cypher alone.

You can at least skip nodes that don't have the fatherName property:

```cypher
MATCH (p:Person)
// IS NOT NULL replaces HAS(), which newer Neo4j versions no longer support
WHERE p.fatherName IS NOT NULL
REMOVE p.fatherName
```

What can also help is adding a LIMIT and running the query repeatedly until no more properties are removed, so each transaction stays small:

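```cypher
MATCH (p:Person)
WHERE p.fatherName IS NOT NULL
WITH p LIMIT 100000
REMOVE p.fatherName
// Returns 0 once every node has been cleaned, so you know when to stop
RETURN count(p) AS removed
```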
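Running the query until it reports nothing removed is just a loop. As a minimal, stdlib-only sketch of that control flow (this is not Neo4j code), the Java below uses an in-memory list of maps as a stand-in for the Person nodes. The names `BatchedPropertyRemoval`, `removeBatch` and `removeAll` are hypothetical; `removeBatch` plays the role of one execution of the LIMIT query above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchedPropertyRemoval {

    // Hypothetical stand-in for ONE run of the LIMIT query: removes the property
    // from at most `limit` records that still have it and returns how many it removed.
    // Like the Cypher query, every run starts scanning again from the beginning.
    static int removeBatch(List<Map<String, Object>> people, String property, int limit) {
        int removed = 0;
        for (Map<String, Object> person : people) {
            if (removed == limit) {
                break;
            }
            // Mirrors the WHERE filter: records without the property are skipped
            if (person.containsKey(property)) {
                person.remove(property);
                removed++;
            }
        }
        return removed;
    }

    // Re-runs the batch until a run removes nothing; returns how many runs removed something.
    static int removeAll(List<Map<String, Object>> people, String property, int limit) {
        int runs = 0;
        while (removeBatch(people, property, limit) > 0) {
            runs++;
        }
        return runs;
    }

    public static void main(String[] args) {
        // Hypothetical data: 250 records, every second one has fatherName
        List<Map<String, Object>> people = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            Map<String, Object> person = new HashMap<>();
            person.put("name", "person" + i);
            if (i % 2 == 0) {
                person.put("fatherName", "father" + i);
            }
            people.add(person);
        }
        int runs = removeAll(people, "fatherName", 50);
        System.out.println("runs with removals: " + runs);
    }
}
```

Against a real database you would replace `removeBatch` with code that executes the Cypher query and reads back its `removed` count. The batch size (the comments suggest trying something between 10k and 100k) bounds how much each transaction has to hold in memory.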

I suggest writing an unmanaged extension to remove that property.

For example:

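```java
import org.neo4j.graphdb.*;

// Removes propertyName from every node with the given label, committing after at most
// batchSize removals so no single transaction has to hold millions of changes.
static void removePropertyInBatches(GraphDatabaseService database, String labelName,
                                    String propertyName, int batchSize) {
    Label nodeLabel = DynamicLabel.label(labelName); // e.g. "Person"
    int removed;
    do {
        removed = 0;
        try (Transaction tx = database.beginTx();
             ResourceIterator<Node> people = database.findNodes(nodeLabel)) {
            // Stream the nodes instead of copying them all into a collection first
            while (removed < batchSize && people.hasNext()) {
                Node person = people.next();
                if (person.hasProperty(propertyName)) {
                    person.removeProperty(propertyName);
                    removed++;
                }
            }
            tx.success(); // the batch is committed when the transaction closes
        }
    } while (removed > 0);
}
```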
Answered by MicTech
  • +1 but if it's a one-off operation, it may be easier to do it in Java embedded. And if you leverage a server extension, maybe it could at least be generalized to work for any property/label pair? – jjaderberg Sep 09 '15 at 13:11