3

I have a complex question about Neo4j and what Traversal can do.

Imagine you have the following Neo4j graph

[Image: Graph]

My idea is to traverse the whole graph and, whenever I find a 'false' node, propagate this status to its neighbours, then to their neighbours, and so on. In this example, every node would end up with a 'false' status. (In real life I have more conditions that decide whether the status becomes true or false while traversing, but I simplified them for the question.)

I think I need some kind of backtracking algorithm for this, but I don't know how to do that in Neo4j, or whether it is even possible. In addition, the graph could be very large.

How would you do this with Java and Neo4j?

Thanks.

Bruno Peres
jpadilladev
    Would it be enough to match to any node with the desired property as 'false', then change all reachable connected nodes from that one to also be false? – InverseFalcon Nov 08 '17 at 18:44

2 Answers

2

For efficient matching to reachable nodes, there are two options that tend to work well.

With Neo4j 3.2.x, there is an efficient means to match to all distinct reachable nodes through a variable relationship match plus usage of DISTINCT, but it requires an upper bound on the variable-length relationship. Using a suitably high number should work. Something like:

MATCH (:SomeLabel{someProperty:false})-[*..999999]->(x)
WITH DISTINCT x
SET x.someProperty = false

Otherwise, APOC Procedures offers apoc.path.subgraphNodes() which also does efficient matching to reachable nodes in a subgraph:

MATCH (start:SomeLabel{someProperty:false})
CALL apoc.path.subgraphNodes(start, {}) YIELD node
SET node.someProperty = false;
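Conceptually, both options compute the set of nodes reachable from the 'false' nodes and then write to each of them once. The following is a minimal sketch of that idea in plain Java, with no Neo4j API involved: the adjacency map, the status map, and the class name `PropagateFalse` are all made up for illustration. It uses a breadth-first search with a visited set, so each node is processed once and cycles are harmless; no backtracking is needed.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class PropagateFalse {

    // Sets the status to false for every node reachable (following edge direction)
    // from any node whose status is already false. The status map is mutated in place.
    static void propagate(Map<String, List<String>> adjacency, Map<String, Boolean> status) {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();

        // Seed the search with all nodes that are false to begin with.
        for (Map.Entry<String, Boolean> entry : status.entrySet()) {
            if (!entry.getValue()) {
                visited.add(entry.getKey());
                queue.add(entry.getKey());
            }
        }

        // Breadth-first expansion; the visited set guarantees each node is enqueued once.
        while (!queue.isEmpty()) {
            String node = queue.poll();
            status.put(node, false);
            for (String next : adjacency.getOrDefault(node, Collections.<String>emptyList())) {
                if (visited.add(next)) {
                    queue.add(next);
                }
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical graph: A -> B -> C -> A (a cycle), plus D -> A.
        Map<String, List<String>> adjacency = new HashMap<>();
        adjacency.put("A", Arrays.asList("B"));
        adjacency.put("B", Arrays.asList("C"));
        adjacency.put("C", Arrays.asList("A"));
        adjacency.put("D", Arrays.asList("A"));

        Map<String, Boolean> status = new TreeMap<>();
        status.put("A", false);
        status.put("B", true);
        status.put("C", true);
        status.put("D", true);

        propagate(adjacency, status);
        System.out.println(status); // prints {A=false, B=false, C=false, D=true}
    }
}
```

D stays true because edges are followed in their stored direction only, which mirrors the arrow in the first query.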

EDIT

To add more detail for the first option (why not just use *, and why use DISTINCT), keep in mind that by default Cypher will match to all possible paths when we use *, even if those paths end at the same node as a previously matched path. This can become inefficient in a sufficiently connected graph (when we don't have a reasonable upper bound, and we're not using LIMIT), with the possibility of blowing your heap or hanging indefinitely.

This is especially to be avoided when we aren't interested in all possible paths, just all possible nodes that are reachable.
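To get a feel for the gap between "all paths" and "all reachable nodes", here is a small plain-Java sketch on a made-up layered graph (no Neo4j involved; the class and method names are hypothetical). A start node points at two nodes, each of which points at both nodes of the next layer, and so on for 10 layers. The graph is acyclic, so every walk from the start is a valid path.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PathsVersusNodes {

    // Node 0 is the start. Layer i (1-based) holds nodes 2i-1 and 2i,
    // and every node points at both nodes of the following layer.
    static Map<Integer, List<Integer>> layeredGraph(int layers) {
        Map<Integer, List<Integer>> adjacency = new HashMap<>();
        adjacency.put(0, Arrays.asList(1, 2));
        for (int i = 1; i < layers; i++) {
            List<Integer> nextLayer = Arrays.asList(2 * i + 1, 2 * i + 2);
            adjacency.put(2 * i - 1, nextLayer);
            adjacency.put(2 * i, nextLayer);
        }
        return adjacency;
    }

    // Counts every path of length >= 1 that starts at the given node.
    static long countPaths(Map<Integer, List<Integer>> adjacency, int node) {
        long total = 0;
        for (int next : adjacency.getOrDefault(node, Collections.<Integer>emptyList())) {
            total += 1 + countPaths(adjacency, next);
        }
        return total;
    }

    // Counts the distinct nodes reachable from start (start itself excluded).
    static int countReachable(Map<Integer, List<Integer>> adjacency, int start) {
        Set<Integer> visited = new HashSet<>();
        Deque<Integer> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            int node = queue.poll();
            for (int next : adjacency.getOrDefault(node, Collections.<Integer>emptyList())) {
                if (visited.add(next)) {
                    queue.add(next);
                }
            }
        }
        return visited.size() - 1;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> graph = layeredGraph(10);
        System.out.println("paths: " + countPaths(graph, 0));     // paths: 2046
        System.out.println("nodes: " + countReachable(graph, 0)); // nodes: 20
    }
}
```

The number of paths doubles with every added layer, while the number of reachable nodes grows by only two, which is why enumerating paths is wasteful when only the nodes matter.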

In Neo4j 3.2, an optimization was introduced called pruning-var expand, which changes the traversal logic in the case when all of the following are true:

  1. We have a var-length expansion
  2. We aren't referencing the path in any way (such as by setting a path variable to the match pattern, or setting a variable on the var-length relationship)
  3. We have an upper-bound on the var-length expansion
  4. We ask for DISTINCT nodes or values obtainable from the expansion

In short, the optimization applies when the query makes it clear that we want distinct nodes (or values from distinct nodes) from a var-length expansion and don't care about the paths themselves.

The planner will then use the pruning var expand (you can confirm by checking the query plan from EXPLAIN or PROFILE) to efficiently match to reachable nodes.

InverseFalcon
  • Using only the * also works (see Bruno Peres's answer and [this answer](https://stackoverflow.com/a/26799022/3133256)). Why the DISTINCT? – jpadilladev Nov 09 '17 at 11:01
  • Added some more detail on the limitations of using `*` on a larger graph, and on the pruning var expand optimization. – InverseFalcon Nov 09 '17 at 11:18
  • Awesome explanation! – Bruno Peres Nov 09 '17 at 11:30
  • Great explanation! @InverseFalcon Can I have some problems If a reference the way in my example? As far as I know, Neo4j recommends to make the relationships directed and then ignore this direction while traversing. In my example, I can have [C] -> [F] Will your query update the 'C' property to false? – jpadilladev Nov 09 '17 at 11:34
  • If direction doesn't matter during traversal, then simply omit the arrow. That will traverse both incoming and outgoing relationships. – InverseFalcon Nov 09 '17 at 11:37
  • I made a test with your first query without the arrow in a Graph with 726 nodes. The query runs infinitely... – jpadilladev Nov 09 '17 at 12:52
  • @jpadilladev I can also replicate a hang. I'll raise that with engineering. Try out the APOC solution, in this case. Start out with just returning the count of distinct nodes, to see how that performs first, and how many nodes you'll be writing to. – InverseFalcon Nov 09 '17 at 18:56
0

I don't know if I understood your question completely, but I believe that a simple Cypher query is enough. Something like:

MATCH ({someProperty : false})-[*]-(neighbours)
SET neighbours.someProperty = false
Bruno Peres
  • This will expand the false property to any depth, so I think it would work. I'll need to add some conditions while expanding, but I will give it a try. In my question I asked how to do this with Java (sorry if I didn't explain myself clearly; I wanted to do this with TraversalDescription, Expanders...). Why Cypher? Is it easier? Thanks – jpadilladev Nov 09 '17 at 10:57
  • @jpadilladev Cypher is the "natural" choice when working with Neo4j, since it is the query language designed for this database. And yes, Cypher is easier to work with than the Java API. However, the Java API provides more flexibility and less abstraction. If I understood your requirement correctly, you are trying to add conditions dynamically while the traversal is running... In that case I believe you will need to work with the Java API. – Bruno Peres Nov 09 '17 at 11:11
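As a rough illustration of "adding conditions while traversing", the sketch below runs a breadth-first expansion in plain Java and asks a caller-supplied predicate before following each edge. It deliberately uses no Neo4j classes: the graph is a made-up adjacency map, and `ConditionalPropagation`, `expand`, and `mayExpand` are hypothetical names. With the traversal framework mentioned in the question, a custom expander or evaluator would presumably play the role of the predicate.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

public class ConditionalPropagation {

    // Returns the nodes reached from start (start included), following an edge
    // from -> to only when mayExpand.test(from, to) is true.
    static Set<String> expand(Map<String, List<String>> adjacency,
                              String start,
                              BiPredicate<String, String> mayExpand) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            for (String next : adjacency.getOrDefault(node, Collections.<String>emptyList())) {
                // The condition is evaluated per edge, at traversal time.
                if (mayExpand.test(node, next) && visited.add(next)) {
                    queue.add(next);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Hypothetical graph: A -> B, A -> C, C -> D.
        Map<String, List<String>> adjacency = new HashMap<>();
        adjacency.put("A", Arrays.asList("B", "C"));
        adjacency.put("C", Arrays.asList("D"));

        // Example condition: never step into node C.
        Set<String> reached = expand(adjacency, "A", (from, to) -> !to.equals("C"));
        System.out.println(reached); // prints [A, B]
    }
}
```

D is not reached because its only incoming edge comes from C, which the condition blocks. The nodes in the returned set are the ones whose status would then be updated.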