I'm trying to use Neo4j's algo.beta.louvain(). I noticed that it returns quite different results (number of communities and number of nodes in each community) if I order the nodes of the label differently. The following 3 calls return different results, even though I'm using {concurrency: 1}. Is there something I'm missing?

CALL algo.louvain.stream('MATCH (n:Node) WHERE <some-condition> RETURN id(n) as id order by n.id desc', <relationship>, <config>)
CALL algo.louvain.stream('MATCH (n:Node) WHERE <some-condition> RETURN id(n) as id order by n.id asc', <relationship>, <config>)
CALL algo.louvain.stream('MATCH (n:Node) WHERE <some-condition> RETURN id(n) as id order by id(n) desc', <relationship>, <config>)

The above is what I noticed within a single Neo4j instance, and it relates to my actual problem. What I'm really trying to do is get consistent results across different Neo4j instances holding the same data. While debugging why the results differ between instances, I noticed that I can reproduce a related problem by passing the same set of nodes in a different order. I didn't have the "order by" in the original code, but I suspect that the natural ordering differs between Neo4j instances, which causes the different results.

Christophe Willemsen
breakingduck

1 Answer

The algorithm itself is non-deterministic, so there is no guarantee of getting the same results on the same data.
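To illustrate why the order of the nodes can matter, here is a minimal Python sketch of a greedy modularity "local moving" pass, the kind of step Louvain is built on. It is a simplified illustration under my own assumptions (unweighted, undirected graph, ties broken by lowest community id, no aggregation phase), not Neo4j's implementation; `local_moving`, `adj`, `order_a` and `order_b` are hypothetical names.

```python
def local_moving(adj, order):
    """Greedy modularity moves, visiting nodes in `order` until nothing changes.

    Simplified sketch: unweighted undirected graph, no aggregation phase.
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())  # 2 * number of edges
    comm = {v: v for v in adj}                       # every node starts alone
    tot = {v: len(adj[v]) for v in adj}              # total degree per community
    improved = True
    while improved:
        improved = False
        for v in order:
            k = len(adj[v])
            old = comm[v]
            tot[old] -= k                            # take v out of its community
            links = {}                               # edges from v to each community
            for u in adj[v]:
                links[comm[u]] = links.get(comm[u], 0) + 1
            # Integer score proportional to the modularity gain of joining c.
            best = old
            best_gain = links.get(old, 0) * two_m - tot[old] * k
            for c in sorted(links):                  # assumed tie-break: lowest id
                gain = links[c] * two_m - tot[c] * k
                if gain > best_gain:
                    best, best_gain = c, gain
            tot[best] += k
            if best != old:
                comm[v] = best
                improved = True
    groups = {}
    for v, c in comm.items():
        groups.setdefault(c, set()).add(v)
    return {frozenset(g) for g in groups.values()}


# Toy graph: a path 0-1-2-3-4-5.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
order_a = [0, 1, 2, 3, 4, 5]
order_b = [2, 3, 0, 5, 1, 4]

parts_a = local_moving(adj, order_a)
parts_b = local_moving(adj, order_b)
print(sorted(sorted(p) for p in parts_a))
print(sorted(sorted(p) for p in parts_b))
```

On this toy graph the first order ends with three communities of two nodes each, while the second ends with two communities of three nodes each. Repeating a run with the same order gives the same partition every time. Both outcomes are local optima of the greedy step, and the visiting order decides which one is reached.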

If you need to run the algorithm incrementally (e.g. after adding new nodes, recomputing the clusters without losing the cluster IDs of the nodes the algorithm has already processed), you can provide a seed property.

More information is available in the documentation of the Neo4j Graph Data Science library: https://neo4j.com/docs/graph-data-science/current/algorithms/louvain/#algorithms-louvain-examples-stream-seeded

Christophe Willemsen
  • Thanks. I get the non-deterministic part. However, the algorithm produces exactly the same result as long as the ordering of the nodes stays the same. It only differs when the order changes. Is there a way to give it a hint to produce larger clusters (for example)? Is the GDS implementation different? We are evaluating whether it is worthwhile to move to Neo4j 4.x. – breakingduck Dec 22 '20 at 13:57