I have a network (e.g. a water network) and I want to find its topological structures: clusters (circular paths), bridges (relationships that connect clusters) and trees (everything else).

[Image: example network]

The Cypher statement to create the example network is available here: https://www.dropbox.com/s/e1gtqxlm9ngaau5/Cypher%20to%20create%20example%20network.cql?dl=0. The blue relationships are the clusters I am looking for, the red ones are the bridges and the green ones are the trees.

To find the clusters, I have two approaches. Both return the correct results, but both are far too slow.

Approach 1: Start from the relationships and check whether there is a second path between the start and end node. This one takes about 10M db hits.

MATCH (n:WN)-[r:PIPE]->(m:WN) 
WHERE EXISTS((n)-[r]->(m)-[:PIPE*2..]-(n))
RETURN r

Approach 2: Start by looking for circular paths, ignoring direction (there are about 12,000), and then extract the unique relationships. This one takes about 20M db hits.

MATCH path=(n:WN)-[:PIPE*..]-(n)
// Subtracting an empty list leaves the flattened list with duplicates removed
RETURN
     apoc.coll.subtract(
          apoc.coll.flatten(COLLECT(relationships(path))),
          []
     ) AS clusterRelationships

Is there a smarter approach that returns results faster?

Graphileon

1 Answer

You could detect clusters with the Strongly Connected Components (SCC) algorithm that is available in the GDS library. I think it fits your definition of a cluster, and it should also work on your example.

The Strongly Connected Components (SCC) algorithm finds sets of connected nodes in a directed graph where each node is reachable in both directions from any other node in the same set.
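
To make that definition concrete, here is a minimal Python sketch. It is not the GDS implementation; the function names `reachable` and `strongly_connected_components` are made up for illustration, and the approach (intersecting forward and backward reachability) is chosen for readability rather than speed. It shows that relationship direction decides the outcome: a directed triangle is one component, while the same three nodes with one relationship reversed fall apart into single-node components.

```python
from collections import defaultdict


def reachable(start, adjacency):
    """Return every node reachable from `start` by following `adjacency`."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def strongly_connected_components(directed_edges):
    """Group nodes so that each node reaches every other node in its group
    along directed edges. Hypothetical helper, quadratic in the worst case."""
    forward = defaultdict(set)   # node -> nodes it points to
    backward = defaultdict(set)  # node -> nodes pointing to it
    nodes = set()
    for source, target in directed_edges:
        forward[source].add(target)
        backward[target].add(source)
        nodes.update((source, target))

    components = []
    assigned = set()
    for node in sorted(nodes):
        if node in assigned:
            continue
        # Nodes that `node` can reach AND that can reach `node`.
        component = reachable(node, forward) & reachable(node, backward)
        components.append(component)
        assigned |= component
    return components


# A directed cycle a -> b -> c -> a forms one component.
directed_cycle = [("a", "b"), ("b", "c"), ("c", "a")]
# Same nodes, but a -> c instead of c -> a: no directed cycle any more.
no_directed_cycle = [("a", "b"), ("b", "c"), ("a", "c")]

print(strongly_connected_components(directed_cycle))
print(strongly_connected_components(no_directed_cycle))
```

The second edge list still contains a loop if direction is ignored, which is worth keeping in mind when the relationships in the network are stored with an arbitrary direction.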

For detecting bridges, you could use the Betweenness Centrality algorithm to find candidate bridge nodes, i.e. the nodes that the bridge relationships are attached to. This would limit the number of edges you need to consider when working out which edges are the bridges. Unfortunately, this solution is not perfect: a very small bridge, say one that leads to only one or two nodes, will not give its nodes a high betweenness centrality, while some nodes in the middle of the graph will score high simply because, in theory, all the information flows through them.

I have another idea that would probably run quite fast. Run the SCC algorithm and write the results back to Neo4j. Then find the edges that connect different clusters of nodes. This will return both trees and bridges, so you then have to decide which of the two each relationship should be classified as.
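
If direction has to be ignored, the same "find the components, then look at the leftover edges" idea can be sketched outside the database with a depth-first low-link search, a standard graph technique that is not part of GDS or APOC as used here. The sketch below assumes the relationships have been exported as plain `(start, end)` pairs; the function name `split_cycle_edges` is hypothetical. It relies on the fact that, in an undirected graph, an edge lies on a circular path exactly when removing it does not disconnect its two end nodes. Note that graph theory calls every such disconnecting edge a "bridge", so the second list returned here contains both the bridges and the trees from the question.

```python
from collections import defaultdict


def split_cycle_edges(edges):
    """Split undirected edges into (edges on a cycle, edges on no cycle).

    Each edge is visited a constant number of times, so the work grows
    linearly with the size of the network. Iterative to avoid recursion
    limits on long pipe chains.
    """
    adjacency = defaultdict(list)
    for edge_id, (a, b) in enumerate(edges):
        adjacency[a].append((b, edge_id))
        adjacency[b].append((a, edge_id))

    discovery = {}         # node -> order in which the DFS first saw it
    low = {}               # node -> earliest node reachable from its subtree
    non_cycle_ids = set()  # ids of edges that are on no cycle

    for root in list(adjacency):
        if root in discovery:
            continue
        discovery[root] = low[root] = len(discovery)
        # Stack entries: (node, id of edge used to enter it, neighbour iterator)
        stack = [(root, None, iter(adjacency[root]))]
        while stack:
            node, incoming, neighbours = stack[-1]
            descended = False
            for neighbour, edge_id in neighbours:
                if edge_id == incoming:
                    continue  # do not walk straight back over the same edge
                if neighbour in discovery:
                    low[node] = min(low[node], discovery[neighbour])
                else:
                    discovery[neighbour] = low[neighbour] = len(discovery)
                    stack.append((neighbour, edge_id, iter(adjacency[neighbour])))
                    descended = True
                    break
            if not descended:
                stack.pop()
                if stack:
                    parent = stack[-1][0]
                    low[parent] = min(low[parent], low[node])
                    if low[node] > discovery[parent]:
                        # Nothing below `node` reaches back past `parent`.
                        non_cycle_ids.add(incoming)

    on_cycle = [e for i, e in enumerate(edges) if i not in non_cycle_ids]
    off_cycle = [e for i, e in enumerate(edges) if i in non_cycle_ids]
    return on_cycle, off_cycle


# Two triangles joined by c-d, plus a dangling pipe f-g.
pipes = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("f", "d"), ("f", "g")]
cluster_edges, other_edges = split_cycle_edges(pipes)
print(cluster_edges)  # the six triangle edges
print(other_edges)    # [('c', 'd'), ('f', 'g')]
```

Edges are tracked by id rather than by end nodes so that two parallel pipes between the same pair of nodes are correctly treated as a cycle.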

Tomaž Bratanič
  • Hi, thanks for your comments. SCC only detects sets of nodes in which each pair of nodes is connected by a directed path, which is not the case here. From https://neo4j.com/docs/graph-data-science/current/algorithms/strongly-connected-components/: "The SCC algorithm finds maximal sets of connected nodes in a directed graph. A set is considered a strongly connected component if there is a directed path between each pair of nodes within the set. It is often used early in a graph analysis process to help us get an idea of how our graph is structured." Also, I need it to be undirected. – Graphileon Aug 31 '20 at 09:41
  • Once I have the clusters, finding the bridges is not a problem, as the picture shows (it is dynamically generated). – Graphileon Aug 31 '20 at 09:42