I have a graph that consists of millions of disconnected subgraphs, and I am trying to find the number of nodes in each of these subgraphs. For example, let's say the graph contains A-B-C, D-E, and F-G-H. The result should then be 3, 2, 3.

I am currently able to do that using the following query:

MATCH (n) CALL apoc.path.subgraphNodes(n, {}) YIELD node WITH n , count(node) as nodesnum return nodesnum

However, it is incredibly slow and not at all suitable for a graph with millions of nodes, so I would like to know whether this can be done in a much faster way.

sjishan
  • If you cannot target specific nodes, this is always going to be slow, as you are reading the whole database. I do wonder why you would expect there to be "another and faster way"? By the way, your example query above will return 3, 3, 3, 2, 2, 3, 3, 3: it counts the subgraph for EACH node. – Tom Geudens Jul 25 '17 at 07:25
  • @TomGeudens Yes, you are right. I changed the query to add "graph coloring", that is, marking the nodes in a subgraph once they have been visited. It still does not help, though, as the marking itself takes time. – sjishan Jul 25 '17 at 14:50
  • Fair enough. It will indeed not make a lot of difference. You are not targeting specific nodes (and I don't see how you could, given your model) and are thus walking the whole database at least once. – Tom Geudens Jul 25 '17 at 15:30

2 Answers

You can use size() like this:

MATCH (n) return n.id, size((n)-[*]-()) limit 100

This uses projection instead of pulling all the nodes into memory, and it is much faster. Note that, as stated previously, the result is per node: for your example, the first subgraph would give A,3 B,3 C,3, and so on.

Ben Squire

You may want to look at using the Neo4j Graph Algorithms library, as the connected components procedures may do what you want:

The Connected Components, or Union Find, algorithm finds sets of connected nodes in an undirected graph where each node is reachable from any other node in the same set.
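To make the idea concrete, here is a minimal Python sketch of union-find applied to the example graph from the question (A-B-C, D-E, F-G-H). It is only an illustration of the technique, not the library's implementation; the function name `component_sizes` and the in-memory node and edge lists are assumptions made for the example.

```python
from collections import Counter


def component_sizes(nodes, edges):
    """Return a Counter mapping each component's root node to its node count.

    Hypothetical helper for illustration; the graph is treated as undirected.
    """
    parent = {n: n for n in nodes}  # every node starts as its own set
    size = {n: 1 for n in nodes}    # size of the set rooted at each node

    def find(x):
        # Walk up to the root, shortening the path as we go (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return  # already in the same set
        # Attach the smaller set under the larger one (union by size).
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]

    for a, b in edges:
        union(a, b)

    # Each node is visited once here, so every component is counted once.
    return Counter(find(n) for n in nodes)


nodes = list("ABCDEFGH")
edges = [("A", "B"), ("B", "C"), ("D", "E"), ("F", "G"), ("G", "H")]
print(sorted(component_sizes(nodes, edges).values()))  # prints [2, 3, 3]
```

Each relationship is processed once and each node is assigned to exactly one set, which is why this yields one count per subgraph rather than one count per node.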

There are several ways to use this, from streaming the results to writing out the partition property to your nodes for later usage.

Here's an example of streaming, returning an id for the set and the count of nodes for the set, with no restrictions on the labels of nodes or types of relationships:

CALL algo.unionFind.stream('', '', {})
YIELD nodeId,setId
RETURN setId, count(nodeId) as count
InverseFalcon