
Objective: Find the minimum and maximum relationship lengths between two Node Types.

Example: the following are dummy connections for node type 'T':

  • (Alpha)-->(Bravo)
  • (Bravo)-->(Charlie)

So the minimum number of hops between two nodes is 1 (e.g. Alpha is directly linked to Bravo), and the maximum is 2 (i.e. (Alpha)--(Bravo)--(Charlie)).

The Cypher query I have is:

MATCH (from:T), (to:T), p=shortestPath((from)-[*]-(to))
RETURN min(length(p)) as min, max(length(p)) as max

This works fine for smaller data sets, but with 2000 'from' nodes and 2000 'to' nodes connected at something like min: 5 and max: 10 hops, the query takes about 30 minutes to run.

Is there any way to perform this operation faster?

Solutions I CANNOT use:

  • Limit relationship length: I have to use (from)-[*]-(to), cannot limit it to 3 or 4 levels.
Srinath Ganesh

1 Answer


The minimum is easy enough using APOC's path expander procedures (this requires the latest winter 2018 release, for either 3.2.x or 3.3.x).

You can use one group as your start nodes, and use the :T label in the label filter as the termination label (the end of the path for expansion) and add a limit:

MATCH (start:T)
WITH collect(start) as startNodes
CALL apoc.path.expandConfig(startNodes, {labelFilter:'/T', limit:1, minLevel:1, uniqueness:'NODE_GLOBAL'}) YIELD path
RETURN length(path) as min

We're using expandConfig() and NODE_GLOBAL uniqueness, which drastically helps out during expansion as we can prune (don't need to consider) any paths that end at a node that has already been visited.

The path expander procedures are great when you're looking for paths to nodes with certain labels. Note that we don't need to match the end nodes and create cross products: labels are evaluated during expansion, and expansion stops when a node with the :T label is reached.

The limit:1 will automatically stop when the first result is found, and the expansion uses breadth-first-search, so the first match will be the shortest path possible.
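To illustrate why stopping at the first match of a breadth-first expansion yields the shortest path, here is a minimal Python sketch. It is not APOC's implementation; the function name `nearest_target_distance`, the adjacency-dict graph representation, and the toy graph are all assumptions made for illustration. Relationships are treated as undirected, and the start node itself is never counted as a match (the role `minLevel:1` plays in the query).

```python
from collections import deque


def nearest_target_distance(adjacency, start, targets):
    """Return the hop count from start to the closest node in targets.

    The start node is never counted as a match. Returns None when no
    target is reachable. (Hypothetical helper, for illustration only.)
    """
    visited = {start}            # each node is visited at most once
    queue = deque([(start, 0)])  # (node, hops from start)
    while queue:
        node, hops = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour in visited:
                continue         # prune: already reached by a path no longer than this one
            visited.add(neighbour)
            if neighbour in targets:
                # BFS handles nodes in order of increasing distance,
                # so the first target found is the nearest one.
                return hops + 1
            queue.append((neighbour, hops + 1))
    return None


# Toy graph: A and D stand in for :T nodes; x, p, q are other nodes.
# A-x-D is a 2-hop route, A-p-q-D is a 3-hop detour.
adjacency = {
    "A": ["x", "p"],
    "x": ["A", "D"],
    "p": ["A", "q"],
    "q": ["p", "D"],
    "D": ["x", "q"],
}
t_nodes = {"A", "D"}

print(nearest_target_distance(adjacency, "A", t_nodes))
```

Because the search returns as soon as a target is discovered, the 3-hop detour through p and q is never fully explored.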

For finding the longest of ALL the shortest paths (from each :T node to just its nearest :T node), the approach will be similar, but we will not collect the start nodes, so the procedure will execute once for every single :T node.

MATCH (start:T)
CALL apoc.path.expandConfig(start, {labelFilter:'/T', limit:1, minLevel:1, uniqueness:'NODE_GLOBAL'}) YIELD path
RETURN max(length(path)) as maxShortest

Finding the longest shortest path between every two :T nodes, however, is likely to perform worse.

We can use a similar approach, but we'll remove the limit, and change the label filter to use :T as an end node (paths must end at :T nodes, but can expand past them to find paths to other :T nodes):

MATCH (start:T)
CALL apoc.path.expandConfig(start, {labelFilter:'>T', minLevel:1, uniqueness:'NODE_GLOBAL'}) YIELD path
RETURN max(length(path)) as maxShortestOfAllPairs
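The idea behind this all-pairs variant can be sketched in Python as one full breadth-first search per start node, keeping only the distances to other nodes of the same type. This is a hedged illustration rather than what APOC does internally; `hop_counts`, `min_max_between`, and the adjacency dict are hypothetical names, and relationships are treated as undirected. The graph is the Alpha/Bravo/Charlie example from the question.

```python
from collections import deque


def hop_counts(adjacency, start):
    """Shortest hop count from start to every reachable node (plain BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in dist:      # first visit is the shortest route
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def min_max_between(adjacency, typed_nodes):
    """Min and max shortest-path length over all pairs of typed_nodes.

    Returns None if no two typed nodes are connected.
    """
    lengths = [
        hops
        for start in typed_nodes
        for node, hops in hop_counts(adjacency, start).items()
        if node in typed_nodes and node != start
    ]
    return (min(lengths), max(lengths)) if lengths else None


# The question's example, with relationships followed in both directions.
adjacency = {
    "Alpha": ["Bravo"],
    "Bravo": ["Alpha", "Charlie"],
    "Charlie": ["Bravo"],
}

print(min_max_between(adjacency, {"Alpha", "Bravo", "Charlie"}))
```

Each start node triggers a traversal of everything reachable from it, which is why this variant costs noticeably more than the early-stopping queries above.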
InverseFalcon
  • Is the APOC query uni-directional? E.g. in Cypher, (a)-[]->(b) is uni-directional and (a)-[]-(b) is bi-directional. – Srinath Ganesh Mar 09 '18 at 02:58
  • APOC's path expanders expand out from the start node. It can traverse relationships going either direction, or you can leave off the direction, meaning it will traverse the relationship regardless of the direction. – InverseFalcon Mar 09 '18 at 03:50
  • Thanks a lot, it works fast and accurately. I just had to remove "uniqueness:'NODE_GLOBAL'" from the minimum query to get results; I don't know why, but it works after that. – Srinath Ganesh Mar 09 '18 at 04:07
  • There is a case wherein 50 nodes are connected by min: 5 and max: 10 hops. With the APOC query the server crashes (all RAM used up), while the Cypher query runs fine. (My RAM is quite low, i.e. 4GB, but it works with the basic Cypher query without APOC.) Any tips for that? (There is nothing in the log, because I have to kill the task, otherwise it keeps running for a long time.) – Srinath Ganesh Mar 09 '18 at 06:11
  • Which one of the queries is crashing, or is it all of them? I'd expect there to be trouble with that last one. – InverseFalcon Mar 09 '18 at 09:06
  • they work great, some node types in my sample data does work with the minimum length query (1st) and I had to kill the task... working out on a way,,, almost close – Srinath Ganesh Mar 09 '18 at 10:33
  • I modified the last query to do "RETURN min(length(path)) as min, max(length(path)) as max" and it works well for all my data. Thanks :D You made APOC look simple (APOC has too little documentation online). – Srinath Ganesh Mar 09 '18 at 10:48