Find all unique longest paths with Cypher?

Question

This is unrelated to how to find all the longest paths with cypher query?, and somewhat related to the answer in Find all relationship disjoint longest paths in cypher/traversal API ordered by size except my condition is a bit different:

I would like to write a cypher query that returns paths whose collections of nodes are "unique", in the sense that no two nodes in the path share the same name property.

Here's an example graph:

and the cypher to make it:

CREATE (a1:Node {name: 'A', time:1}),
(c1:Node {name: 'C', time:2}),
(b1:Node {name: 'B', time:3}),
(d1:Node {name: 'D', time:4}),
(c2:Node {name: 'C', time:5}),
(a2:Node {name: 'A', time:6}),
(a3:Node {name: 'A', time:7}),
(b2:Node {name: 'B', time:8}),
(d2:Node {name: 'D', time:9})

CREATE (a1)-[:NEXT]->(b1)-[:NEXT]->(c2)-[:NEXT]->(a2)-[:NEXT]->(b2),
(a2)-[:NEXT]->(a3)-[:NEXT]->(d2),
(c1)-[:NEXT]->(b1),
(d1)-[:NEXT]->(c2)

RETURN *

In this graph, the following paths are considered longest, and unique:

(a1)-->(b1)-->(c2)
(c1)-->(b1)
(d1)-->(c2)-->(a2)-->(b2)
(a3)-->(d2)

Some example of paths that should not be returned from the query are

(c1)-->(b1)-->(c2) since it contains two instances of nodes with name, "C"
(a2)-->(b2) since it is contained in the larger path (d1)-->(c2)-->(a2)-->(b2)

Any help would be much appreciated.

stdob-- · Answer 1 · 2016-05-08T10:28:42.223

//
// Find all path:
MATCH path = (a:Node)-[:NEXT*1..]->(b:Node)
//
// Check that the names of the nodes in a path is unique
UNWIND nodes(path) as node
WITH path, nodes(path) as nodes, 
     collect(DISTINCT node.name) as unames
  //
  // And sort by path length
  ORDER BY length(path) ASC
  WHERE length(path)+1 = size(unames)
//
// Putting the appropriate path to the collection
WITH collect({f: id(HEAD(nodes)), // Fist node in path
              l: id(LAST(nodes)), // Last node in path
              ns: EXTRACT(n in nodes(path) | id(n)), 
              p: path
     }) + {f: NULL, l: NULL, ns: [NULL]} as paths
//
// Loop through the paths in a double loop 
// and check that the start and end nodes 
// are not included in the following ascending path
UNWIND RANGE(0,size(paths)-2) as i
  UNWIND RANGE(i+1,size(paths)-1) as j
    WITH i, paths, paths[i]['p'] as path, 
         sum( size( FILTER(n in paths[j]['ns'] WHERE n=paths[i]['f']) )) as fc, 
         sum( size( FILTER(n in paths[j]['ns'] WHERE n=paths[i]['l']) )) as fl
    WHERE fl=0 OR fc=0
//
// Return all appropriate ways
RETURN path ORDER BY length(path)

Upd: (let's try to add elegance)

//
// Find all path:
MATCH path = (a:Node)-[:NEXT*1..]->(b:Node)
//
// Check that the names of the nodes in a path is unique
UNWIND nodes(path) as node
WITH a, b, path,
     collect(DISTINCT node.name) as unames
  //
  // And sort by path length
  ORDER BY length(path) ASC
  WHERE length(path)+1 = size(unames)
//
// Putting the appropriate path and first and last nodes to the collection
WITH collect({first: a, last: b, path: path}) + {} as paths
//
// Loop through the paths in a double loop 
// and check that the start and end nodes 
// are not included in the following ascending path
UNWIND RANGE(0,size(paths)-2) as i
  WITH i, paths[i]['path'] as path, paths
    UNWIND RANGE(i+1,size(paths)-1) as j
      WITH path,
           collect(distinct paths[i]['first'] IN nodes(paths[j]['path'])) as cFirst,
           collect(distinct paths[i]['last']  IN nodes(paths[j]['path'])) as cLast
        WHERE (size(cFirst)=1 AND cFirst[0] = false) OR
              (size(cLast )=1 AND cLast [0] = false)
//
// Return all appropriate ways
RETURN path ORDER BY length(path)

So, this seems to work (result when run on my sample graph: http://pastebin.com/jxeKkr1f): I'm just wondering if 1) this is "idiomatic cypher", and 2) if there's a way to make that massive `WITH collect(...) as paths` statement more readable (or elegant)? Also, why is the `collect(DISTINCT id(node)) as uids` necessary? — StevieP, May 07 '16 at 22:08
@StevieP 1) I think that is quite typical, although check nesting paths in each other, I would have already suffered at the application side. 2) See the update - it became better, imho :)) 3) `collect(DISTINCT id(node)) as uids` - need to check that the path does not intersect with itself - if you are sure that it can not be - take away this part. — stdob--, May 07 '16 at 22:45
You can definitely assume that the graph is a DAG, so self-intersection shouldn't be a problem (the relations are time-based). Regarding the application layer: if you could perform this task in python (for example) what would you leave to Cypher vs. Python? — StevieP, May 08 '16 at 09:56
If DAG - can safely remove this part. At the application layer, I would have moved the second part (two-cycle UNWINW) - testing for sub-path. — stdob--, May 08 '16 at 10:13
so, use Cypher for fast path retrieval and coarse filtering and then do the more imperative stuff in app? Also, what is UNWINW? (Sorry for the newbie questions!) — StevieP, May 08 '16 at 10:16

Find all unique longest paths with Cypher?

1 Answers1