2

This is unrelated to how to find all the longest paths with cypher query?, and somewhat related to the answer in Find all relationship disjoint longest paths in cypher/traversal API ordered by size except my condition is a bit different:

I would like to write a cypher query that returns paths whose collections of nodes are "unique", in the sense that no two nodes in the path share the same name property.

Here's an example graph: enter image description here

and the cypher to make it:

CREATE (a1:Node {name: 'A', time:1}),
(c1:Node {name: 'C', time:2}),
(b1:Node {name: 'B', time:3}),
(d1:Node {name: 'D', time:4}),
(c2:Node {name: 'C', time:5}),
(a2:Node {name: 'A', time:6}),
(a3:Node {name: 'A', time:7}),
(b2:Node {name: 'B', time:8}),
(d2:Node {name: 'D', time:9})

CREATE (a1)-[:NEXT]->(b1)-[:NEXT]->(c2)-[:NEXT]->(a2)-[:NEXT]->(b2),
(a2)-[:NEXT]->(a3)-[:NEXT]->(d2),
(c1)-[:NEXT]->(b1),
(d1)-[:NEXT]->(c2)

RETURN *

In this graph, the following paths are considered longest, and unique:

(a1)-->(b1)-->(c2)
(c1)-->(b1)
(d1)-->(c2)-->(a2)-->(b2)
(a3)-->(d2)

Some example of paths that should not be returned from the query are

  1. (c1)-->(b1)-->(c2) since it contains two instances of nodes with name, "C"
  2. (a2)-->(b2) since it is contained in the larger path (d1)-->(c2)-->(a2)-->(b2)

Any help would be much appreciated.

Community
  • 1
  • 1
StevieP
  • 1,569
  • 12
  • 23

1 Answers1

4
//
// Find all path:
MATCH path = (a:Node)-[:NEXT*1..]->(b:Node)
//
// Check that the names of the nodes in a path is unique
UNWIND nodes(path) as node
WITH path, nodes(path) as nodes, 
     collect(DISTINCT node.name) as unames
  //
  // And sort by path length
  ORDER BY length(path) ASC
  WHERE length(path)+1 = size(unames)
//
// Putting the appropriate path to the collection
WITH collect({f: id(HEAD(nodes)), // Fist node in path
              l: id(LAST(nodes)), // Last node in path
              ns: EXTRACT(n in nodes(path) | id(n)), 
              p: path
     }) + {f: NULL, l: NULL, ns: [NULL]} as paths
//
// Loop through the paths in a double loop 
// and check that the start and end nodes 
// are not included in the following ascending path
UNWIND RANGE(0,size(paths)-2) as i
  UNWIND RANGE(i+1,size(paths)-1) as j
    WITH i, paths, paths[i]['p'] as path, 
         sum( size( FILTER(n in paths[j]['ns'] WHERE n=paths[i]['f']) )) as fc, 
         sum( size( FILTER(n in paths[j]['ns'] WHERE n=paths[i]['l']) )) as fl
    WHERE fl=0 OR fc=0
//
// Return all appropriate ways
RETURN path ORDER BY length(path)

Upd: (let's try to add elegance)

//
// Find all path:
MATCH path = (a:Node)-[:NEXT*1..]->(b:Node)
//
// Check that the names of the nodes in a path is unique
UNWIND nodes(path) as node
WITH a, b, path,
     collect(DISTINCT node.name) as unames
  //
  // And sort by path length
  ORDER BY length(path) ASC
  WHERE length(path)+1 = size(unames)
//
// Putting the appropriate path and first and last nodes to the collection
WITH collect({first: a, last: b, path: path}) + {} as paths
//
// Loop through the paths in a double loop 
// and check that the start and end nodes 
// are not included in the following ascending path
UNWIND RANGE(0,size(paths)-2) as i
  WITH i, paths[i]['path'] as path, paths
    UNWIND RANGE(i+1,size(paths)-1) as j
      WITH path,
           collect(distinct paths[i]['first'] IN nodes(paths[j]['path'])) as cFirst,
           collect(distinct paths[i]['last']  IN nodes(paths[j]['path'])) as cLast
        WHERE (size(cFirst)=1 AND cFirst[0] = false) OR
              (size(cLast )=1 AND cLast [0] = false)
//
// Return all appropriate ways
RETURN path ORDER BY length(path)
stdob--
  • 28,222
  • 5
  • 58
  • 73
  • So, this seems to work (result when run on my sample graph: http://pastebin.com/jxeKkr1f): I'm just wondering if 1) this is "idiomatic cypher", and 2) if there's a way to make that massive `WITH collect(...) as paths` statement more readable (or elegant)? Also, why is the `collect(DISTINCT id(node)) as uids` necessary? – StevieP May 07 '16 at 22:08
  • 1
    @StevieP 1) I think that is quite typical, although check nesting paths in each other, I would have already suffered at the application side. 2) See the update - it became better, imho :)) 3) `collect(DISTINCT id(node)) as uids` - need to check that the path does not intersect with itself - if you are sure that it can not be - take away this part. – stdob-- May 07 '16 at 22:45
  • You can definitely assume that the graph is a DAG, so self-intersection shouldn't be a problem (the relations are time-based). Regarding the application layer: if you could perform this task in python (for example) what would you leave to Cypher vs. Python? – StevieP May 08 '16 at 09:56
  • 1
    If DAG - can safely remove this part. At the application layer, I would have moved the second part (two-cycle UNWINW) - testing for sub-path. – stdob-- May 08 '16 at 10:13
  • so, use Cypher for fast path retrieval and coarse filtering and then do the more imperative stuff in app? Also, what is UNWINW? (Sorry for the newbie questions!) – StevieP May 08 '16 at 10:16
  • 1
    Yes. `UNWINW` - it is `UNWIND`. My typo, sorry :) – stdob-- May 08 '16 at 10:20