I'm trying to use a Neo4j graph database in my project, and I'll try to explain my problem.

I would like to get the longest path, limited to 8 nodes, to the left and to the right of each result. However, I don't know the last node at either end of my graph.

The following diagram is a basic example. My graph is built like a chain:

[Image: My DB - Neo4j diagram]

My problem is finding the left and right end nodes. With this dummy query, I get duplicate results:

MATCH p=((nl)<-[:PREV*0..8]-(i)-[:NEXT*0..8]->(nr)) RETURN nodes(p);

This returns too many duplicate results. Here are some sample results:

i
h | i
g | h | i
...
i | j
i | j | k
...
h | i | j
h | i | j | k
...
a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q

The last result is the only one that interests me.

It seems that Neo4j returns all possible combinations of nodes to the left and to the right within the limit of 8.
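To make that count concrete, here is a minimal sketch in plain Python (not Cypher) that models the chain from the example as a list and enumerates every left/right hop combination the pattern `*0..8` allows around a fixed `i`. The names `CHAIN` and `enumerate_paths` are hypothetical, and the model assumes a single linear chain `a`..`q`; the real query leaves `i` unbound, so it would match even more rows.

```python
from string import ascii_lowercase

# Hypothetical stand-in for the chain in the example: nodes a..q, "i" in the middle.
CHAIN = list(ascii_lowercase[:17])


def enumerate_paths(chain, middle, max_hops=8):
    """List the node sequences matched by 0..max_hops hops on each side of `middle`."""
    idx = chain.index(middle)
    paths = []
    for left in range(0, max_hops + 1):
        if idx - left < 0:
            break  # ran off the left end of the chain
        for right in range(0, max_hops + 1):
            if idx + right >= len(chain):
                break  # ran off the right end of the chain
            paths.append(chain[idx - left: idx + right + 1])
    return paths


paths = enumerate_paths(CHAIN, "i")
longest = max(paths, key=len)

print(len(paths))            # 9 choices on the left * 9 on the right
print(" | ".join(longest))   # the single row of interest
```

In this model there are 81 matches for one `i`, and only one of them spans all 17 nodes, which matches the shape of the sample output above.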

Additional information:

  • There can be several "middle nodes" ('i' in the example)
  • I want up to 8 nodes on the left and up to 8 on the right, but always the maximum number of nodes available on each side

Is it possible to perform this with Cypher?

MasOOd.KamYab
roundge

1 Answer

If you only need one, then order by the length of the path and limit to a single result:

MATCH p=((nl)<-[:PREV*0..8]-(i)-[:NEXT*0..8]->(nr)) 
WITH p
ORDER BY length(p) DESC
LIMIT 1
RETURN nodes(p);

Alternately you can match and gather on each side:

MATCH p=(nl)<-[:PREV*0..8]-(i)
WITH i, nl
ORDER BY length(p) DESC
WITH i, collect(nl) as nodes
MATCH p = (i)-[:NEXT*1..8]->(nr)
WITH nodes, i, nr
ORDER BY length(p) ASC
WITH i, nodes + collect(nr) as nodes
RETURN nodes;
InverseFalcon
  • Thanks InverseFalcon. Your first query works fine for a single result, but I need more than one result, because I often have more than one "i" node. The 2nd query gives me a lot of nodes for each row, and I don't understand why. And the nodes are sorted in a strange way. – roundge Jul 16 '18 at 11:13
  • Then try out the alternate query instead. – InverseFalcon Jul 16 '18 at 11:15
  • Sorry my first comment was sent too early. I edited it. – roundge Jul 16 '18 at 11:17
  • The thing missing from my above query is the binding of the `i` node, so right now every node can be bound to `i`. If you match to the `i` node first, then append the rest of this query using the same `i` variable, do you still get a strange result? – InverseFalcon Jul 16 '18 at 11:20
  • Same thing when matching the i node first. It returns 200 nodes per row, and 50 rows. But that's right, I have 50 "i nodes" in my database! Sorry, I'm a noob with neo4j and graph databases. I will persevere with the second query – roundge Jul 16 '18 at 11:29
  • 50 rows makes sense given that you haven't constrained `i` any further, but 200 nodes per row doesn't seem right. You should be seeing 17 nodes per row at most (max of 8 in each direction and the originating `i` node in the middle). Are you sure your model and data are correct? – InverseFalcon Jul 16 '18 at 11:35
  • Ho! I found the problem, it was the relationships. By specifying properties of links, I have a maximum of 17 nodes. It seems to be a working answer! I check the results and mark the answer as useful. Thanks a lot @InverseFalcon! – roundge Jul 16 '18 at 11:37
  • I would like to take advantage of your experience and your presence to ask for additional information. I want all (i) nodes followed by (j) nodes. I wrote this query, but it always returns 17 nodes, and the (j) node doesn't appear in the results. `MATCH (i {name:"i"}), (j {name:"j"}) WITH i, j MATCH p=(nl)<-[:PREV*0..8]-(i)<-[:PREV]-(j) WITH i, j, nl ORDER BY length(p) DESC WITH i, j, collect(nl) as nodes MATCH p = (j)-[:NEXT]->(i)-[:NEXT*1..8]->(nr) WITH nodes, i, j, nr ORDER BY length(p) ASC WITH i, j, nodes + collect(nr) as nodes RETURN nodes;` What is going wrong? – roundge Jul 16 '18 at 12:01
  • The previous query works fine, but the (j) node doesn't appear in the results. – roundge Jul 16 '18 at 12:07
  • None of the nodes you've collected include `j`, they're not in the paths you've specified. You would need to add it to the list after you collect it: `WITH i, j, collect(nl) + j as nodes` What's odd though is you've specified a pattern where you have nodes leading up to `i`, then a :NEXT relationship to `j`, then a :NEXT relationship to the same `i`, then the remaining nodes. Is this correct? – InverseFalcon Jul 16 '18 at 12:12
  • Wow @InverseFalcon, this is correct! I added (j) to the collected nodes and it works perfectly. I understand better now. However, I don't understand what's odd in my query, because (i) and (j) must always be chained ( (i)->(j) or (j)<-(i) ). Could this query be optimized? – roundge Jul 16 '18 at 12:25
  • Glad you got it working. I'm just a bit confused by the structure. I'm assuming that for every :NEXT relationship in one direction, there's a :PREV relationship in the opposite direction. If so, then you would have `(i)-[:NEXT]->(j)-[:NEXT]->(i)` where `i` represents the same specific node. That also means you would also have `(i)<-[:PREV]-(j)<-[:PREV]-(i)`. Maybe that's all correct, I just wasn't expecting that structure. – InverseFalcon Jul 16 '18 at 12:39
  • Thanks @InverseFalcon :) My DB is built like a "chain", so `(i)-[:NEXT]->(j)-[:NEXT]->(i)` must not exist. However `(i)-[:NEXT]->(j)-[:PREV]->(i)` can exist. The only thing I do not understand is why (i) doesn't need to be added to the returned 'nodes'. It seems (i) is already included in the 'nodes' variable. This is awesome! – roundge Jul 16 '18 at 13:14
  • Still confused. Your query has `MATCH p=(nl)<-[:PREV*0..8]-(i)<-[:PREV]-(j)`, so we can infer that since `(i)<-[:PREV]-(j)`, then `(i)-[:NEXT]->(j)`. But further down you have a MATCH with `(j)-[:NEXT]->(i)`, so either something is wrong with the structure, or something is wrong with the query, or I'm still misunderstanding something. – InverseFalcon Jul 16 '18 at 13:18
  • As for why (i) is already included, that's because of the lower bound of 0 in this snippet to get the previous nodes: `(nl)<-[:PREV*0..8]-(i)`. A lower bound of 0 includes the node itself, so (i) will be included in the nodes bound to `nl`, and subsequently collected. – InverseFalcon Jul 16 '18 at 13:20
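The effect of the lower bound can be checked with a small model in plain Python (not Cypher). This is only a sketch: `collect_side` is a hypothetical helper, the chain `a`..`q` is the example from the question, and the list reversal stands in for `ORDER BY length(p) DESC` on the left side.

```python
def collect_side(chain, middle, step, min_hops, max_hops=8):
    """Nodes reached from `middle` in min_hops..max_hops steps, nearest first."""
    idx = chain.index(middle)
    found = []
    for hops in range(min_hops, max_hops + 1):
        pos = idx + step * hops
        if 0 <= pos < len(chain):
            found.append(chain[pos])
    return found


chain = list("abcdefghijklmnopq")

# Left side with lower bound 0: the zero-hop match is "i" itself.
# Reversing mimics ORDER BY length(p) DESC (farthest node first).
left = collect_side(chain, "i", step=-1, min_hops=0)[::-1]
# Right side with lower bound 1, nearest first (ORDER BY length(p) ASC).
right = collect_side(chain, "i", step=+1, min_hops=1)
nodes = left + right

# Same thing with lower bound 1 on the left: "i" is no longer collected.
without_i = collect_side(chain, "i", step=-1, min_hops=1)[::-1] + right

print(" | ".join(nodes))
print(" | ".join(without_i))
```

In this model `nodes` holds 9 + 8 = 17 entries in chain order, which is the "17 nodes per row at most" mentioned earlier, while `without_i` holds 16 because the middle node is skipped on both sides.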