Cypher query becomes very slow on a medium-sized dataset (with loops)

Question

This question extends the idea from an earlier question: Cypher: how to find all the chains of single nodes not repeated?

For example, in a graph like this:

(a1:TestNode)-[:REL]->(r1:Route)-[:REL]->(a2:TestNode)-[:REL]->(s1:Route)-[:REL]->(a1:TestNode)
(a2:TestNode)-[:REL]->(r2:Route)-[:REL]->(a3:TestNode)-[:REL]->(s2:Route)-[:REL]->(a2:TestNode)
(a3:TestNode)-[:REL]->(r3:Route)-[:REL]->(a4:TestNode)-[:REL]->(s3:Route)-[:REL]->(a3:TestNode)

Graphically:

                     s3 ← a4
                   ↙    ↗
            s2 ← a3 → r3
          ↙    ↗
   s1 → a2 → r2
 ↙    ↗
a1 → r1

Cypher code:

CREATE (a1:TestNode {name:'a1'})-[:REL]->(r1:Route {name:'r1'})-[:REL]->(a2:TestNode {name:'a2'})-[:REL]->(s1:Route {name:'s1'})-[:REL]->(a1),
(a2)-[:REL]->(r2:Route {name:'r2'})-[:REL]->(a3:TestNode {name:'a3'})-[:REL]->(s2:Route {name:'s2'})-[:REL]->(a2),
(a3)-[:REL]->(r3:Route {name:'r3'})-[:REL]->(a4:TestNode {name:'a4'})-[:REL]->(s3:Route {name:'s3'})-[:REL]->(a3)

Afterwards, we can find a route from a4 to a1 with this query:

MATCH p = (a4:TestNode {name: 'a1'})-[r:REL*]->(a1:TestNode {name: 'a4'})
WITH [a4] + nodes(p) AS ns, p
    WHERE ALL (n IN ns 
        WHERE 1=SIZE(FILTER(m IN TAIL(ns) 
                            WHERE m = n)))
RETURN p

Question 1: If I extend the CREATE query above to 2,000 'a' nodes, i.e. up to

(a2000)-[:REL]->(r2000:Route {name:'r2000'})-[:REL]->(a2001:TestNode {name:'a2001'})-[:REL]->(s2000:Route {name:'s2000'})-[:REL]->(a2000),

I find that my computer becomes very slow and Neo4j occupies 2 GB of memory. Is this normal?
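For reference, the 2,000-link chain described above could also be generated with UNWIND instead of one very long CREATE statement. This is only a sketch under assumptions: the index statement uses the Neo4j 3.x syntax (newer versions use `CREATE INDEX FOR (n:TestNode) ON (n.name)`), and each statement is meant to be run separately. It is not the statement that was actually used for the question.

```cypher
// Assumption: an index on :TestNode(name) so the lookups below are not label scans.
CREATE INDEX ON :TestNode(name);

// Create the nodes a1 .. a2001.
UNWIND range(1, 2001) AS i
CREATE (:TestNode {name: 'a' + toString(i)});

// Link each a(i) to a(i+1) through r(i), and back through s(i),
// following the same pattern as the small example graph.
UNWIND range(1, 2000) AS i
MATCH (a:TestNode {name: 'a' + toString(i)}),
      (b:TestNode {name: 'a' + toString(i + 1)})
CREATE (a)-[:REL]->(:Route {name: 'r' + toString(i)})-[:REL]->(b),
       (b)-[:REL]->(:Route {name: 's' + toString(i)})-[:REL]->(a);
```

Building the graph in small statements like this keeps each one short to parse and plan, which may help with the slowdown, although that has not been measured here.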

Then I want to find a route from a2001 to a1. The system cannot find the solution, which is obviously a2001->a2000->a1999->...->a1. I guess this is because of the loops in between. The query from the earlier question mentioned above should have avoided loops, because it does not allow duplicate nodes.

My goal is to extend this idea so that the possible routes between two locations in a connected graph can be identified. Many thanks.
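One direction that might be worth trying is Cypher's built-in shortestPath function. The query in the question has to enumerate variable-length paths before the duplicate-node filter is applied. A shortest path never visits the same node twice, so it also satisfies the "no repeated nodes" requirement. The sketch below assumes the node names from the question and an index on :TestNode(name); the variable names src and dst are made up for illustration.

```cypher
// Look up both endpoints first, then search only between them.
MATCH (src:TestNode {name: 'a2001'}), (dst:TestNode {name: 'a1'})
// shortestPath returns a single shortest directed path along :REL, if one exists.
MATCH p = shortestPath((src)-[:REL*]->(dst))
RETURN p
```

If several equally short routes are wanted, allShortestPaths can be used in place of shortestPath. Neither returns longer alternative routes, so this covers only part of the "all possible routes" goal.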

I have two other interesting observations: 1. If I try to find a route from a1 to a3000 (where a3000 does not exist), the system CANNOT report that there are no rows in a reasonable amount of time. This is very strange, given that a simple search for the node shows that a3000 does not exist. 2. I tried to find a route from a1 to a4, which involves only a few relationships. However, my PC takes 2 minutes to find the solution. (Any hint why it is so slow?) — chinghp, Jun 15 '16 at 15:53
and use an index hint for both nodes, can you share the query plans? — Michael Hunger, Jun 15 '16 at 22:48
not sure why you switch identifiers a1, a4 and their variables, that's confusing — Michael Hunger, Jun 15 '16 at 23:01
