Without getting too specific, I'm facing the following issue with Cypher in Neo4j 3.2. Let's say we have a database with 3 entities: User, Comment, Like.

For whatever reason, I'm trying to run the following query:

MATCH (n:USER) WHERE n.name = "name" 
WITH n 
MATCH (o:USER)
WITH n, o, "2000" as number
MATCH (n)<-[:CREATED_BY]-(:COMMENT)-[:HAS]->(l:LIKE)-[:CREATED_BY]->(o)
RETURN n, o, number, count(l)

The query takes minutes to complete. However, if I simply remove the "2000" as number part, it completes within tens of milliseconds.

Does anybody have an explanation why?

EDIT: Top image, with "2000" as number part; bottom, without it.

user3455402
    My assumption is that you / cypher creates 32969 new Strings. Are you hitting gc pauses in the JVM? Are you experiencing the same when using the number 2000? – manonthemat Jul 03 '17 at 23:51

1 Answer

You're going to have to clean up your query. Right now you're not using indexes (so the initial match on the specific name is slow), then you perform a cartesian product against all :USER nodes, and then you create a string for each resulting row.

So first, create an index on :USER(name) so you can find your start node fast.
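
As a minimal sketch, in the Neo4j 3.x syntax the question targets, the index could be created as shown here. The label and property names are taken from the question; newer Neo4j releases use a different form of CREATE INDEX, so check the documentation for your version.

```cypher
// Schema index on the name property of :USER nodes (Neo4j 3.x syntax).
CREATE INDEX ON :USER(name)
```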

Then we'll have to clean up the rest of the match.

Try something like this instead:

MATCH (n:USER) WHERE n.name = "name" 
WITH n, "2000" as number
MATCH (n)<-[:CREATED_BY]-(:COMMENT)-[:HAS]->(l:LIKE)-[:CREATED_BY]->(o:USER)
RETURN n, o, number, count(l)

You should see a similar plan with this query as in the query without the "2000".

As for why your original query was fast without the "2000": although its plan had a cartesian product from the separate match on o, the planner was intelligent enough to realize there was an additional restriction on o, namely that it had to occur in the pattern in your last match. Its optimization for that situation let you avoid actually performing the cartesian product.

Introducing the new variable number, however, prevented the planner from recognizing that this was basically the same situation, so it did not optimize out the cartesian product.

For now, try to be explicit about how you want the query to be performed, and try to avoid cartesian products in your queries.

In this particular case, it's important to realize that MATCH (o:USER) on the third line does not declare that o is a :USER in the later match. Instead it says: for every row in your results so far, perform a cartesian product against all :USER nodes, and then for each of those user nodes, check whether it exists in the pattern provided. That is a lot of unnecessary work compared to simply expanding the pattern and taking whatever :USER nodes you find at the other end of it.
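
One way to check whether a cartesian product is still in the plan is to prefix the query with PROFILE (or EXPLAIN, which shows the plan without executing the query) and look for a CartesianProduct operator in the output. A sketch, using the rewritten query with the labels from the question:

```cypher
// PROFILE runs the query and reports rows and db hits for each operator.
PROFILE
MATCH (n:USER) WHERE n.name = "name"
WITH n, "2000" AS number
MATCH (n)<-[:CREATED_BY]-(:COMMENT)-[:HAS]->(l:LIKE)-[:CREATED_BY]->(o:USER)
RETURN n, o, number, count(l)
```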

EDIT

As far as getting both :LIKE and :DISLIKE node counts, maybe try something like this:

MATCH (n:USER) WHERE n.name = "name" 
WITH n, "2000" as number
MATCH (n)<-[:CREATED_BY]-(:COMMENT)-[:HAS]->(likeDislike)-[:CREATED_BY]->(o:USER)
WITH n, o, number, head(labels(likeDislike)) as type, count(likeDislike) as cnt
WITH n, o, number, CASE WHEN type = "LIKE" THEN cnt END as likeCount, CASE WHEN type = "DISLIKE" THEN cnt END as dislikeCount
RETURN n, o, number, sum(likeCount) as likeCount, sum(dislikeCount) as dislikeCount

Assuming you still need that number variable in there.
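
Note that the query above only looks at head(labels(likeDislike)), so it assumes each of those nodes carries a single label. A possible variant, sketched here under the assumption that :LIKE and :DISLIKE are the only labels of interest, tests the labels directly inside the aggregation (the variable name x is just illustrative):

```cypher
MATCH (n:USER) WHERE n.name = "name"
WITH n, "2000" AS number
MATCH (n)<-[:CREATED_BY]-(:COMMENT)-[:HAS]->(x)-[:CREATED_BY]->(o:USER)
// Nodes that are neither :LIKE nor :DISLIKE contribute 0 to both sums.
RETURN n, o, number,
       sum(CASE WHEN x:LIKE THEN 1 ELSE 0 END) AS likeCount,
       sum(CASE WHEN x:DISLIKE THEN 1 ELSE 0 END) AS dislikeCount
```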

InverseFalcon
  • Thank you for your answer. This does clear up a few things. However, the reason for writing the query like this was that line 3 would do more than simply match all other users. For the sake of this example, let's imagine that there are also DISLIKE nodes in the database, although this would be a design flaw. The query should count the number of DISLIKE nodes between n and every other o, then also count all LIKE nodes between n and that same o. – user3455402 Jul 04 '17 at 06:05
    That's still not a good reason to perform a cartesian product here, the problem would only get worse if you had to check two types of nodes. There's no reason to check the pattern for every single :USER. Instead just check which users are found by the pattern itself. For your DISLIKE use case, it's probably easier to not label the potential :LIKE or :DISLIKE node, then use CASE to get a count for each. – InverseFalcon Jul 04 '17 at 09:45