Neo4J MATCH queries - differences between multiple approaches

Question

What's the difference in the results returned for the following queries:

1) MATCH (user)-[:hometown]->(city) MATCH (user)-[:speaks]->(language) RETURN user, city, language

2) MATCH (user)-[:hometown]->(city), (user)-[:speaks]->(language) RETURN user, city, language

3) MATCH (language)<-[:speaks]-(user)-[:hometown]->(city) RETURN user, city, language

4) MATCH (user)-[:hometown]->(city) WITH user,city MATCH (user)-[:speaks]->(language) RETURN user, city, language

If some of the queries return the same results, I would then like to know the query performance differences.

Try to use labels to allow the database to optimize better. – Michael Hunger, Feb 07 '15 at 22:40
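Following that suggestion, here is a sketch of query #2 with a label on every node. The label names User, City and Language are assumptions for illustration; substitute whatever labels your data actually uses.

```cypher
// Query #2 with labels. User, City and Language are hypothetical label names.
// Labels let the planner start from the nodes of one label (or from an index)
// instead of scanning every node in the graph.
MATCH (user:User)-[:hometown]->(city:City),
      (user)-[:speaks]->(language:Language)
RETURN user, city, language;
```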

score 3 · Answer 1 · edited May 23 '17 at 12:05

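You can prefix a query with the PROFILE keyword to have Cypher run it and report the execution plan it used, along with the rows and database hits (DbHits) for each operator. This way, you can pick out the differences between the plans and draw conclusions about which query would be faster.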
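In Neo4j 2.2 and later there is also EXPLAIN, which shows the planned operators without running the query, whereas PROFILE executes it and reports what actually happened. A sketch using your query #3; the exact plan output looks different depending on the Neo4j version and the tool (neo4j-shell, Neo4j Browser, cypher-shell).

```cypher
// Show the execution plan only; the query is not executed.
EXPLAIN
MATCH (language)<-[:speaks]-(user)-[:hometown]->(city)
RETURN user, city, language;

// Execute the query and report rows and db hits for each operator.
PROFILE
MATCH (language)<-[:speaks]-(user)-[:hometown]->(city)
RETURN user, city, language;
```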

I'll do two of your queries to show you what I mean:

neo4j-sh (?)$ profile MATCH (user)-[:hometown]->(city) MATCH (user)-[:speaks]->(language) RETURN user, city, language;
+------------------------+
| user | city | language |
+------------------------+
+------------------------+
0 row

ColumnFilter
  |
  +SimplePatternMatcher
    |
    +TraversalMatcher

+----------------------+------+--------+-----------------------------+-----------------------------------+
|             Operator | Rows | DbHits |                 Identifiers |                             Other |
+----------------------+------+--------+-----------------------------+-----------------------------------+
|         ColumnFilter |    0 |      0 |                             | keep columns user, city, language |
| SimplePatternMatcher |    0 |      0 | user, language,   UNNAMED45 |                                   |
|     TraversalMatcher |    0 |      1 |                             |           city,   UNNAMED12, city |
+----------------------+------+--------+-----------------------------+-----------------------------------+

Here's your query #4, slightly adjusted since it doesn't run as-is:

neo4j-sh (?)$ profile MATCH (user)-[:hometown]->(city) WITH user,city MATCH (user)-[:speaks]->(language) RETURN user, city, language;
+------------------------+
| user | city | language |
+------------------------+
+------------------------+
0 row

ColumnFilter(0)
  |
  +SimplePatternMatcher
    |
    +ColumnFilter(1)
      |
      +TraversalMatcher

+----------------------+------+--------+-----------------------------+-----------------------------------+
|             Operator | Rows | DbHits |                 Identifiers |                             Other |
+----------------------+------+--------+-----------------------------+-----------------------------------+
|      ColumnFilter(0) |    0 |      0 |                             | keep columns user, city, language |
| SimplePatternMatcher |    0 |      0 | user, language,   UNNAMED60 |                                   |
|      ColumnFilter(1) |    0 |      0 |                             |           keep columns user, city |
|     TraversalMatcher |    0 |      1 |                             |           city,   UNNAMED12, city |
+----------------------+------+--------+-----------------------------+-----------------------------------+

There are many ways to compare these plans, but as a general rule, DbHits (and other kinds of I/O) are slow, so a plan with smaller numbers there is better. The numbers here are very small because I ran these queries on an empty database; they will be different for you. Note that the two plans above are almost identical: the WITH in query #4 only adds a ColumnFilter(1) step, which reports 0 DbHits here.
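To get more meaningful numbers than an empty database gives, one option is to profile against a small test graph. The data below is hypothetical (the User, City and Language labels and the name property are assumptions): one user with one hometown who speaks two languages.

```cypher
// Hypothetical test data: one user, one hometown, two spoken languages.
CREATE (u:User {name: 'alice'}),
       (c:City {name: 'Paris'}),
       (fr:Language {name: 'French'}),
       (en:Language {name: 'English'}),
       (u)-[:hometown]->(c),
       (u)-[:speaks]->(fr),
       (u)-[:speaks]->(en);

// Profile query #1 (two separate MATCH clauses)...
PROFILE
MATCH (user)-[:hometown]->(city)
MATCH (user)-[:speaks]->(language)
RETURN user, city, language;

// ...and query #2 (one MATCH with a comma-separated pattern).
PROFILE
MATCH (user)-[:hometown]->(city), (user)-[:speaks]->(language)
RETURN user, city, language;
```

On this data each of your four queries should return the same two rows: alice with Paris and French, and alice with Paris and English. Within a single MATCH, Cypher will not use the same relationship twice, which in principle can make one combined pattern differ from two separate MATCH clauses; because :hometown and :speaks are different relationship types, that rule cannot change the result here.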

In general, you should push the most selective parts of the query to the beginning. The name of the game is considering less data and minimizing how much Neo4j has to traverse to find the answer.

Consider these two queries. They are mirror images of one another and return the same thing, but the first is selective immediately, while the second is overly broad:

Version 1:

MATCH (user {id: 1})
WITH user
MATCH (user)-[:has]->(item)
RETURN item;

Version 2:

MATCH (item)
WITH item
MATCH (item)<-[:has]-(user)
WHERE user.id = 1
RETURN item;

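In general, I believe version 1 is going to be better, because it starts from the single user with id 1, while version 2 starts from every node in the graph and only filters on the user at the end.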

But not without labels and an index to find your start user really fast. – Michael Hunger, Feb 07 '15 at 22:41
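A sketch of version 1 with a label and an index on the lookup property, following that comment. The User label is an assumption and the index name user_id is hypothetical. The CREATE INDEX ... FOR ... ON form is Neo4j 4.x and later syntax; on 2.x/3.x the equivalent was CREATE INDEX ON :User(id).

```cypher
// Index the property used to find the start node (Neo4j 4.x+ syntax).
// On Neo4j 2.x/3.x use instead: CREATE INDEX ON :User(id);
CREATE INDEX user_id FOR (u:User) ON (u.id);

// With the label and index, the planner can look up the start user
// directly instead of scanning all nodes, then expand :has from it.
MATCH (user:User {id: 1})-[:has]->(item)
RETURN item;
```

Prefixing the MATCH with PROFILE should then show an index seek (such as NodeIndexSeek) as the starting operator rather than a scan over all nodes.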
