(A) Your first MATCH
finds every actor that is connected within 10 hops to Kevin Bacon. The result of this clause is a number (M) of rows (and if an actor is connected in, say, 7 different ways to Kevin, then that actor is represented in 7 rows).
(B) Your second MATCH
finds every actor that has acted in a movie. If this MATCH
clause were standalone, then it would require N rows, where N is the number of ACTS_IN
relationships (and if an actor acted in, say, 9 movies, then that actor would be represented in 9 rows). However, since the clause comes right after another MATCH
clause, you get a cartesian product and the actual number of result rows is M*N.
So, your query requires a lot of storage and performs a (potentially large) number of redundant comparisons, and your results can contain duplicate names. To reduce the storage requirements and the number of actor comparisons (in your WHERE
clause): you should cause the results of A and B to have distinct actors, and eliminate the cartesian product.
The following query should do that. It first collects a single list (in a single row) of every distinct actor that is connected within 10 hops to Kevin Bacon (as hs
), and then finds all (distinct) actors not in that collection:
MATCH (kb:Actor {name:'Kevin Bacon'})-[*..10]-(h:Actor)
WITH COLLECT(DISTINCT h) AS hs
MATCH (a:Actor)
WHERE NOT a IN hs
RETURN a.name;
(This query also saves even more time by not bothering to test whether an actor has acted in a movie.)
The performance would still depend on how long it takes to perform the variable length path search in the first MATCH
, however.