I have built a linked list model using Neo4j. Here is a representation:

Linked list of events

A user has a list of Events and each one has two attributes: date and done. Given a particular time, I would like to set all previous events' done attribute to true.

My current query is this one:

MATCH (user:User {id: {myId} })-[rel:PREV*]->(event:Event {done:false}) 
WHERE event.date <= {eventTime}
SET event.done = true;

This query takes 12 sec when the list has 500 events and I would like to make it faster. One possibility would be to stop the query once it finds an event which is already done, but I don't know how to do it.

khaled_gomaa
marc_aragones
  • "takes too long when the list is really big" and "make it faster" are fairly meaningless here, as we have no idea what "really big" means: Hundreds? Thousands? How long is the query taking? How long do you expect it to take? – David Makogon Dec 23 '15 at 13:47
  • @DavidMakogon For example, it takes 12 seconds to perform the query if the list has 500 events. It seems too much because the majority of the events are already done (fewer than 10 have done=false). – marc_aragones Dec 23 '15 at 14:26
  • Is there a reason you aren't using `shortestPath`? – Nicole White Dec 23 '15 at 15:38
  • @NicoleWhite - Unless I completely misunderstand the OP's question, ShortestPath is not going to solve the problem. The intent appears to deal with a bulk number of nodes where `done = false` and set them to `true`. `ShortestPath` would serve only to find the most recent not-done event, which doesn't appear to be needed. – David Makogon Dec 23 '15 at 16:37
  • The shortest path function would find the shortest path to each Event node where done=false, not just the first one. – Nicole White Dec 23 '15 at 16:55

2 Answers

Your question is fairly vague about the performance problem and your targets, but one critical factor for performance is having an index on the properties you filter on. In your case, that would mean creating an index on both the done property and the date property:

CREATE INDEX ON :Event(done)
CREATE INDEX ON :Event(date)
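
The question's query starts from a single User matched by id, so the lookup of that start node may matter too. A minimal sketch in the same syntax, assuming User nodes are identified by an id property as in the question and that no index or constraint already covers it:

```cypher
// Assumption: :User(id) is not yet indexed; the label and property come from the question's query.
CREATE INDEX ON :User(id)
```

If ids are unique per user, a uniqueness constraint (which is backed by an index) would be an alternative.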

Additionally, your query retrieves all events in the entire history of a user, as seen in:

-[rel:PREV*]->

You could cap the depth, such as

-[rel:PREV*..20]->

to prevent complete traversal. That might not give you the outcome you're looking for, but it would prevent long-running queries if you have an extreme number of nodes in your linked list (you haven't specified how large that list could get, so I have no idea if this will actually help).
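
Applied to the query from the question, the cap might look like the sketch below. The value 20 is just the illustrative limit from above, not a recommendation, and events deeper than that in the list would be left untouched.

```cypher
// Same query as in the question, but the traversal stops after at most 20 PREV hops.
MATCH (user:User {id: {myId}})-[:PREV*..20]->(event:Event {done: false})
WHERE event.date <= {eventTime}
SET event.done = true;
```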

David Makogon

You can use shortestPath for this and it will be much faster. In general, you should never use [:REL_TYPE*] because it does an exhaustive search for every path of any length between the nodes.

I created your data:

CREATE (:User {id:1})-[:PREV]->(:Event {id:1, date:1450806880004, done:false})-[:PREV]->(:Event {id:2, date:1450806880003, done:false})-[:PREV]->(:Event {id:3, date:1450806880002, done:true})-[:PREV]->(:Event {id:4, date:1450806880002, done:true});

Then, the following query will find all previous Event nodes in a particular User's linked list where done=false and the date is less than or equal to, say, 1450806880005.

MATCH p = shortestPath((u:User)-[:PREV*]->(e:Event))
WHERE u.id = 1 AND 
      e.done = FALSE AND 
      e.date <= 1450806880005
RETURN p;

This yields:

p
[(6:User {id:1}), (6)-[6:PREV]->(7), (7:Event {date:1450806880004, done:false, id:1})]
[(6:User {id:1}), (6)-[6:PREV]->(7), (7:Event {date:1450806880004, done:false, id:1}), (7)-[7:PREV]->(8), (8:Event {date:1450806880003, done:false, id:2})]

So you can see it's returning two paths, one that terminates at Event with id=1 and another that terminates at Event with id=2.

Then you can do something like this:

MATCH p = shortestPath((u:User)-[:PREV*]->(e:Event))
WHERE u.id = 1 AND e.done = FALSE AND e.date <= 1450806880005
FOREACH (event IN TAIL(NODES(p)) | SET event.done = TRUE)
RETURN p;

I'm using TAIL here because it grabs all the nodes except for the first one (since we don't want to update this property for the User node). Now all of the done properties have been updated on the Event nodes:

p
[(6:User {id:1}), (6)-[6:PREV]->(7), (7:Event {date:1450806880004, done:true, id:1})]
[(6:User {id:1}), (6)-[6:PREV]->(7), (7:Event {date:1450806880004, done:true, id:1}), (7)-[7:PREV]->(8), (8:Event {date:1450806880003, done:true, id:2})]
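
One thing to watch: TAIL(NODES(p)) covers every Event on the path, not only the one the path ends at. If the list can contain events newer than the cutoff, those would be marked done as well. A sketch that updates only the matched end node, using the parameters from the question ({myId} and {eventTime}) instead of literals, and not tested here:

```cypher
// Each matched path ends at an Event that is not done and is dated on or before the cutoff,
// so updating only that end node leaves newer events on the path unchanged.
MATCH p = shortestPath((u:User)-[:PREV*]->(e:Event))
WHERE u.id = {myId} AND e.done = FALSE AND e.date <= {eventTime}
SET e.done = TRUE;
```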

EDIT: And don't forget the super fun bug where the shortestPath function silently sets the maximum hop limit to 15 in Neo4j < 2.3.0. See

ShortestPath doesn't find any path without max hops limit

Find all events between 2 dates

So if you're on Neo4j < 2.3.0, you'll want to do:

MATCH p = shortestPath((u:User)-[:PREV*..1000000000]->(e:Event))
WHERE u.id = 1 AND e.done = FALSE AND e.date <= 1450806880005
FOREACH (event IN TAIL(NODES(p)) | SET event.done = TRUE)
RETURN p;
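
For the idea raised in the question, stopping at the first event that is already done, a path predicate is one option. The sketch below assumes the list runs from newest to oldest and that everything behind the first done event is also done. Whether the planner actually prunes the traversal early depends on the Neo4j version, so it is worth checking with PROFILE. It uses the question's {myId} and {eventTime} parameters and has not been run against the asker's data.

```cypher
// Keep only paths on which every Event is still not done, so no match lies beyond a done event.
MATCH p = (u:User {id: {myId}})-[:PREV*]->(e:Event)
WHERE ALL(x IN TAIL(NODES(p)) WHERE x.done = FALSE)
  AND e.date <= {eventTime}
// COLLECT() gathers all matches before any write, so the updates cannot affect the matching.
WITH COLLECT(e) AS events
FOREACH (ev IN events | SET ev.done = TRUE);
```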
Nicole White
  • I have tested your query in different lists but it always returns (and changes) the first 15 Events. – marc_aragones Dec 23 '15 at 22:41
  • What version of Neo4j are you on? I believe that was fixed in Neo4j 2.3.0. See http://stackoverflow.com/a/32686764/2848578 and http://stackoverflow.com/a/30162759/2848578. The solution is to set an arbitrarily high maximum within the shortestPath function, something like `[:PREV*..1000000000]`. – Nicole White Dec 23 '15 at 22:43