Suppose I have this Cypher query (Neo4j):

MATCH(m:Meeting)
WHERE m.startDate > 1405591031731
RETURN m.name

With millions of Meeting nodes in the graph, which strategy should I choose to make this kind of query fast?

  • Indexing the Meeting's startDate property?
  • Indexing it but with a LuceneTimeline?
  • Avoiding an index and preferring such a structure?
    However, this structure seems suited to querying a range of dates (FROM => TO), not just a lower bound (FROM).

I have no use cases where I would query a range: FROM this startDate TO this endDate.

By the way, it seems that simple indexes work only for equality checks, not for comparisons such as >.

Any advice?

Mik378

1 Answer

Take a look at this answer: How to filter edges by time stamp in neo4j?

When selecting nodes using relational operators, it is best to select on an intermediate node that is used to group your meeting nodes into a discrete interval of time. When adding meetings to the database you would determine which interval each timestamp occurred within and get or create the intermediate node that represents that interval.
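
As a rough sketch of that get-or-create step at insert time, assuming 10-second buckets, a millisecond timestamp property on the meeting (the question's startDate would play the same role), and a made-up meeting name:

```cypher
// Get or create the 10-second bucket for this timestamp (1405591031731 ms).
// Integer division by 10000 turns milliseconds into 10-second units.
MERGE (interval:Interval { timestamp: toInt(1405591031731 / 10000) })
// 'Weekly sync' is an illustrative value, not taken from the question.
CREATE (meeting:Meeting { name: 'Weekly sync', timestamp: 1405591031731 })
CREATE (meeting)-[:ON]->(interval);
```

MERGE either matches the existing Interval node or creates it, so meetings in the same 10-second window end up attached to one shared node.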

You could run the following query from the Neo4j shell on your millions of meeting nodes to group the meetings into 10-second intervals, assuming your timestamp is in milliseconds.

MATCH (meeting:Meeting)
MERGE (interval:Interval { timestamp: toInt(meeting.timestamp / 10000) })
MERGE (meeting)-[:ON]->(interval);
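
Since the MERGE above looks up an Interval node by its timestamp for every meeting, it will probably help to index that property before running it. A sketch using the Neo4j 2.x index syntax (newer releases use a different form, so check the documentation for your version):

```cypher
// Schema index so that MERGE can find an existing Interval by its key
// instead of scanning every Interval node.
CREATE INDEX ON :Interval(timestamp);
```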

Then for your queries you could do:

MATCH (interval:Interval) WHERE interval.timestamp >= toInt(1405591031731 / 10000)
WITH interval
MATCH (interval)<-[:ON]-(meeting:Meeting)
WHERE meeting.timestamp > 1405591031731
RETURN meeting
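
Keep in mind that Interval keys are stored in 10-second units, so a raw millisecond value has to be scaled the same way before it is compared with interval.timestamp. A quick check of which bucket a timestamp falls into, under the same assumptions (millisecond timestamps, 10-second buckets):

```cypher
// Integer division: 1405591031731 ms / 10000 = bucket 140559103
RETURN toInt(1405591031731 / 10000) AS bucket;
```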
Kenny Bastani
  • Thanks for your answer :) What if I had **only**, let's say, 100 meetings in my graph? Would I still need to "group" those meetings initially by interval node? Or is it just an optimization for when a lot of meetings are likely to be created in the same interval of time (10 seconds, for instance)? – Mik378 Jul 20 '14 at 10:15
  • By the way, do you agree that this solution is different from the one you gave in the linked post "How to filter edge..."? Indeed, it doesn't use any linked list to chain creation timestamps. – Mik378 Jul 20 '14 at 10:45