25

Is it possible to have a Cypher query paginated? For instance, a list of products, where I don't want to display/retrieve/cache all the results, since there can be a lot of them.

I'm looking for something similar to OFFSET / LIMIT in SQL.

Is Cypher's SKIP + LIMIT + ORDER BY a good option? http://docs.neo4j.org/chunked/stable/query-skip.html

sunix
  • 323
  • 1
  • 3
  • 8

3 Answers

25

SKIP and LIMIT combined is indeed the way to go. Using ORDER BY inevitably makes Cypher scan every node that is relevant to your query, and the same goes for a WHERE clause. Performance should not be that bad, though.
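For example, a paginated query might look like the following (a minimal sketch; the Product label, the name property, and the page size are made up for illustration):

// Page 3 with a page size of 10: skip the first 2 * 10 results.
MATCH (p:Product)
RETURN p
ORDER BY p.name
SKIP 20
LIMIT 10;

In practice you would typically pass the SKIP and LIMIT values as query parameters rather than hard-coding them.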

tstorms
  • 4,941
  • 1
  • 25
  • 47
  • 7
    One nitpick: using a WHERE clause won't necessarily force every node to be scanned. Cypher will still stop scanning after the first `LIMIT x` nodes are found that match the specified conditions (after all, why would it need to read anything else?). You are correct about `ORDER BY` causing full scans though. – ean5533 May 02 '13 at 16:49
  • 4
I don't know about neo4j, but with most databases (and it's odd that many engineers don't know this), it is pretty much mandatory that you use an `ORDER BY` clause when executing paginated queries: without `ORDER BY`, the database implementation is free to return results in whatever order it deems appropriate (e.g. for performance, whatever). This means that page 2's query results might not have any continuity with page 1's results whatsoever. @ean5533 do you know how this works in neo? – Les Hazlewood Jan 25 '14 at 21:12
  • 2
    @LesHazlewood You're right to be concerned. The answer depends on whether or not your traversal is predictable. The reason SQL databases could return randomly ordered results is because data pages get shuffled around for performance purposes. In neo4j's case the results will be affected by the order in which neo4j traverses the nodes, which may be predictable depending on your data and on the query. It's hard to give a general answer. – ean5533 Jan 25 '14 at 23:11
  • Without the ORDER BY clause, how do you guarantee that nodes are returned in subsequent queries? – F.O.O Jan 18 '15 at 23:55
  • With this approach you need to be sure your graph isn't changing while you are processing a particular batch of nodes. My understanding is that each code loop will re-run the query. Even with an order_by, if the complete list of nodes has changed since the last time you ran skip/limit, you may either end up skipping over some nodes, or processing some nodes more than once. – rotten Feb 23 '16 at 15:57
8

It's like normal SQL; the syntax is as follows:

// ORDER BY / SKIP / LIMIT must follow a RETURN (or WITH) clause.
MATCH (user:USER_PROFILE)-[:USAGE]->(uUsage)
WHERE HAS(uUsage.impressionsPerHour) AND uUsage.impressionsPerHour > 100
RETURN user, uUsage
ORDER BY user.hashID
SKIP 10
LIMIT 10;

This syntax applies to the latest version (2.x); in later releases HAS() was replaced by exists().

Nadav Finish
  • 1,120
  • 9
  • 7
2

Neo4j apparently uses "index-backed ORDER BY" nowadays, which means that if you use an alphabetical ORDER BY on indexed node properties within your SKIP/LIMIT query, Neo4j will not perform a full scan of all "relevant nodes" as others have mentioned (their answers were written long ago, so keep that in mind). The index lets Neo4j exploit the fact that it already stores indexed properties in ORDER BY (alphabetical) order, so your pagination will be even faster than without the index.
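As a rough sketch of how you might take advantage of this (the Product label and name property are hypothetical, and the index syntax shown is the Neo4j 3.x form):

// Create a schema index on the property used for sorting.
CREATE INDEX ON :Product(name);

// With the index in place, ORDER BY p.name can be served in index order
// instead of requiring a full sort before SKIP/LIMIT are applied.
MATCH (p:Product)
RETURN p
ORDER BY p.name
SKIP 20
LIMIT 10;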