I'm trying to accomplish a pretty common task. I have a substantial dataset in a Neo4J database and, from a RESTful web service, i want to return the data in chunks of 25 nodes. My model is quite simple:
(:Tenant {Hash:''})-[:owns]->(:Asset {Hash:'', Name:''})
I have unique constraints on the Hash
properties on both labels.
If i wanted to obtain the 101th data page, my cypher query would look like this:
MATCH (:Tenant {Hash:'foo'})-[:owns]->(a:Asset)
RETURN a
ORDER BY a.Hash
SKIP 2500
LIMIT 25
My dataset consists of a single tenant, with ~75K assets. The above query takes ~30(!) seconds to complete. I also notice is that the further i advance in the data (ie. higher SKIP
) the longer it takes for the query to return.
I quickly figured out that the culprit of my performance issues is the ORDER BY a.Hash
. When i remove it, the query returns with sub-second results. This is actually quite a surprise, as i'd expect the index itself to also be ordered.
Obviously, in order to implement sensible pagination, i must have a consistent sort order.
- Any tips on making this query perform?
- Alternative suggestions for paging? I can see adding dedicated page nodes, but that will become difficult to maintain.
- What is the default sort order anyway, and is it consistent?