While looking into pagination with Solr and ElasticSearch, it turned out that both have the same "problem" (deep pagination, especially with shards). However, both search engines provide a solution/workaround for it:
- Solr: cursor (https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results)
- ElasticSearch: scroll (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html#scroll-search-context)
I have read those pages and searched the internet, but I'm still unclear on a few points:
cursor/scroll timeouts (garbage collection):
- Solr's documentation doesn't seem to provide a way to set a timeout (or some special query to invalidate a cursor token). That's basically just a question about possible memory leaks, etc.
- ElasticSearch provides a timeout setting via scroll=1m.
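To make the keep-alive idea concrete, here is a toy model of a server that holds scroll state and drops it once the keep-alive runs out without a further request. It is an assumption for illustration only, not ElasticSearch code: ScrollStore, ScrollContext and the "scroll-N" ids are hypothetical, and the clock is injectable so the expiry can be tested without waiting.

```python
import time
from dataclasses import dataclass

# Toy model only (assumption): mimics the idea behind a keep-alive such as
# ElasticSearch's scroll=1m. Class names and ids are hypothetical, not ES APIs.


@dataclass
class ScrollContext:
    results: list        # the hits captured when the scroll was opened
    position: int        # how many hits have been handed out so far
    expires_at: float    # absolute time after which the context is dropped


class ScrollStore:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._contexts = {}
        self._next_id = 0

    def open(self, results, keep_alive):
        """Open a scroll and return a hypothetical scroll id."""
        self._next_id += 1
        scroll_id = f"scroll-{self._next_id}"
        self._contexts[scroll_id] = ScrollContext(
            list(results), 0, self._clock() + keep_alive)
        return scroll_id

    def next_page(self, scroll_id, size, keep_alive):
        """Return the next `size` hits; every call renews the keep-alive."""
        self._purge_expired()
        ctx = self._contexts.get(scroll_id)
        if ctx is None:
            raise KeyError(f"{scroll_id} expired or unknown")
        page = ctx.results[ctx.position:ctx.position + size]
        ctx.position += len(page)
        ctx.expires_at = self._clock() + keep_alive
        return page

    def _purge_expired(self):
        """Drop every context whose keep-alive has run out (the 'GC' step)."""
        now = self._clock()
        expired = [sid for sid, ctx in self._contexts.items()
                   if ctx.expires_at <= now]
        for sid in expired:
            del self._contexts[sid]
```

In this model, state only accumulates while clients keep scrolling; an abandoned scroll disappears after one keep-alive period, which is the kind of cleanup the timeout question is about.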
Backwards pagination:
- Solr returns a cursor token with each response, so it is possible to access any previous page.
- ElasticSearch always seems to use the same scroll token. So I cannot go backwards without doing a new search?
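A minimal sketch of what "a token per page" allows on the client side, assuming (for illustration) that a token is nothing more than the sort key of the last document returned. fetch_page and PageHistory are hypothetical names, not Solr APIs; documents are (sort value, id) tuples so every sort key is unique.

```python
# Toy model (assumption): a cursor token is just the sort key of the last
# document returned, so a client can keep every token it has received and
# re-request any earlier page. Not Solr code; names are hypothetical.


def fetch_page(docs, after, rows):
    """Return up to `rows` docs sorted after `after` (None = first page)
    together with the token for the following page."""
    ordered = sorted(docs)
    if after is not None:
        ordered = [d for d in ordered if d > after]
    page = ordered[:rows]
    next_token = page[-1] if page else after
    return page, next_token


class PageHistory:
    """Client-side record of the token used to fetch each page."""

    def __init__(self):
        self.tokens = [None]  # page 0 starts from the beginning

    def remember(self, next_token):
        self.tokens.append(next_token)

    def token_for(self, page_number):
        return self.tokens[page_number]
```

Going "back" is then just another request with a stored token. With a single server-side scroll position there would be nothing comparable to store, which matches the concern about ElasticSearch above.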
Altering the search query:
- ElasticSearch explicitly requires a special URL for scroll queries (http://localhost:9200/_search/scroll?scroll=1m&scroll_id=...). So there's no way to alter the search query.
- Solr appends the cursor token to the normal query. Does this mean that I can take some cursor token and change the query (filters, ordering, page size, etc.)?
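The difference in request shape can be sketched as follows. On the Solr side cursorMark is one more parameter next to q, fq, sort and rows (cursorMark=* starts a new cursor, and the sort must include the uniqueKey field, assumed here to be named "id"); on the ElasticSearch side the continuation URL carries only the keep-alive and the scroll id. The helper names and the "inStock:true" filter are hypothetical, and the sketch only shows the request format: it says nothing about whether a token stays meaningful once the other parameters change.

```python
from urllib.parse import urlencode


def solr_cursor_params(query, sort, rows, cursor_mark="*", filters=()):
    """Build a Solr cursor query string (hypothetical helper).

    Assumption: the uniqueKey field is called "id"; Solr requires it in the
    sort as a tie-breaker, and `start` is not used together with cursorMark.
    """
    sort_fields = [part.split()[0] for part in sort.split(",")]
    if "id" not in sort_fields:
        raise ValueError("sort must include the uniqueKey field as a tie-breaker")
    params = [("q", query)] + [("fq", f) for f in filters]
    params += [("sort", sort), ("rows", rows), ("cursorMark", cursor_mark)]
    return urlencode(params)


def es_scroll_url(host, scroll_id, keep_alive="1m"):
    """Continuation URL for an ElasticSearch scroll: no query, only the
    keep-alive and the scroll id returned by the previous response."""
    return f"{host}/_search/scroll?" + urlencode(
        {"scroll": keep_alive, "scroll_id": scroll_id})
```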
Index changes while using scroll/cursor:
- The Solr documentation says that if the sort value of document 1 changed so that it is after the cursor position, the document is returned to the client twice. That's clear to me. But there are two more questions that aren't covered:
  - What happens if I use the cursor token for page 2 (where document 1 was before the sort value change)? Will I see the old items (including document 1), or will I see a newly generated page with freshly calculated documents?
  - Basically the same question as before: the Solr documentation says that if the sort value of document 17 changed so that it is before the cursor position, the document has been "skipped" and will not be returned to the client as the cursor continues to progress. If I use an old cursor token, will I be able to retrieve document 17? Or is it gone forever when using the current cursor token sequence?
- The ElasticSearch documentation says nothing about what happens if the index changes while using scroll. I could imagine that it behaves the same way as Solr, because both use Lucene for that functionality. But I'm completely unsure, because there's no information about that scenario.
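The duplicate/skip behaviour quoted from the Solr documentation can be reproduced with a small simulation, assuming the cursor encodes the (sort value, doc id) of the last document returned (the way the Solr reference guide describes cursorMark) and that every request is evaluated against the index as it is at that moment. next_page, read_everything and the d1..d6 documents are hypothetical.

```python
# Toy model (assumption): `index` maps doc id -> sort value, and a cursor is
# the (sort value, doc id) pair of the last document returned. Each request is
# evaluated against whatever the index contains at that moment.


def next_page(index, cursor, rows):
    """Return up to `rows` doc ids sorted after `cursor`, plus the new cursor."""
    ordered = sorted((value, doc_id) for doc_id, value in index.items())
    if cursor is not None:
        ordered = [entry for entry in ordered if entry > cursor]
    page = ordered[:rows]
    new_cursor = page[-1] if page else cursor
    return [doc_id for _, doc_id in page], new_cursor


def read_everything(index, rows, change_after_first_page):
    """Walk all pages, applying `change_after_first_page` once page 1 is read."""
    ids, cursor = next_page(index, None, rows)
    seen = list(ids)
    change_after_first_page(index)
    while True:
        ids, new_cursor = next_page(index, cursor, rows)
        if not ids:
            return seen
        seen.extend(ids)
        cursor = new_cursor


def move_docs(index):
    index["d1"] = 55  # now sorts after the cursor -> returned a second time
    index["d5"] = 5   # now sorts before the cursor -> never returned


if __name__ == "__main__":
    index = {"d1": 10, "d2": 20, "d3": 30, "d4": 40, "d5": 50, "d6": 60}
    print(read_everything(index, 2, move_docs))
    # ['d1', 'd2', 'd3', 'd4', 'd1', 'd6']
```

Under this assumption an old token is only a position, so reusing the page-2 token recomputes that page from the current data: d5 would only come back from a cursor that starts before its new sort value.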
How can this be faster than simple size=10&from=10 / rows=5&start=0?
More of a technical question, just because I'd like to understand what happens under the hood.
- I just wondered how (especially) Solr can do this cursor thing more efficiently than normal pagination using start and rows. Reason: (as said above) if a document changes, it gets reindexed and can be placed after/before the current cursor. That sounds to me like it has to reorder all documents. And that's basically the same as the default pagination!?
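A toy cost model of the sharded case, under the assumption that with start/rows every shard has to return its top start+rows hits (any of them could end up on the requested page after merging), while with a cursor each shard can discard everything at or before the cursor and return only its top rows. Integers stand in for unique sort values; offset_page and cursor_page are hypothetical. In both functions each shard still scans all of its matching documents; what differs is how many hits it keeps and sends to the coordinator.

```python
import heapq


def offset_page(shards, start, rows):
    """Offset paging: each shard keeps its top start+rows hits, and the
    coordinator merges all of them before slicing out the page."""
    per_shard = [heapq.nsmallest(start + rows, shard) for shard in shards]
    collected = sum(len(hits) for hits in per_shard)
    merged = heapq.nsmallest(start + rows, (h for hits in per_shard for h in hits))
    return merged[start:start + rows], collected


def cursor_page(shards, after, rows):
    """Cursor paging: each shard skips hits <= `after` while scanning and only
    keeps its top `rows` remaining hits."""
    per_shard = [
        heapq.nsmallest(rows, (h for h in shard if after is None or h > after))
        for shard in shards
    ]
    collected = sum(len(hits) for hits in per_shard)
    merged = heapq.nsmallest(rows, (h for hits in per_shard for h in hits))
    return merged, collected
```

With three shards holding 1000 hits each, page 51 (start=500, rows=10) makes the offset version ship 3 × 510 = 1530 hits to the coordinator, while the cursor version ships 3 × 10 = 30 for the same ten results; the gap grows with the page number.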
EDIT:
- The ElasticSearch documentation says "A scrolled search takes a snapshot in time — it doesn’t see any changes that are made to the index after the initial search request has been made. It does this by keeping the old datafiles around, so that it can preserve its “view” on what the index looked like at the time it started." So the remaining question is: how does Solr handle this?
It would be great if someone could explain how these things work.
Thanks in advance! :)