1

I am getting the same record on different pages when implementing pagination with a grouping (group by) expression.

I am using the query mentioned below:

http://<hostname>:<port>/search/?yql=select * from sources document_name where sddocname contains 'document_name' | all(group(key) max(2) each(each(output(summary()))));
Yash Kasat

2 Answers

2

Are you looking at the grouping results or the normal hits structure? Please note that the grouping expression will not in any way affect the normal hits returned.

You will probably want to add LIMIT 0 to the YQL statement (or pass hits=0 as a request parameter) and only look at the results from the grouping expression.
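As a rough sketch of the advice above, the request parameters for each page could be built like this in Python. The helper name build_search_params is hypothetical; the document type and grouping expression are copied from the question, and the continuations annotation follows the form the asker reports using, so the exact syntax may differ between Vespa versions.

```python
from urllib.parse import urlencode

# Grouping expression copied from the question.
GROUPING = "all(group(key) max(2) each(each(output(summary()))))"


def build_search_params(doc_type, continuations=None):
    """Return the query parameters for one page of grouping results.

    Hypothetical helper: it suppresses the normal hit list with `limit 0`
    and, when tokens are given, prefixes the grouping expression with a
    continuations annotation.
    """
    annotation = ""
    if continuations:
        tokens = ",".join("'{}'".format(t) for t in continuations)
        annotation = "[{ 'continuations':[" + tokens + "] }]"
    yql = (
        "select * from sources {0} where sddocname contains '{0}' "
        "limit 0 | {1}{2};"
    ).format(doc_type, annotation, GROUPING)
    return {"yql": yql}


# First page: no continuation tokens yet.
first_page = build_search_params("document_name")
# Later pages: pass the tokens taken from the previous grouping result
# (placeholder values here).
next_page = build_search_params("document_name", ["this_value", "next_value"])

# URL-encoded form, ready to append to http://<hostname>:<port>/search/?
print(urlencode(first_page))
```

Only the grouping part of the response should be read when paging this way; with limit 0 the regular hit list is empty.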

You also need a stable ordering of the hits for pagination by continuations to work well. This is usually the case, since most applications have a ranking expression in place.

The default ordering in grouping expressions is by rank - in grouping expression syntax this would be order(max(relevance())).

The query above only restricts on document type, so all documents of that type match it equally well. I tested this using the "album-recommendation-selfhosted" sample app, and relevance was 0 for all documents. When the relevance is the same for all documents, the order will essentially be random. The same thing may occur with, for example, order(-count()) if count() is the same for several groups.

I was able to achieve the expected results by adding and using a ranking profile based on the random.match rank feature: https://docs.vespa.ai/documentation/reference/rank-features.html#random I believe this should ensure a stable ordering of hits, although it may still produce different results if the query is dispatched to different (groups of) content hosts. If you need a stable global ordering, consider storing a random float/double in each document and ranking/ordering by it. The same value can also be used as a "tie breaker" to help ensure a stable order from ranking expressions.
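To illustrate why equal relevance breaks paging and why a stored tie-breaker helps, here is a toy Python model. It is not Vespa code: the shuffle stands in for the arbitrary order in which equally ranked documents may come back on each request, simple offset paging stands in for continuations, and the field names relevance and tiebreak are made up for the example.

```python
import random


def page(docs, offset, size, use_tiebreak, rng):
    """Return the ids on one page, ranking the documents afresh per request."""
    # Simulate a backend that hands back equally ranked documents
    # in an arbitrary order on every request.
    arrival = docs[:]
    rng.shuffle(arrival)
    if use_tiebreak:
        # Relevance first, then the stored per-document random value.
        key = lambda d: (-d["relevance"], d["tiebreak"])
    else:
        # Relevance only: ties keep whatever order they arrived in.
        key = lambda d: -d["relevance"]
    ranked = sorted(arrival, key=key)
    return [d["id"] for d in ranked[offset:offset + size]]


def paginate(docs, size, use_tiebreak, seed=0):
    """Fetch every page in turn, one simulated request per page."""
    rng = random.Random(seed)
    return [
        page(docs, off, size, use_tiebreak, rng)
        for off in range(0, len(docs), size)
    ]


# Ten documents that all match equally well (relevance 0), each carrying
# a random value assigned once, as if stored at feed time.
setup_rng = random.Random(42)
docs = [
    {"id": i, "relevance": 0.0, "tiebreak": setup_rng.random()}
    for i in range(10)
]

stable_pages = paginate(docs, 2, use_tiebreak=True)
unstable_pages = paginate(docs, 2, use_tiebreak=False)

print("with tie-breaker:   ", stable_pages)
print("without tie-breaker:", unstable_pages)
```

With the tie-breaker every document lands on exactly one page regardless of arrival order. Without it, the pages depend on the shuffles and may repeat or skip documents.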

andreer
  • The issue persists even when I add limit 0 before the pipe. – Yash Kasat Mar 03 '20 at 09:21
  • You will also need to use the continuation tokens from the grouping output to paginate: https://docs.vespa.ai/documentation/grouping.html#pagination – andreer Mar 03 '20 at 09:28
  • The query above is the one I use to get the first page. For the remaining pages I used [{ 'continuations':['this_value','next_value'] }] with the values returned by the previous page, but I am still getting the same records on multiple pages. – Yash Kasat Mar 03 '20 at 09:45
  • Can you share the search definition entry for the "key" field? – andreer Mar 04 '20 at 14:48
  • field key type string { indexing: summary | attribute } – Yash Kasat Mar 05 '20 at 09:46
  • I was not able to reproduce this. Please open an issue on https://github.com/vespa-engine/vespa/issues – andreer Mar 06 '20 at 13:05
0

I also ran into this pagination problem when using the search endpoint. An alternative that worked very well for me is to use the selection parameter of the /document/v1 API, as described in the official documentation: https://docs.vespa.ai/en/document-v1-api-guide.html
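A minimal sketch of that approach in Python, assuming the visiting form of the /document/v1 API described in the linked guide. The helper name visit_url, the namespace mynamespace, the host, and the placeholder token are hypothetical. Note that this walks documents rather than groups, so it is a different technique from the grouping query in the question.

```python
from urllib.parse import urlencode


def visit_url(base, namespace, doc_type, selection, continuation=None, wanted=2):
    """Build a /document/v1 visit URL for one chunk of documents.

    Hypothetical helper: `selection` is a document selection expression,
    and `continuation` is the token returned in the previous response.
    """
    params = {"selection": selection, "wantedDocumentCount": wanted}
    if continuation:
        params["continuation"] = continuation
    return "{}/document/v1/{}/{}/docid?{}".format(
        base, namespace, doc_type, urlencode(params)
    )


# First request: no continuation token.
first = visit_url("http://localhost:8080", "mynamespace", "document_name", "true")
# Follow-up request: pass the token from the previous response
# (placeholder value here).
following = visit_url(
    "http://localhost:8080", "mynamespace", "document_name", "true",
    continuation="AAAA",
)

print(first)
print(following)
```

Each response is expected to carry a continuation value while more documents remain; feeding it into the next request continues the visit. The requested document count should be treated as a hint rather than an exact page size.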

igli