
I am trying to paginate results by querying multiple times, to get past Elasticsearch's 10,000-result limit on from + size paging. Because Elasticsearch results can change between queries, I want to use the generated ID to fetch the next batch of results.

For example, I run a query that returns 1,000 results. Then I want to take the ID of the 1,000th result and run a query like: match: ID > {{1000thID}}

This way I want to get results 1,001 to 2,000, then 2,001 to 3,000, and so on.
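The paging scheme described above (remember the last ID, then ask for everything with a larger ID) can be simulated in plain Python against an in-memory list of numeric IDs. This is only a sketch of the logic, not Elasticsearch code, and the function name pages_after_id is hypothetical. Because it compares with "greater than" instead of computing fixed ID ranges, it also tolerates gaps in the IDs.

```python
from typing import Iterable, Iterator, List


def pages_after_id(ids: Iterable[int], page_size: int = 1000) -> Iterator[List[int]]:
    """Yield successive pages of ids, each starting after the last id of the previous page.

    Plain-Python stand-in for repeatedly asking Elasticsearch for
    "id > last_id, sorted by id, size=page_size".
    """
    ordered = sorted(ids)
    last_id = None
    while True:
        # Equivalent of the filter id > last_id (no filter on the first request)
        page = [i for i in ordered if last_id is None or i > last_id][:page_size]
        if not page:
            return
        yield page
        last_id = page[-1]  # the only state carried over to the next request


pages = list(pages_after_id(range(1, 2501)))
print([len(p) for p in pages])  # [1000, 1000, 500]
print(pages[1][0], pages[1][-1])  # 1001 2000
```

Each request only depends on the last ID seen, so no request ever needs an offset larger than the page size.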

I currently use the Elasticsearch DSL for Python to query by domain name, like this:

search.query('match', domainname=domainname)

How do I rebuild this code to meet the requirements above, i.e. something like ('match', _ID > ID_Variable)?

marc_s
JasperFennet

1 Answer


The best way to achieve what you want is to use the scroll/scan API. However, if you still want to do it your way, you can do it like this:

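last_id = ...  # numeric id of the last hit on the previous page
# filter() returns a new Search, so reassign it; [:1000] sets size (the default is only 10)
search = search.filter('range', id={'gt': last_id, 'lte': last_id + 1000})[:1000]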
Val
  • Damn, I forgot to say that the generated ID is alphanumeric, like "AVfcOQSECcVao75vrqGf", so I guess the solution above won't work! – JasperFennet Oct 19 '16 at 12:26
  • And quite an important one ;-) You can still use your own ids if you really want. – Val Oct 19 '16 at 12:26
  • Where do I configure it to auto-increment, in Logstash or in Elasticsearch? And is there any source on how to configure this? I have looked into using your own id, but all I can find is adding it manually. – JasperFennet Oct 19 '16 at 12:29
  • Yes, you need to explicitly provide your own custom ids when indexing your document. ES won't do it for you. – Val Oct 19 '16 at 12:34
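Since Elasticsearch won't generate sequential ids, one option is to assign them on the client while indexing. The sketch below turns plain documents into bulk actions for elasticsearch.helpers.bulk, writing the same counter into _id and into a numeric id field in the source, so the range filter in the answer has a numeric field to work on. The function name with_sequential_ids and the index name are hypothetical, and a client-side counter is only safe with a single writer that resumes from the highest existing id.

```python
from typing import Any, Dict, Iterable, Iterator


def with_sequential_ids(docs: Iterable[Dict[str, Any]], index: str,
                        start: int = 1) -> Iterator[Dict[str, Any]]:
    """Wrap documents as bulk actions carrying explicit, sequential ids.

    The counter is written twice: as the document _id, and as a numeric
    "id" field in the source so range filters and sorting can use it.
    """
    for n, doc in enumerate(docs, start=start):
        yield {
            "_index": index,
            "_id": str(n),
            "_source": {**doc, "id": n},
        }
```

With the elasticsearch package, the actions can be sent with helpers.bulk(es, with_sequential_ids(docs, index="domains")).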