I want to fetch data from my Elasticsearch node in my code, and I am using the elasticsearch-dsl library to query it. I want the data sorted by "@timestamp", which can be done with the sort API. However, the result contains more than 10,000 documents. I cannot use scan with sort to fetch that much data, because sort does not seem to work with scan in elasticsearch-dsl. Is there a way to use the scroll API in elasticsearch-dsl, or any other way to get more than 10,000 documents sorted by "@timestamp"?

S.Kumar

1 Answer

Scroll does work with sort; you just need to call it with preserve_order: `s.params(preserve_order=True).scan()`

Hope this helps!
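For illustration, here is a minimal sketch of how that call chain might be wrapped. The helper name `scan_sorted` and the index name in the usage comment are hypothetical. Only `sort()`, `params(preserve_order=True)` and `scan()` come from the question and the answer above.

```python
def scan_sorted(search, sort_field="@timestamp"):
    """Return an iterator over every hit of `search`, ordered by `sort_field`.

    `search` is expected to be an elasticsearch_dsl.Search object.
    Without preserve_order=True, scan() does not keep the sort order.
    """
    return search.sort(sort_field).params(preserve_order=True).scan()


# Usage sketch (needs a reachable cluster; "my-index" is a made-up name):
#
#   from elasticsearch_dsl import Search
#   s = Search(index="my-index").query("match_all")
#   for hit in scan_sorted(s):
#       print(hit["@timestamp"])
```

Note that preserving order makes the scroll more expensive than an unordered scan, so it is probably worth using only when the order actually matters.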

Honza Král
  • It shows this error when I use the above setting: "ScanError: Scroll request has failed on 30 shards out of 32" – S.Kumar Jul 20 '18 at 09:33
  • What is the error that you are getting? Catch the exception and print its `.info` property – Honza Král Jul 20 '18 at 19:41
  • "error:Scroll request has failed on 38 shards out of 41" is the error I am getting. – S.Kumar Jul 21 '18 at 20:21
  • That's just the message, please catch the exception and print out its `.info` property. This is just telling you what went wrong, not why, it is of no help – Honza Král Jul 22 '18 at 19:56
  • This is the traceback: Traceback (most recent call last): File "check_dsl.py", line 41, in run_query for hit in response: File "/usr/local/lib/python2.7/dist-packages/elasticsearch_dsl/search.py", line 701, in scan **self._params File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 316, in scan (resp['_shards']['failed'], resp['_shards']['total']) ScanError: Scroll request has failed on 41 shards out of 44. – S.Kumar Jul 22 '18 at 20:54
  • `s = Search().query(q).sort("@timestamp")` `try: response = s.params(preserve_order=True).scan()` – S.Kumar Jul 22 '18 at 21:11
  • I am using the above code. If I then do `print response`, the output is "", and if I try to iterate over the response I get the above error. – S.Kumar Jul 22 '18 at 21:13
  • The traceback is not needed, just the `info` property: `try: scan(...) except Exception as e: print(e.info)` – Honza Král Jul 25 '18 at 02:28
  • I tried this; it says 'ScanError' object has no attribute 'info'. – S.Kumar Jul 25 '18 at 09:30
  • fwiw, I came across this because my results were not sorted when using `scan()`. Adding `.params(preserve_order=True)` to the query fixed it. So I assume that this answer can be accepted, and that the shard-failures mentioned in the comments are unrelated? – exhuma Jun 26 '20 at 11:21
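
Since the comments report that `ScanError` has no `info` attribute, a defensive way to inspect the failure might look like the sketch below. The helper `describe_scan_failure` is hypothetical, and the `scroll_id` attribute is an assumption that may not hold for every elasticsearch-py version, which is why both attributes are read with `getattr` and a default.

```python
def describe_scan_failure(exc):
    """Collect whatever diagnostic details an exception exposes.

    Attributes are read with getattr because, as reported in the
    comments, ScanError does not define `.info`.
    """
    return {
        "type": type(exc).__name__,
        "args": exc.args,
        "info": getattr(exc, "info", None),            # absent on ScanError
        "scroll_id": getattr(exc, "scroll_id", None),  # assumed, may be absent
    }


# Usage sketch, with `s` being the Search object from the question:
#
#   try:
#       for hit in s.params(preserve_order=True).scan():
#           pass
#   except Exception as e:
#       print(describe_scan_failure(e))
```

This only shows what the exception carries. The underlying reason for the shard failures would still have to come from the `_shards` section of the scroll response or from the Elasticsearch server logs.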