I want to fetch data from my Elasticsearch node in my code, and I am using the elasticsearch-dsl library to query it. I want the results sorted by "@timestamp", which can be done with the sort API. However, the query matches more than 10,000 documents. I cannot use scan together with sort to retrieve that much data, because sort does not seem to work with scan in elasticsearch-dsl. Is there a way to use the scroll API in elasticsearch-dsl, or any other way to get more than 10,000 documents sorted by "@timestamp"?
1 Answer
`scroll` does work with `sort`; you just need to call it with `preserve_order`: `s.params(preserve_order=True).scan()`
Hope this helps!
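To make the one-liner above concrete, here is a minimal sketch of how it might fit into a script. The helper names `sorted_for_scan` and `iter_sorted_hits`, and the `hosts` and `index` arguments, are hypothetical and not from the original post; only `sort`, `params(preserve_order=True)` and `scan` come from the answer.

```python
def sorted_for_scan(search, field="@timestamp"):
    """Return a copy of `search` sorted by `field`, flagged so scan() keeps that order."""
    # sort() and params() each return a new Search object, so they can be chained.
    return search.sort(field).params(preserve_order=True)


def iter_sorted_hits(hosts, index, field="@timestamp"):
    """Yield every hit in `index`, ordered by `field` (needs a reachable cluster).

    `hosts` and `index` are placeholders for your own cluster and index name.
    """
    # Imported here so the helper above can be used without the library loaded.
    from elasticsearch_dsl import Search, connections

    connections.create_connection(hosts=hosts)  # registers the default connection
    search = sorted_for_scan(Search(index=index), field)
    for hit in search.scan():  # scrolls through all matches, not just the first page
        yield hit
```

Keeping the order across a scroll is generally more expensive than an unordered scan, so it is probably worth using `preserve_order` only when the ordering is actually needed.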

Honza Král
-
It shows this error when I use the above setting: "ScanError: Scroll request has failed on 30 shards out of 32" – S.Kumar Jul 20 '18 at 09:33
-
What is the error that you are getting? Catch the exception and print its `.info` property – Honza Král Jul 20 '18 at 19:41
-
This is the error I am getting: "error:Scroll request has failed on 38 shards out of 41" – S.Kumar Jul 21 '18 at 20:21
-
That's just the message; please catch the exception and print out its `.info` property. The message only tells you what went wrong, not why, so it is of no help. – Honza Král Jul 22 '18 at 19:56
-
This is the traceback: `Traceback (most recent call last): File "check_dsl.py", line 41, in run_query for hit in response: File "/usr/local/lib/python2.7/dist-packages/elasticsearch_dsl/search.py", line 701, in scan **self._params File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 316, in scan (resp['_shards']['failed'], resp['_shards']['total']) ScanError: Scroll request has failed on 41 shards out of 44.` – S.Kumar Jul 22 '18 at 20:54
-
`s = Search().query(q).sort("@timestamp") try: response = s.params(preserve_order=True).scan()` – S.Kumar Jul 22 '18 at 21:11
-
I am using the above code. If I then run `print response`, the output is "", and if I try to iterate over the response I get the above error. – S.Kumar Jul 22 '18 at 21:13
-
The traceback is not needed, just the `info` property: `try: scan(...) except Exception as e: print(e.info)` – Honza Král Jul 25 '18 at 02:28
-
I tried this; it says `'ScanError' object has no attribute 'info'`. – S.Kumar Jul 25 '18 at 09:30
-
fwiw, I came across this because my results were not sorted when using `scan()`. Adding `.params(preserve_order=True)` to the query fixed it. So I assume that this answer can be accepted, and that the shard-failures mentioned in the comments are unrelated? – exhuma Jun 26 '20 at 11:21
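
The `'ScanError' object has no attribute 'info'` message in the comments suggests that the exception raised by the scan helper is not a transport error, so `.info` is not available on it. As far as I know, `ScanError` in `elasticsearch.helpers` carries the message plus a `scroll_id` attribute. The sketch below reads both attributes defensively; `describe_scan_failure` and `scan_with_diagnostics` are hypothetical names, not part of either library.

```python
def describe_scan_failure(exc):
    """Collect whatever diagnostic fields the exception actually has."""
    return {
        "type": type(exc).__name__,
        "message": str(exc),
        # Assumed to be set on ScanError; None if the attribute is missing.
        "scroll_id": getattr(exc, "scroll_id", None),
        # Present on transport/API errors, absent on ScanError per the comments.
        "info": getattr(exc, "info", None),
    }


def scan_with_diagnostics(search):
    """Yield hits from an ordered scan, printing details if the scroll fails."""
    # Imported lazily so describe_scan_failure works without the client installed.
    from elasticsearch.helpers import ScanError

    try:
        for hit in search.params(preserve_order=True).scan():
            yield hit
    except ScanError as exc:
        print(describe_scan_failure(exc))
        raise  # re-raise so the caller still sees the failure
```

This only reports what the client knows. The underlying reason for the shard failures would most likely have to come from the Elasticsearch server logs or from the `_shards` section of a regular search response.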