
I have a Python script that should print the IDs of all people in my JSON documents stored in Elasticsearch. But I only get ten results (the output is truncated); I know that by default only 10 results are returned.

```python
from elasticsearch import Elasticsearch
import sys
es = Elasticsearch()
res = es.search(index="my_docs", body={"query": {"match_all": {}}})
print("%d documents found" % res['hits']['total'])
for doc in res['hits']['hits']:
    print(" Doc ID: %s" % (doc['_id']))
```

It reports 5000 documents found but prints only 10 IDs.

How can I print the doc IDs of all documents in my Elasticsearch collection?
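A side note on the script above: from Elasticsearch 7.0 onward, `res['hits']['total']` is an object such as `{"value": 5000, "relation": "eq"}` rather than a plain number, so `"%d" % res['hits']['total']` raises a `TypeError` there. A minimal sketch of a helper (the name `total_hits` is hypothetical) that accepts both response shapes:

```python
def total_hits(res):
    """Return the total hit count from an Elasticsearch search response.

    Elasticsearch 6.x and earlier report hits.total as an int; 7.0 and later
    report an object like {"value": 5000, "relation": "eq"}.
    """
    total = res["hits"]["total"]
    if isinstance(total, dict):
        # "relation" is "gte" when "value" is only a lower bound
        return total["value"]
    return total


# Both response shapes give the same count:
print("%d documents found" % total_hits({"hits": {"total": 5000, "hits": []}}))
print("%d documents found" % total_hits(
    {"hits": {"total": {"value": 5000, "relation": "eq"}, "hits": []}}))
```

By default 7.x counts hits accurately only up to 10,000; beyond that `relation` is `"gte"` and `value` is a lower bound unless the request sets `track_total_hits`.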

Alfe
Cyber_Tron
  • The title of your question is misleading. Surely, this is not a print error, but a data retrieval error. – DYZ Jun 17 '17 at 01:06

2 Answers


You need to tell ES to return more than ten results (which is the default):

```python
body={"query": {"match_all": {}}, "size": 1000}
```

For very large result sets you need to fetch the results page by page; ES provides the `from`/`size` parameters, `search_after`, and the scroll API for this.
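A minimal sketch of paging with `from`/`size`, assuming a `search` callable that takes `body=` and returns the usual response dict (for example `functools.partial(es.search, index="my_docs")` with the 7.x Python client; check the call style for your client version). The helper name `iter_doc_ids` is hypothetical. By default `from + size` may not exceed 10,000 (the `index.max_result_window` setting), which covers 5000 documents but not much more; beyond that, the scroll API or `search_after` is the better fit.

```python
def iter_doc_ids(search, page_size=1000):
    """Yield the _id of every document matched by match_all, one page at a time."""
    start = 0
    while True:
        body = {
            "query": {"match_all": {}},
            "from": start,      # offset of the first hit on this page
            "size": page_size,  # number of hits per page
            "_source": False,   # only the IDs are needed, skip document bodies
        }
        res = search(body=body)
        hits = res["hits"]["hits"]
        if not hits:
            break  # an empty page means we have walked past the last hit
        for hit in hits:
            yield hit["_id"]
        start += len(hits)


# Usage with a real client (assumed 7.x-style call):
# from functools import partial
# from elasticsearch import Elasticsearch
# es = Elasticsearch()
# for doc_id in iter_doc_ids(partial(es.search, index="my_docs")):
#     print(" Doc ID: %s" % doc_id)
```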

Alfe
  • Thanks! And suppose I want to print all the IDs, can I use something like 'results': 'all'? – Cyber_Tron Jun 17 '17 at 01:53
  • You can use something like `'result': 10000000000000`, but of course this will lead to memory and transmission-time problems at some point. Because of this, there is no "all". In large cases, gathering all results in one chunk is probably far more costly than using the paging variant mentioned above. 5000 elements, though, should not pose a large problem yet, so your case may be solved if you give 10000 as a results limit. – Alfe Jun 17 '17 at 09:05
  • 3
    try `size` in case you are getting this error `RequestError: RequestError(400, 'parsing_exception', 'Unknown key for a VALUE_NUMBER in [results].')`. I am not sure if the error is due elasticsearch version, or because I was doing an aggregation and not a simple query. – toto_tico Apr 12 '19 at 16:47
  • 3
    The correct key is `'size' : 1000` for elasticsearch (at least 7.6.0 and onwards) – enrm Feb 18 '20 at 14:02
  • 1
    @enrm Thanks for pointing that out. It is a major nuisance of ElasticSearch that they changed their interfaces that much. Many of the older answers are outdated due to this :-( – Alfe Feb 19 '20 at 21:07
  1. Use the Scroll API if the number of documents exceeds 10,000.
  2. Use the Search API with a `size` limit to get a specified number of documents.
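The scroll approach from point 1 can be sketched as below, assuming a client object with the `search`, `scroll` and `clear_scroll` methods of the official Python client, called with 7.x-style keyword arguments (check the signatures for your client version). The helper name `scroll_doc_ids` is hypothetical. The official client also ships `elasticsearch.helpers.scan`, which wraps this same loop.

```python
def scroll_doc_ids(es, index, page_size=1000, keep_alive="2m"):
    """Yield the _id of every document in `index` using the scroll API."""
    res = es.search(
        index=index,
        body={"query": {"match_all": {}}, "_source": False},
        scroll=keep_alive,  # keep the search context alive between batches
        size=page_size,     # hits returned per batch
    )
    scroll_id = res["_scroll_id"]
    try:
        while True:
            hits = res["hits"]["hits"]
            if not hits:
                break  # an empty batch means the scroll is exhausted
            for hit in hits:
                yield hit["_id"]
            res = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = res["_scroll_id"]  # the scroll ID may change between batches
    finally:
        # Free the server-side search context instead of waiting for it to expire
        es.clear_scroll(scroll_id=scroll_id)


# Usage with a real client (assumed):
# from elasticsearch import Elasticsearch
# es = Elasticsearch()
# for doc_id in scroll_doc_ids(es, "my_docs"):
#     print(" Doc ID: %s" % doc_id)
```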