Elasticsearch is being really slow when returning all results

Question

I have an index of ~113,000 documents. I'm trying to retrieve all of them, and I don't care about the score; basically a select * from index;

I'm doing this in Python using elasticutils (I haven't found the time to switch to elasticsearch-dsl yet).

Running

S().indexes('da_userstats').query().count()

completes in about 0.003 seconds.

Running

S().indexes('da_userstats').query()[0:113595].execute().objects

is taking about 15 seconds.

From what I understand of the documentation, both should force execution, so I don't see why there is such a huge difference in time.

In the mapping I've tried marking the fields as not_analyzed, but it's had no effect. I really don't get why there is a difference of so many orders of magnitude.

@classmethod
def get_mapping(cls):
    return {
        'properties': {
            'id': {
                'type': 'integer',
                'index': 'not_analyzed',
                "include_in_all": False,
            },
            'email': {
                'type': 'string',
                'index': 'not_analyzed',
                "include_in_all": False
            },
            'username': {
                'type': 'string',
                'index': 'not_analyzed',
                "include_in_all": False
            },
            'date_joined': {
                'type': 'string',
                'index': 'not_analyzed',
                "include_in_all": False
            },
            'last_activity': {
                'type': 'string',
                'index': 'not_analyzed',
                "include_in_all": False
            },
            'last_activity_web': {
                'type': 'string',
                'index': 'not_analyzed',
                "include_in_all": False
            },
            'last_activity_ios': {
                'type': 'string',
                'index': 'not_analyzed',
                "include_in_all": False
            },
            # (mapping truncated in the original post)
        }
    }

This way of returning all documents is not the Elasticsearch way. Use [`size`/`from` or pagination](https://www.elastic.co/guide/en/elasticsearch/guide/current/pagination.html) to do this. If you had more documents or a smaller heap size, you would have run out of memory doing it like this. — Andrei Stefan, Aug 04 '15 at 14:43
If you want to retrieve all the documents, and you don't care about the order, you may want to use the scroll and scan API, which is very fast. — MauricioRoman, Aug 05 '15 at 00:44
@AndreiStefan in elasticutils slicing is the way of specifying size and from, and looking at pagination, they advise against doing it when you want to go through all the documents. — jhulme, Aug 05 '15 at 09:19
@MauricioRoman Going to take a look at that, thanks. Sorting it afterwards in Python might be faster as well, because it's not sorting on every shard and then once more at the end. — jhulme, Aug 05 '15 at 09:20
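
For reference, the scroll approach suggested in the comments could look roughly like the sketch below. It is not the asker's code: it assumes the official elasticsearch-py client rather than elasticutils, and the function name `iter_all_documents` and its parameters are made up for illustration. It also assumes an Elasticsearch version that supports sorting by `_doc`; on 1.x clusters the equivalent was `search_type='scan'`, whose first response carries no hits, so the loop would need adjusting. Exact keyword arguments differ somewhat between client versions.

```python
def iter_all_documents(client, index, page_size=1000, keep_alive="2m"):
    """Yield the _source of every document in `index`, one scroll page at a time.

    `client` is expected to behave like an elasticsearch-py client, i.e. to
    provide search(), scroll() and clear_scroll().
    """
    # Sorting by _doc skips scoring, which suits a plain "select *" export.
    response = client.search(
        index=index,
        scroll=keep_alive,
        size=page_size,
        body={"query": {"match_all": {}}, "sort": ["_doc"]},
    )
    scroll_id = response["_scroll_id"]
    try:
        # An empty page means the scroll is exhausted.
        while response["hits"]["hits"]:
            for hit in response["hits"]["hits"]:
                yield hit["_source"]
            response = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = response["_scroll_id"]
    finally:
        # Release the server-side scroll context once we are done.
        client.clear_scroll(scroll_id=scroll_id)


# Hypothetical usage against a local node:
#   from elasticsearch import Elasticsearch
#   docs = list(iter_all_documents(Elasticsearch(), "da_userstats"))
```

Because the documents arrive in pages of `page_size`, neither the cluster nor the client has to build one 113,000-hit response in memory, which is the concern raised about the single large slice.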
