Searching through an alias with filter is very slow in Elasticsearch

Question

I have an Elasticsearch index, my_index, holding millions of documents, each with a my_uuid key. On top of that index I have several filtered aliases of the following form (showing only my_alias, as retrieved by GET my_index/_alias/my_alias):

{
    "my_index": {
        "aliases": {
            "my_alias": {
                "filter": {
                    "terms": {
                        "my_uuid": [
                            "0944581b-9bf2-49e1-9bd0-4313d2398cf6",
                            "b6327e90-86f6-42eb-8fde-772397b8e926",
                            thousands of rows...
                        ]
                    }
                }
            }
        }
    }
}

My understanding is that the filter will be cached transparently for me, without any configuration on my part. However, searches that go through the alias are very slow, which suggests that either (1) the filter is not being cached or (2) it is written incorrectly.

Indicative numbers:

GET my_index/_search -> 50ms 
GET my_alias/_search -> 8000ms

I can provide further information on the cluster scale and the size of the data if anyone considers it relevant.

I am using Elasticsearch 2.4.1. I am getting the right results; it is only the performance that concerns me.

What happens when you run the search query directly and add the filter that is applied to the alias? Does it take as long? — pratikvasa, Feb 09 '17 at 13:52
Have you checked that `my_uuid` is `not_analyzed`? But thousands of terms on a filter seems quite heavy weight. If you know these uuids at index time you could add a new field `aliases` to each doc. Then your filter would just have a single term. — NikoNyrh, Feb 09 '17 at 14:59
@NikoNyrh `my_uuid` is `not_analyzed`. Indeed I know them at index time, but they are dynamically updated in bulk, so I did not want to hard code them into the searchable documents. — yannisf, Feb 09 '17 at 15:37
Hi @pratikvasa. I performed the test and got similar times. The problem is that the query I have to send when not using the filtered alias is around 4MB because of the number of `my_uuid`s, and just uploading the query takes about 6 seconds. So I guess this is not a viable solution. — yannisf, Feb 10 '17 at 13:51
OK. By similar times, do you mean you are getting around 8 seconds, including the 6 seconds needed to send the query? — pratikvasa, Feb 10 '17 at 14:13
Filter caches are only returned after the 3rd hit to the same filter. If you run the same query to the alias multiple times, does the time taken go down? — Farid, May 12 '18 at 00:50
No, it doesn't. Official answer I got from the Elastic forum was that this is unlikely to improve anytime soon and using such filters is an anti-pattern. — yannisf, May 14 '18 at 18:52
@yannisf I know this is a really old thread, but when you say 'Official answer I got from Elastic forum', I'm wondering if it's possible to maybe add a link to it here for future readers. — Ayush, Feb 25 '23 at 10:09
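
NikoNyrh's suggestion above (tag each document with an `aliases` field so the alias filter shrinks to a single term) could look roughly like the sketch below. It only builds the JSON bodies and sends nothing. The field name `aliases` comes from the comment; the helper names are hypothetical, and the field is assumed to be `not_analyzed` so that the term matches exactly.

```python
import json


def alias_action(index, alias, tag_field="aliases"):
    """Body for POST _aliases: a filtered alias that matches a single term.

    tag_field is the per-document field suggested in the comments.
    """
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": alias,
                    "filter": {"term": {tag_field: alias}},
                }
            }
        ]
    }


def tag_update(alias_names, tag_field="aliases"):
    """Partial-update body that sets the alias tags on one document."""
    return {"doc": {tag_field: list(alias_names)}}


if __name__ == "__main__":
    print(json.dumps(alias_action("my_index", "my_alias"), indent=2))
    print(json.dumps(tag_update(["my_alias"]), indent=2))
```

The trade-off is the one raised in the reply: every bulk change to the uuid list turns into document updates instead of a single alias update.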

bokan · Answer 1 · 2018-10-11T12:56:29.643

Matching each document against a 4MB list of UUIDs is definitely not the way to go. Try to imagine how many CPU cycles that requires. 8s is actually quite fast.

I would duplicate the subset of data in another index.

If you need changes to be reflected immediately, you will have to manage the subset index by hand:

When you delete a uuid from the list, delete the corresponding documents.
When you add a uuid, copy the corresponding documents (the reindex API with a query is your friend).
When you insert a document, check whether it should be added to the subset index too.
When you delete a document, delete it in both indices.

Force the document id so that it is the same in both indices. Beware of the refresh time if you store the uuid list in an Elasticsearch index.
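
As a rough illustration of the uuid-list bookkeeping, the sketch below diffs the old and new uuid lists and builds the request bodies for the two list-change cases. It is a sketch under assumptions, not a tested deployment: `my_subset_index` is a hypothetical name, the helper names are made up, and nothing is sent to a cluster. The first body has the shape expected by POST _reindex; the second is a plain terms query that could be handed to whatever delete mechanism is available (as far as I know, delete-by-query is a plugin in the 2.x line).

```python
import json

SOURCE_INDEX = "my_index"         # index name from the question
SUBSET_INDEX = "my_subset_index"  # hypothetical name for the duplicated subset


def diff_uuids(old, new):
    """Return (added, removed) uuids between two versions of the list."""
    old_set, new_set = set(old), set(new)
    return sorted(new_set - old_set), sorted(old_set - new_set)


def reindex_body(added):
    """Body for POST _reindex: copy documents of newly added uuids."""
    return {
        "source": {
            "index": SOURCE_INDEX,
            "query": {"terms": {"my_uuid": added}},
        },
        "dest": {"index": SUBSET_INDEX},
    }


def removal_query(removed):
    """Query selecting subset documents whose uuid left the list."""
    return {"query": {"terms": {"my_uuid": removed}}}


if __name__ == "__main__":
    added, removed = diff_uuids(["a", "b"], ["b", "c"])
    print(json.dumps(reindex_body(added), indent=2))
    print(json.dumps(removal_query(removed), indent=2))
```

Because only the difference between the two lists is sent, each request stays small even when the full list holds thousands of uuids.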

If updating the subset with new uuids is not time-critical, you can just run the reindex every day or every hour.
