
The Elasticsearch docs say the following about rewrite time (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-profile.html#rewrite-section):

The rewriting process is complex and difficult to display, since queries can change drastically. Rather than showing the intermediate results, the total rewrite time is simply displayed as a value (in nanoseconds). This value is cumulative and contains the total time for all queries being rewritten.

I am using a has_child query and it is slow. The docs warn that it is slow, but I want to figure out why.

Elasticsearch 7 mapping:

The form_entries are joined two levels deep (_doc -> form -> form_entries). We query at the form level, so only one level of has_child is involved.

{
    "mappings": {
        "dynamic": "strict",
        "properties": {
            "pseudo_id": {
                "type": "keyword"
            },
            "form": {
                "dynamic": "strict",
                "properties": {
                    "id": {
                        "type": "keyword",
                        "eager_global_ordinals": true
                    },
                    "start_date": {
                        "type": "date",
                        "format": "strict_date_optional_time||epoch_second"
                    }
                }
            },
            "form_entries": {
                "dynamic": "strict",
                "properties": {
                    "id": {
                        "type": "keyword",
                        "eager_global_ordinals": true
                    },
                    "form_id": {
                        "type": "keyword",
                        "eager_global_ordinals": true
                    },
                    "start_date": {
                        "type": "date",
                        "format": "strict_date_optional_time||epoch_second"
                    }
                }
            },
            "patient_joins": {
                "type": "join",
                "eager_global_ordinals": true,
                "relations": {
                    "_doc": [
                        "form"
                    ],
                    "form": "form_entries"
                }
            }
        }
    }
}
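To make the join layout concrete, here is a rough Python sketch of how documents at each level could be shaped. The field names come from the mapping above; the helper names, the placeholder IDs, and the choice to store pseudo_id on every level are assumptions for illustration only.

```python
JOIN_FIELD = "patient_joins"  # name of the join field in the mapping


def patient_doc(pseudo_id):
    """Top-level document; uses the root relation name "_doc"."""
    return {"pseudo_id": pseudo_id, JOIN_FIELD: {"name": "_doc"}}


def form_doc(pseudo_id, form_id, start_date, parent_doc_id):
    """First join level: a form whose parent is the top-level document."""
    return {
        "pseudo_id": pseudo_id,
        "form": {"id": form_id, "start_date": start_date},
        JOIN_FIELD: {"name": "form", "parent": parent_doc_id},
    }


def form_entry_doc(pseudo_id, entry_id, form_id, start_date, form_doc_id):
    """Second join level: a form entry whose parent is a form document."""
    return {
        "pseudo_id": pseudo_id,
        "form_entries": {
            "id": entry_id,
            "form_id": form_id,
            "start_date": start_date,
        },
        JOIN_FIELD: {"name": "form_entries", "parent": form_doc_id},
    }


# Hypothetical IDs. All three documents would have to be indexed with the
# same routing value (assumed here to be the pseudo id) so that parents and
# children end up on the same shard.
entry = form_entry_doc("p-1", "e-1", "f-1", "2020-01-01", "form-doc-1")
print(entry[JOIN_FIELD])
```

The has_child query in this question only crosses the form -> form_entries relation.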

Index stats:

forms: 14 million
form_entries: 200 million

Query (with profiling enabled):

{
    "index": "main",
    "size": 20,
    "routing": "<pseudo id>",
    "body": {
        "profile": true,
        "query": {
            "constant_score": {
                "filter": {
                    "bool": {
                        "must": [
                            {
                                "term": {
                                    "pseudo_id": "<pseudo id>"
                                }
                            },
                            {
                                "exists": {
                                    "field": "form.id"
                                }
                            },
                            {
                                "has_child": {
                                    "type": "form_entries",
                                    "query": {
                                        "match_all": {}              // <--- Note: We're not even filtering
                                    },
                                    "min_children": 1,
                                    "inner_hits": {
                                        "size": 1
                                    }
                                }
                            }
                        ]
                    }
                }
            }
        },
        "sort": [
            "pseudo_id",
            {
                "form.start_date": {
                    "order": "desc"
                }
            },
            {
                "form.id": {
                    "order": "asc"
                }
            }
        ],
        "_source": true
    }
}

Relevant profile output:

    "profile": {
        "shards": [
            {
                "searches": [
                    {
                        "query": [
                            {
                                "type": "ConstantScoreQuery",
                                "description": "ConstantScore(+pseudo_id:<pseudo id> +ConstantScore(DocValuesFieldExistsQuery [field=form.id]) +(+form.description.raw:consult +GlobalOrdinalsQuery{joinField=patient_joins#form}))",
                                "time_in_nanos": 655400,
                                "breakdown": {
                            ...
                        ],

                        "rewrite_time": 378342100,
                        "collector": [
                            {
                                "name": "SimpleFieldCollector",
                                "reason": "search_top_hits",
                                "time_in_nanos": 279800
                            }
                        ]
                    }

Query time: <1 ms. Rewrite time: ~378 ms?
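For reference, the two numbers above can be pulled out of a profile response with a small script. This is a minimal sketch that assumes the response has already been parsed into a Python dict; `summarize_profile` is a hypothetical helper, and the sample only contains the fields shown in the output above.

```python
def summarize_profile(response):
    """Return per-shard query and rewrite time in milliseconds.

    Sums time_in_nanos over the top-level query nodes and rewrite_time
    over all searches of each shard.
    """
    rows = []
    for shard in response["profile"]["shards"]:
        query_ns = 0
        rewrite_ns = 0
        for search in shard["searches"]:
            query_ns += sum(node["time_in_nanos"] for node in search["query"])
            rewrite_ns += search["rewrite_time"]
        rows.append({
            "shard": shard.get("id"),
            "query_ms": query_ns / 1e6,
            "rewrite_ms": rewrite_ns / 1e6,
        })
    return rows


# Trimmed-down sample using the numbers from the profile output above.
sample = {
    "profile": {
        "shards": [
            {
                "searches": [
                    {
                        "query": [
                            {"type": "ConstantScoreQuery", "time_in_nanos": 655400}
                        ],
                        "rewrite_time": 378342100,
                    }
                ]
            }
        ]
    }
}

for row in summarize_profile(sample):
    print(row)
```

Since rewrite_time is reported as a single cumulative value per search, this only shows how large it is relative to the query time, not where it is spent.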

Question:

What does Lucene need to rewrite for has_child, and why does it take this long? Are there additional profiling options that break the rewrite time down? Can the rewrite be disabled?

Semi-related: if we reduce the data set to 50K forms, the query time stays the same, but the rewrite is much faster.

Tessmore