Exact search getting less precedence than phonetic search?

Question

I have an Elasticsearch index and am using the following query:

{
    "_source": [
        "title",
        "content"
    ],
    "size": 15,
    "from": 0,
    "query": {
        "bool": {
            "must": {
                "multi_match": {
                    "query": "{{query}}",
                    "fields": [
                        "title",
                        "content"
                    ],
                    "operator": "or"
                }
            },
            "should": [
                {
                    "multi_match": {
                        "query": "{{query}}",
                        "fields": [
                            "title.standard^16",
                            "content.standard^2"
                        ],
                        "operator": "and"
                    }
                },
                {
                    "match_phrase": {
                        "content.standard": {
                            "query": "{{query}}",
                            "_name": "Phrase on title",
                            "boost": 1000
                        }
                    }
                }
            ]
        }
    },
    "highlight": {
        "fields": {
            "content": {}
        },
        "fragment_size": 100
    }
}

Here is the mapping I set:

{
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "my_analyzer": {
                        "tokenizer": "standard",
                        "filter": [
                            "lowercase",
                            "my_metaphone"
                        ]
                    }
                },
                "filter": {
                    "my_metaphone": {
                        "type": "phonetic",
                        "encoder": "metaphone",
                        "replace": true
                    }
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "term_vector": "with_positions_offsets",
                "analyzer": "my_analyzer",
                "fields": {
                    "standard": {
                        "type": "text"
                    }, 
                    "stemmer": {
                        "type": "text", 
                        "analyzer": "english"  
                    }
                }
            },
            "content": {
                "type": "text",
                "term_vector": "with_positions_offsets",
                "analyzer": "my_analyzer",
                "fields": {
                    "standard": {
                        "type": "text"
                    }, 
                    "stemmer": {
                        "type": "text", 
                        "analyzer": "english"  
                    }
                }
            }
        }
    }
}
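
One way to see what `my_analyzer` stores for a given word is the `_analyze` API. The request below is a minimal sketch in Kibana Console syntax. The index name `my-index` is a placeholder (the real name is not given here), and the sample words are arbitrary choices that seem likely to share a Metaphone code. If several words come back as the same token, the phonetic `title` and `content` fields cannot tell them apart, because `"replace": true` keeps only the phonetic code.

```
# Hypothetical index name; replace my-index with the real one.
GET /my-index/_analyze
{
    "analyzer": "my_analyzer",
    "text": "Person Pursuant pursuing"
}
```

Running the same request with `"analyzer": "standard"` should show, for comparison, the tokens that the `.standard` sub-fields hold, since those sub-fields declare no analyzer of their own.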

Here is the logic behind the query:

1) If the exact phrase appears, that match gets the highest precedence.

2) If not, matches from the standard analyzer (that is, the text as is) take precedence.

3) If neither of those matches, the phonetic analyzer is used to get results, with the lowest precedence.

But there is obviously a flaw in this, because it seems to give higher precedence to the phonetic analyzer than to the standard or phrase matches. For example, if I search for "Person of Indian Origin", the top results highlight "Pursuant" and "pursuing", and very few of them contain "person of Indian origin", although I know a large number of such documents exist. How do I solve this?
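
A possible way to check which clause is responsible for a given hit, sketched under assumptions: add `"explain": true` to the search body so that each hit carries a per-clause score breakdown, and highlight `content.standard` alongside `content`. The index name `my-index` and the label `phrase_on_content` are placeholders, and the query text is the example from above. Since `content` is analyzed with `my_analyzer`, highlights on that field may mark any word with the same phonetic code, regardless of which clause produced the score.

```
# Hypothetical index name; replace my-index with the real one.
GET /my-index/_search
{
    "explain": true,
    "size": 5,
    "_source": ["title"],
    "query": {
        "bool": {
            "must": {
                "multi_match": {
                    "query": "Person of Indian Origin",
                    "fields": ["title", "content"],
                    "operator": "or"
                }
            },
            "should": [
                {
                    "multi_match": {
                        "query": "Person of Indian Origin",
                        "fields": ["title.standard^16", "content.standard^2"],
                        "operator": "and"
                    }
                },
                {
                    "match_phrase": {
                        "content.standard": {
                            "query": "Person of Indian Origin",
                            "_name": "phrase_on_content",
                            "boost": 1000
                        }
                    }
                }
            ]
        }
    },
    "highlight": {
        "fields": {
            "content": {},
            "content.standard": {}
        },
        "fragment_size": 100
    }
}
```

Hits where the phrase clause matched should list `phrase_on_content` under `matched_queries`, and the `_explanation` section of each hit shows how much each clause contributed to the final score.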

Can you also provide sample docs that work with your mapping so that I can reproduce your issue? Also, which version of ES? — Amit, Apr 12 '20 at 14:36
Looks like it's not available in a standard installation of ES; it gives `Unknown filter type [phonetic] for [my_metaphone]`, and https://www.elastic.co/guide/en/elasticsearch/plugins/7.6/analysis-phonetic.html shows that it requires the `analysis-phonetic` plugin — Amit, Apr 12 '20 at 14:42
Hi @OpsterElasticsearchNinja, I will send a sample doc soon. I am using AWS Elasticsearch Service; I believe it is pre-installed! — Shawn, Apr 12 '20 at 14:44
https://aws.amazon.com/about-aws/whats-new/2016/12/amazon-elasticsearch-service-now-supports-phonetic-analysis/ — Shawn, Apr 12 '20 at 14:44
Ohh, AWS-based ES :p, yeah, looks like it's pre-installed :) Sorry, I don't work with AWS ES and don't like it, as it hides a lot of things :D — Amit, Apr 12 '20 at 14:45
Yup. Pretty difficult. But it offers a generous free tier, so many nascent businesses like ours are dependent on it :( — Shawn, Apr 12 '20 at 14:48
@OpsterElasticsearchNinja I profusely apologise for the late reply. Here is a sample doc: https://pastebin.com/mzfwz0b3 . Thanks a ton! — Shawn, Apr 15 '20 at 07:05
Thanks and no need to be sorry, let me see if I can be of help here — Amit, Apr 15 '20 at 07:07
@OpsterElasticsearchNinja Were you able to find the problem? — Shawn, Apr 17 '20 at 09:47
No, I didn't get time, although I took some time to set up AWS ES — Amit, Apr 17 '20 at 10:10
@Shawn I tried to replicate the issue but found your query to be correct. I used the sample document you shared and created another document with the same content, replacing `Person of Indian Origin` with `person of this country` in the second doc. The query returned both docs as expected and scored the first doc higher because the phrase matched in it. Scores: **doc 1: 1806.9044**, **doc 2: 0.90199673** — Nishant, Apr 21 '20 at 04:52
