
I have an Elasticsearch index and am using the following query:

    {
        "_source": [
            "title",
            "content"
        ],
        "size": 15,
        "from": 0,
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": "{{query}}",
                        "fields": [
                            "title",
                            "content"
                        ],
                        "operator": "or"
                    }
                },
                "should": [
                    {
                        "multi_match": {
                            "query": "{{query}}",
                            "fields": [
                                "title.standard^16",
                                "content.standard^2"
                            ],
                            "operator": "and"
                        }
                    },
                    {
                        "match_phrase": {
                            "content.standard": {
                                "query": "{{query}}",
                                "_name": "Phrase on content",
                                "boost": 1000
                            }
                        }
                    }
                ]
            }
        },
        "highlight": {
            "fields": {
                "content": {}
            },
            "fragment_size": 100
        }
    }
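The `highlight` section above targets `content`, which is indexed with the phonetic `my_analyzer` from the mapping below, so highlighted fragments may show phonetic matches (for example "Pursuant") even for a hit whose score came mostly from an exact match. A debugging variant of the same query, offered only as a sketch, adds `"explain": true` to get a per-hit breakdown of how each clause contributed to the score, and highlights the `.standard` sub-fields instead:

```json
{
    "explain": true,
    "size": 15,
    "query": {
        "bool": {
            "must": {
                "multi_match": {
                    "query": "{{query}}",
                    "fields": ["title", "content"],
                    "operator": "or"
                }
            },
            "should": [
                {
                    "multi_match": {
                        "query": "{{query}}",
                        "fields": ["title.standard^16", "content.standard^2"],
                        "operator": "and"
                    }
                },
                {
                    "match_phrase": {
                        "content.standard": {
                            "query": "{{query}}",
                            "boost": 1000
                        }
                    }
                }
            ]
        }
    },
    "highlight": {
        "fields": {
            "title.standard": {},
            "content.standard": {}
        },
        "fragment_size": 100
    }
}
```

If the `_explanation` of a top hit is dominated by the phonetic `title`/`content` clause rather than the phrase clause, the ranking itself is the problem; if the phrase clause dominates, the issue may be only in what gets highlighted.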

Here are the index settings and mapping I used:

    {
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        "my_analyzer": {
                            "tokenizer": "standard",
                            "filter": [
                                "lowercase",
                                "my_metaphone"
                            ]
                        }
                    },
                    "filter": {
                        "my_metaphone": {
                            "type": "phonetic",
                            "encoder": "metaphone",
                            "replace": true
                        }
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                "title": {
                    "type": "text",
                    "term_vector": "with_positions_offsets",
                    "analyzer": "my_analyzer",
                    "fields": {
                        "standard": {
                            "type": "text"
                        },
                        "stemmer": {
                            "type": "text",
                            "analyzer": "english"
                        }
                    }
                },
                "content": {
                    "type": "text",
                    "term_vector": "with_positions_offsets",
                    "analyzer": "my_analyzer",
                    "fields": {
                        "standard": {
                            "type": "text"
                        },
                        "stemmer": {
                            "type": "text",
                            "analyzer": "english"
                        }
                    }
                }
            }
        }
    }
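One way to check whether the phonetic filter is conflating unrelated words is the `_analyze` API. As a sketch, assuming the index is called `my_index` (a placeholder name), send the body below to `GET /my_index/_analyze`, then repeat it with `"text": "pursuant pursuing"` and compare the tokens. Phonetic codes are coarse, so if `person` and `pursuant` come back with the same code, documents containing either word will match the phonetically analysed `title` and `content` fields equally well.

```json
{
    "analyzer": "my_analyzer",
    "text": "Person of Indian Origin"
}
```

Replacing `"analyzer": "my_analyzer"` with `"field": "content.standard"` shows what the standard sub-field indexes for the same text, for comparison.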

Here is the logic I intended the query to follow:

1) If the exact phrase appears, it should get the highest precedence.

2) If not, the query should fall back to the standard analyzer (that is, the text as is) and give those matches the next-highest precedence.

3) If neither of those matches, it should use the phonetic analyzer to find results, with the lowest precedence.

However, something is clearly wrong, because it seems to give higher precedence to the phonetic analyzer than to the standard or phrase matches. For example, if I search for "Person of Indian Origin", the top results highlight "Pursuant" and "pursuing", and only very few results contain "person of Indian origin", although I know that many such documents exist. How do I fix this?
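One way to express the three-level precedence above more directly is to make all three clauses optional `should` clauses and give the phonetic clause a small boost, so it can still find documents but contributes little to their score. This is an untested sketch, not a confirmed fix: the boost values (1000, 16, 2 and 0.1) are assumptions to tune against real data, and the `_name` labels (`phrase_content`, `standard_and`, `phonetic_or`) are hypothetical.

```json
{
    "_source": ["title", "content"],
    "size": 15,
    "from": 0,
    "query": {
        "bool": {
            "should": [
                {
                    "match_phrase": {
                        "content.standard": {
                            "query": "{{query}}",
                            "_name": "phrase_content",
                            "boost": 1000
                        }
                    }
                },
                {
                    "multi_match": {
                        "query": "{{query}}",
                        "fields": ["title.standard^16", "content.standard^2"],
                        "operator": "and",
                        "_name": "standard_and"
                    }
                },
                {
                    "multi_match": {
                        "query": "{{query}}",
                        "fields": ["title", "content"],
                        "operator": "or",
                        "boost": 0.1,
                        "_name": "phonetic_or"
                    }
                }
            ],
            "minimum_should_match": 1
        }
    }
}
```

With the `_name` labels set, each hit in the response carries a `matched_queries` list, which shows whether a top-ranked document matched only `phonetic_or`. A `bool` query with no `must` or `filter` clause already requires at least one `should` clause to match; `minimum_should_match` is set explicitly here only for clarity.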

Amit
Shawn
  • Can you also provide sample docs that work with your mapping so that I can reproduce your issue? Also, which version of ES? – Amit Apr 12 '20 at 14:36
  • It looks like it's not available in a standard installation of ES; it gives `Unknown filter type [phonetic] for [my_metaphone]`, and https://www.elastic.co/guide/en/elasticsearch/plugins/7.6/analysis-phonetic.html shows it requires the `analysis-phonetic` plugin. – Amit Apr 12 '20 at 14:42
  • Hi @OpsterElasticsearchNinja, I will send a sample doc soon. I am using the AWS Elasticsearch Service; I believe it is pre-installed! – Shawn Apr 12 '20 at 14:44
  • https://aws.amazon.com/about-aws/whats-new/2016/12/amazon-elasticsearch-service-now-supports-phonetic-analysis/ – Shawn Apr 12 '20 at 14:44
  • Ohh, AWS-based ES :p Yeah, looks like it's pre-installed :) Sorry, I don't work with AWS ES and don't like it, as it hides a lot of things :D – Amit Apr 12 '20 at 14:45
  • Yup. Pretty difficult. But it offers a generous free tier, so many nascent businesses like ours are dependent on it :( – Shawn Apr 12 '20 at 14:48
  • @OpsterElasticsearchNinja I profusely apologise for the late reply. Here is a sample doc: https://pastebin.com/mzfwz0b3 . Thanks a ton! – Shawn Apr 15 '20 at 07:05
  • Thanks, and no need to be sorry; let me see if I can be of help here. – Amit Apr 15 '20 at 07:07
  • @OpsterElasticsearchNinja Were you able to find the problem? – Shawn Apr 17 '20 at 09:47
  • No, I didn't get time, although I took some time to set up AWS ES. – Amit Apr 17 '20 at 10:10
  • @Shawn I tried to replicate the issue but found your query to be correct. I used the sample document you shared and created a second document with the same content, just replacing `Person of Indian Origin` with `person of this country`. The query returned both docs as expected and scored the first doc higher because the phrase matched in it. Score of **doc 1: 1806.9044** and of **doc 2: 0.90199673** – Nishant Apr 21 '20 at 04:52

0 Answers