Incorrect Results With Punctuation - ElasticSearch

Question

I have a name field containing part names that is indexed using the english analyzer (I also tried the standard analyzer).

The problem I have is some of my titles contain punctuation, some do not. Also, some of my queries contain punctuation and some do not.

For example I have the title "CenterG 5.2 Drive Belt for model number 4425". My query could look like this: "Centerg 5.2 belt" and if it does, then my results display correctly with the "CenterG 5.2 Drive Belt for model number 4425" at the top.

However, if my query does not contain punctuation, the product does not display in the results. I have the same problem for titles that don't contain punctuation and queries that do. I'm not sure how this should be handled. I tried using the standard analyzer which I understand disregards punctuation, but that did not improve the results. They were roughly the same.

So, when I search for "CenterG 5.2 Belt" or "centerg 52 belt", I want the product "CenterG 5.2 Drive belt for model number 4425" to display at the top of my results.

Here is my mapping:

{:properties=>{:name=>{:type=>"text", :analyzer=>"english"}}}

I have also tried leveraging an ngram analyzer which did not fix this problem.

Here is my query:

{
    query: {
        bool: {
            should: {
                multi_match: {
                    fields: ["name"],
                    query: "#{query}"
                }
            }
        }
    }
}

So all you want is to preserve the punctuation marks in your documents, so that a document is returned only when the query contains them too, and otherwise not? In that case, please put your expected output in bold, with a detailed example. — Amit, Sep 06 '19 at 04:12
Wouldn't it be wrong to show results for `centerg 52 belt`, since the title contains `5.2`, not `52`, which is very different? — Amit, Sep 06 '19 at 04:37
For most purposes yes but the data I am dealing with sucks so I am trying to ultimately ignore the `.`. I have cases where some model numbers may not contain a `.` but the query might so that gives strange results also. I have a lot of cases where the model number might be longer such as `PFT11473.1` and I need `PFT114731` to match. — Cannon Moyer, Sep 06 '19 at 04:44
The first part is done; I will write a detailed answer soon. I am trying the solution on ES 7.x. Which version do you use? — Amit, Sep 06 '19 at 05:34
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/199047/discussion-between-amit-khandelwal-and-cannon-moyer). — Amit, Sep 06 '19 at 05:41

Amit · Accepted Answer · 2019-10-09T01:56:19.257

This is difficult to achieve with just one field and one analyzer. The first part of your example is easy to achieve with a custom analyzer that removes all dots (replaces `.` with an empty string), both at index time and at query time.

But in your comment you mentioned that you also want to find a document containing PFT11473.1 with the search query PFT11473. For that you need another analyzer that replaces `.` with a space, so that two tokens are generated, PFT11473 and 1, and either one is searchable.

I created two fields to store your title, each using a different analyzer; together they serve both of the use cases you mentioned.
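
To see what each analyzer does to a part number, here is a minimal Ruby sketch (Ruby because the question's code is Ruby). It imitates only the two char filters and then splits on whitespace as a rough stand-in for the standard tokenizer, so it is an approximation and not Elasticsearch's real analysis chain. The method names `remove_dots`, `dots_to_spaces`, and `rough_tokens` are made up for illustration.

```ruby
# Rough simulation of the two char filters described above.
# Assumption: splitting on whitespace stands in for the standard tokenizer,
# which is close enough for these inputs but is not the real tokenizer.

# Imitates a char filter that maps "." to nothing.
def remove_dots(text)
  text.delete(".")
end

# Imitates a char filter that maps "." to a space.
def dots_to_spaces(text)
  text.tr(".", " ")
end

# Hypothetical stand-in for tokenization.
def rough_tokens(text)
  text.split
end

["PFT11473.1", "CenterG 5.2 Drive Belt"].each do |title|
  puts "#{title.inspect}:"
  puts "  dots removed:   #{rough_tokens(remove_dots(title)).inspect}"
  puts "  dots to spaces: #{rough_tokens(dots_to_spaces(title)).inspect}"
end
```

With dots removed, PFT11473.1 becomes the single token PFT114731 and 5.2 becomes 52. With dots turned into spaces, they become PFT11473 plus 1, and 5 plus 2.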

Below is the index mapping:

{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "standard",
                    "char_filter": [
                        "replace_dots"
                    ]
                },
                "space_analyzer": {
                    "tokenizer": "standard",
                    "char_filter": [
                        "replace_dots_space"
                    ]
                }
            },
            "char_filter": {
                "replace_dots": {
                    "type": "mapping",
                    "mappings": [
                        ". =>"
                    ]
                },
                "replace_dots_space": {
                    "type": "mapping",
                    "mappings": [
                        ". => \\u0020"
                    ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "analyzer": "my_analyzer",
                "type": "text"
            },
            "title_space": {
                "analyzer": "space_analyzer",
                "type": "text"
            }
        }
    }
}

And this is how I indexed one example doc:

{
  "title" : "PFT11473.1",
  "title_space": "PFT11473.1"
}

And the final search query:

{
    "query": {
        "multi_match": {
            "query": "PFT11473.1",
            "fields": [
                "title",
                "title_space"
            ]
        }
    }
}
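
As a rough check of why searching both fields covers both cases, the following Ruby sketch imitates the match logic under strong simplifying assumptions. Each field's analyzer is reduced to its char filter plus whitespace splitting, and a document counts as a match when any query token equals any indexed token in either field, which approximates multi_match with its default OR operator. Scoring is not modelled, and the names `SIMULATED_ANALYZERS` and `matches?` are hypothetical.

```ruby
# Simplified stand-ins for the two analyzers; not the real Elasticsearch chain.
# No lowercasing is applied, mirroring the analyzers as defined, which list
# no token filters.
SIMULATED_ANALYZERS = {
  "title"       => ->(text) { text.delete(".").split },
  "title_space" => ->(text) { text.tr(".", " ").split }
}.freeze

# True when any query token equals any indexed token in at least one field.
# The query is analyzed with the same analyzer as the field it is matched against.
def matches?(doc, query)
  SIMULATED_ANALYZERS.any? do |field, analyze|
    indexed = analyze.call(doc.fetch(field))
    searched = analyze.call(query)
    (indexed & searched).any?
  end
end

doc = { "title" => "PFT11473.1", "title_space" => "PFT11473.1" }

["PFT11473.1", "PFT114731", "PFT11473", "XYZ"].each do |query|
  puts "#{query}: #{matches?(doc, query)}"
end
```

In this sketch PFT114731 matches through the title field and PFT11473 through title_space, while XYZ matches neither. Because nothing here lowercases tokens, `centerg` would not equal `CenterG`; adding a lowercase token filter to both analyzers is one possible extension, not something the answer specifies.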
