
I am using this analyzer:

"settings": {
    "analysis": {
        "char_filter": {
            "my_char_filter": {
                "type": "mapping",
                "mappings": [
                    "- => _"
                ]
            },
            "quote_filter": {
                "type": "mapping",
                "mappings": [
                    "\\u0091=>\\u0020",
                    "\\u0092=>\\u0020"
                ]
            }
        },
        "analyzer": {
            "my_analyzer": {
                "tokenizer": "standard",
                "char_filter": [
                    "my_char_filter", "quote_filter"
                ],
                "filter": [
                    "lowercase"
                ]
            }
        }
    }
}

within this mapping:

"mappings": {
    "properties": {
        "title": {
            "type": "text",
            "analyzer": "my_analyzer",
            "term_vector": "with_positions_offsets"
        },
        "description": {
            "type": "text",
            "analyzer": "my_analyzer",
            "term_vector": "with_positions_offsets",
            "fielddata": true
        }
    }
}

and everything works with simple keywords.

However, if I use this query

{
    "query":
    {
        "bool":
        {
            "must":
            [
                {
                    "query_string":
                    {
                        "query": "\".net\" OR \".com\"",
                        "fields":
                        [
                            "title",
                            "description"
                        ]
                    }
                }
            ]
        }
    },
    "highlight":
    {
        "pre_tags":
        [
            "<match>"
        ],
        "post_tags":
        [
            "</match>"
        ],
        "fields":
        {
            "title":
            {
                "type": "fvh",
                "number_of_fragments": 0
            },
            "description":
            {
                "type": "fvh",
                "number_of_fragments": 0
            }
        }
    }
}

to search for ".com" in the following description: "Google.com is an American multinational technology company (COM) that focuses on artificial intelligence, search engine technology, online advertising, cloud computing and computer software", it only matches "COM" (inside the parentheses) instead of ".com".

How can I solve this issue?

EDIT: I found that this query, with the escaped quotes (\") removed:

"query_string": {
    "query": ".com OR .net OR Engine OR American",
    "fields": ["title", "description"]
}

works partially: it matches "Engine" and "American", but I can't tell whether it matched ".com" or ".net" (a human reader obviously could), because the query response gives me:

matched_keywords: {'Engine', 'American', 'Google.com'}

So, how can I get something like

matched_keywords: {'Engine', 'American', '*.com'} 

?
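A possible client-side workaround, sketched in Python under a few assumptions: each hit's `highlight` object maps a field name to a list of fragments (the usual shape of an Elasticsearch highlight response), the `<match>`/`</match>` tags are the `pre_tags`/`post_tags` from the query in the question, and any query term starting with "." is meant as a domain suffix. The helper `matched_keywords` and the sample fragment are hypothetical. Instead of reporting the highlighted text, it reports which query term each highlighted token satisfies, so "Google.com" becomes "*.com" while a bare "COM" is not counted as ".com".

```python
import re

# The <match>...</match> tags are the pre_tags/post_tags set in the highlight request.
MATCH_TAG = re.compile(r"<match>(.*?)</match>", re.DOTALL)


def matched_keywords(highlight, query_terms):
    """Return the query terms satisfied by highlighted tokens (hypothetical helper).

    highlight:   dict mapping field name -> list of highlighted fragments
    query_terms: terms from the query; a leading "." marks a domain suffix
    """
    found = set()
    for fragments in highlight.values():
        for fragment in fragments:
            for token in MATCH_TAG.findall(fragment):
                token_lc = token.lower()
                for term in query_terms:
                    term_lc = term.lower()
                    if term_lc.startswith("."):
                        # Suffix term: "Google.com" satisfies ".com", a bare "COM" does not.
                        if token_lc.endswith(term_lc):
                            found.add("*" + term_lc)
                    elif token_lc == term_lc:
                        # Plain term: report it as written in the query.
                        found.add(term)
    return found


# Hypothetical highlight for the description from the question.
sample = {
    "description": [
        "<match>Google.com</match> is an <match>American</match> multinational "
        "technology company (<match>COM</match>) that focuses on artificial "
        "intelligence, search <match>engine</match> technology"
    ]
}
print(sorted(matched_keywords(sample, [".com", ".net", "Engine", "American"])))
# ['*.com', 'American', 'Engine']
```

This only relabels what the highlighter already returned; it cannot make ".com" match "Google.com" on the Elasticsearch side.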

Sandy

1 Answer


This is because the closest token in your index is "google.com". In your case a wildcard query could solve it, but you would lose performance.

{
    "wildcard": {
        "description": {
            "value": "*.com"
        }
    }
}
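If exact words such as "Google" also have to be searched, the wildcard does not need to replace the whole query; it can sit next to them in a `bool` query. A sketch in Python, assuming the same `title`/`description` fields and the lowercased tokens produced by `my_analyzer` (the wildcard query is not analyzed, so the pattern is lowercased here). `build_query` is a hypothetical helper that only builds the request body; sending it with a client is left out.

```python
import json


def build_query(exact_terms, suffixes, fields):
    """Build a request body (hypothetical helper): exact words plus domain suffixes."""
    should = []
    if exact_terms:
        # Exact words, quoted as phrases like in the original query_string query.
        should.append({
            "query_string": {
                "query": " OR ".join(f'"{t}"' for t in exact_terms),
                "fields": list(fields),
            }
        })
    for suffix in suffixes:
        for field in fields:
            # Wildcard queries are not analyzed, so lowercase the pattern to match
            # the lowercased tokens; leading "*" patterns are the slow part.
            should.append({"wildcard": {field: {"value": "*" + suffix.lower()}}})
    # At least one clause (exact word or suffix) has to match.
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


print(json.dumps(build_query(["Google", "Engine"], [".com"], ["title", "description"]), indent=2))
```

The performance caveat still applies: every `*.com` clause has to scan the terms of its field.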
rabbitbr
  • I wouldn't want to use a wildcard because I'm also searching for exact words such as google, yahoo, etc. My query should look like ' ".com" OR "Google" OR "Engine" ', but ".com" in this query doesn't work as it should. – Sandy Jun 30 '22 at 08:20
  • I updated my question with a further consideration – Sandy Jun 30 '22 at 15:38
  • Your analyzer is removing the ".". Run GET index_name/_analyze to see. The generated token is "google.com", so ".com" on its own doesn't match it. – rabbitbr Jun 30 '22 at 15:39
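To see this effect without a cluster, here is a rough Python approximation of `my_analyzer`. It is not Lucene's standard tokenizer (which implements the Unicode UAX #29 word-break rules); it only mimics the behavior relevant here: a "." between letters does not split a token, while a leading "." is dropped. The authoritative check remains the `_analyze` API, e.g. `GET index_name/_analyze` with a body like `{"analyzer": "my_analyzer", "text": ".com"}`. The name `approx_analyze` is hypothetical.

```python
import re

# Rough approximation of the standard tokenizer: runs of word characters,
# where a "." or apostrophe *between* word characters does not split the token.
# Real behavior follows Unicode UAX #29 and differs in edge cases.
TOKEN = re.compile(r"\w+(?:[.']\w+)*")


def approx_analyze(text):
    """Approximate my_analyzer: char filters, tokenizer, lowercase filter."""
    text = text.replace("-", "_")                              # my_char_filter
    text = text.replace("\u0091", " ").replace("\u0092", " ")  # quote_filter
    return [token.lower() for token in TOKEN.findall(text)]    # lowercase filter


print(approx_analyze("Google.com is an American multinational technology company (COM)"))
# ['google.com', 'is', 'an', 'american', 'multinational', 'technology', 'company', 'com']
print(approx_analyze(".com"))
# ['com']
```

The query ".com" is reduced to the term "com", which equals the token produced from "(COM)" but not the single token "google.com", which is why only "COM" was highlighted.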