
I have a simple field of type "text" in my index.

"keywordName": {
  "type": "text"
}

These documents are already indexed: "samsung", "samsung galaxy", "samsung cover", "samsung charger".

If I run a simple "match" query, the results are surprising:

Query:

GET keywords/_search
{
  "query": {
    "match": {
      "keywordName": "samsung"
    }
  }
}

Results:

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 1.113083,
    "hits": [
      {
        "_index": "keywords",
        "_type": "keyword",
        "_id": "samsung galaxy",
        "_score": 1.113083,
        "_source": {
          "keywordName": "samsung galaxy"
        }
      },
      {
        "_index": "keywords",
        "_type": "keyword",
        "_id": "samsung charger",
        "_score": 0.9433406,
        "_source": {
          "keywordName": "samsung charger"
        }
      },
      {
        "_index": "keywords",
        "_type": "keyword",
        "_id": "samsung",
        "_score": 0.8405092,
        "_source": {
          "keywordName": "samsung"
        }
      },
      {
        "_index": "keywords",
        "_type": "keyword",
        "_id": "samsung cover",
        "_score": 0.58279467,
        "_source": {
          "keywordName": "samsung cover"
        }
      }
    ]
  }
}

First question: Why doesn't "samsung" have the highest score?

Second question: How can I write a query or analyzer that gives "samsung" the highest score?

Gun
  • You are basically asking the same question as http://stackoverflow.com/questions/43257656/elasticsearch-analyzer-on-text-field. The answer to your first question is https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html. The answer to your second question is in the other post I already replied to. – Andrei Stefan Apr 11 '17 at 10:24
  • The question is much the same, but if I use the same index and analyzer as in the other post and search for "samsungs", the token is "samsung", so the term query doesn't match and the match query returns "samsung galaxy" ... – Gun Apr 11 '17 at 12:08
  • That's why it is important to lay out all your requirements before coming up with a mapping for your index and the list of queries for those requirements. – Andrei Stefan Apr 11 '17 at 12:13

1 Answer


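Starting from the same index settings (analyzers, filters, mappings) as in my previous reply, I suggest the following solution. The bool query combines an analyzed match on keywordName with a term clause and a fuzzy clause on the unanalyzed keywordName.raw sub-field, so a document whose whole value equals, or nearly equals, the search text matches more clauses and scores higher. But, as I mentioned, you need to lay out all the requirements for what you need to search for in this index and treat all of this as one complete solution.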

DELETE test
PUT test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_stop": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "my_stop",
            "my_snow",
            "asciifolding"
          ]
        }
      },
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords": "_french_"
        },
        "my_snow": {
          "type": "snowball",
          "language": "French"
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "keywordName": {
          "type": "text",
          "analyzer": "custom_stop",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
POST /test/test/_bulk
{"index":{}}
{"keywordName":"samsung galaxy"}
{"index":{}}
{"keywordName":"samsung charger"}
{"index":{}}
{"keywordName":"samsung cover"}
{"index":{}}
{"keywordName":"samsung"}

GET /test/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "keywordName": {
              "query": "samsungs",
              "operator": "and"
            }
          }
        },
        {
          "term": {
            "keywordName.raw": {
              "value": "samsungs"
            }
          }
        },
        {
          "fuzzy": {
            "keywordName.raw": {
              "value": "samsungs",
              "fuzziness": 1
            }
          }
        }
      ]
    }
  },
  "size": 10
}
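
To see why this query pushes "samsung" to the top, here is a rough Python sketch that only counts which of the three should clauses each document would match. It is a toy model under stated assumptions, not Elasticsearch's scoring: toy_stem is a hypothetical stand-in for the French snowball filter (it just strips one trailing "s"), analyze ignores stop words and asciifolding, and within_one_edit ignores transpositions. All function names are made up for illustration.

```python
# Toy model of the three `should` clauses; it counts matches, not real scores.

def toy_stem(token):
    # Assumption: crude stand-in for the snowball filter, strips one trailing "s".
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def analyze(text):
    # Stand-in for the custom_stop analyzer: lowercase, split on spaces, stem.
    return [toy_stem(t) for t in text.lower().split()]


def within_one_edit(a, b):
    # True when a and b differ by at most one insertion, deletion or substitution.
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make `a` the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution at position i
    return a[i:] == b[i + 1:]  # one extra character in b at position i


def matched_clauses(doc, query):
    doc_tokens = set(analyze(doc))
    clauses = []
    # match with operator "and": every analyzed query token must be in the field
    if all(t in doc_tokens for t in analyze(query)):
        clauses.append("match")
    # term on keywordName.raw: exact comparison against the unanalyzed value
    if doc == query:
        clauses.append("term")
    # fuzzy on keywordName.raw: whole value within one edit of the query text
    if within_one_edit(doc, query):
        clauses.append("fuzzy")
    return clauses


docs = ["samsung galaxy", "samsung charger", "samsung cover", "samsung"]
for doc in docs:
    print(doc, "->", matched_clauses(doc, "samsungs"))
```

For the search text "samsungs", every document matches the match clause, but only "samsung" also matches the fuzzy clause on the raw field, so it collects one more scoring clause than the others. For the search text "samsung", it matches all three. In the real bool query the scores of the matching should clauses are added up, so clause counting is only a proxy for that sum.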
Andrei Stefan
  • I agree with you about the requirements. The issue is that they keep evolving in my case. I don't have the full set of data yet, so it is hard to draw up an exhaustive list of requirements. – Gun Apr 11 '17 at 13:45