This is not something you can solve in Elasticsearch without adding more information. You want to rank "red bottles" over "bottle caps" because you know semantic information about these names: "red bottles" is about a bottle, while "bottle caps" is about something else (related to bottles, but not actually a bottle). If you want ranking from Elasticsearch to take this into account, you have to index that information (maybe add a keyword tag field, with "bottle" for one document and "bottle caps" for the other; you will have to experiment to see what works for your use case). Of course, this means that a person has to add tags for everything.
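As a rough illustration of the tagging idea, here is a Python sketch (standard library only) that builds a mapping with an extra keyword field and a query that prefers documents whose tag exactly matches the search term. The field name `tags`, the boost of 2.0, and the helper names are assumptions made up for this example, not anything from your setup; you would send the printed bodies to your mapping and `_search` endpoints and tune from there.

```python
import json

# Hypothetical field name for the hand-assigned tags.
TAG_FIELD = "tags"


def tag_mapping():
    """Mapping properties: analyzed title plus an exact-match keyword field."""
    return {
        "properties": {
            "name": {"type": "text"},
            TAG_FIELD: {"type": "keyword"},
        }
    }


def boosted_query(term, boost=2.0):
    """Match on the title; prefer documents tagged with exactly `term`."""
    return {
        "query": {
            "bool": {
                # Every hit must still match the title text.
                "must": {"match": {"name": term}},
                # Documents tagged with the term get an extra score contribution.
                "should": {"term": {TAG_FIELD: {"value": term, "boost": boost}}},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(tag_mapping(), indent=2))
    print(json.dumps(boosted_query("bottle"), indent=2))
```

With something like this, a document tagged "bottle" would pick up the extra `should` score, while one tagged "bottle caps" would match on the title only.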
However, I suspect you can improve the situation somewhat with the unique token filter. My guess is that you don't care much about term frequency within a single title: "Bottle caps for 500ml bottle" isn't more about bottles just because "bottle" appears in it twice, so term frequency makes little sense for titles like this, I think. So you could do something like this:
PUT /myindex
{
  "settings": {
    "index": {
      "number_of_shards": 1
    },
    "analysis": {
      "analyzer": {
        "uniq_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "porter_stem",
            "unique"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "uniq_analyzer"
        }
      }
    }
  }
}
PUT /myindex/doc/1
{"name": "Red coloured bottles"}
PUT /myindex/doc/2
{"name": "Bottle caps for 500ml bottle"}
Then, if you search for bottle, you'll see the scores are identical. That's not perfect, but it's an improvement. If you want to understand where a score is coming from, you can use explain:
POST /myindex/_search
{
  "explain": true,
  "query": {
    "match": {
      "name": "bottle"
    }
  }
}
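To see why removing duplicate tokens takes term frequency out of the picture, here is a small Python sketch that mimics the analysis chain on the two titles. It is only an approximation: the regex tokenizer and the hypothetical `crude_stem` helper are stand-ins I made up for the standard tokenizer and the Porter stemmer, and real scoring involves more than term frequency.

```python
import re
from collections import Counter


def crude_stem(token):
    """Very rough stand-in for a stemmer: strip a trailing 's'."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def analyze(text, unique=False):
    """Tokenize on alphanumeric runs, lowercase, stem, optionally de-duplicate."""
    tokens = [crude_stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())]
    if unique:
        # dict.fromkeys keeps the first occurrence of each token, in order.
        tokens = list(dict.fromkeys(tokens))
    return tokens


titles = ["Red coloured bottles", "Bottle caps for 500ml bottle"]

if __name__ == "__main__":
    for title in titles:
        plain = Counter(analyze(title))["bottle"]
        deduped = Counter(analyze(title, unique=True))["bottle"]
        print(f"{title!r}: tf={plain} without unique, tf={deduped} with unique")
```

Without de-duplication the second title counts "bottle" twice; with it, both titles count it once, so term frequency no longer favours the bottle caps.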