I am trying to make a query in Elasticsearch that gives more accurate results than what I am doing today.
In my documents, I have a brand property, a description property, and a size property. An example document could be:
{
  "brand": "Coca Cola",
  "size": "1l",
  "description": "drink"
}
Let's say a user searches for "Coca Cola drink 1l". Now, I'd like the following boost values:
brand: 4
description: 1
size: 2
Each property can contain any kind of string value and is not constrained in any way. In addition, each property uses its own analyzer (but that has been left out of the examples for simplicity).
Now, what would be the best way to achieve the above?
What I've tried
Method 1
This approach searches for every word in every property, with different weights for each property. It ensures that all words individually match at least one property.
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              { "match": { "brand": { "boost": 4, "query": "Coca" } } },
              { "match": { "description": { "boost": 1, "query": "Coca" } } },
              { "match": { "size": { "boost": 2, "query": "Coca" } } }
            ],
            "minimum_should_match": 1
          }
        },
        {
          "bool": {
            "should": [
              { "match": { "brand": { "boost": 4, "query": "Cola" } } },
              { "match": { "description": { "boost": 1, "query": "Cola" } } },
              { "match": { "size": { "boost": 2, "query": "Cola" } } }
            ],
            "minimum_should_match": 1
          }
        },
        {
          "bool": {
            "should": [
              { "match": { "brand": { "boost": 4, "query": "drink" } } },
              { "match": { "description": { "boost": 1, "query": "drink" } } },
              { "match": { "size": { "boost": 2, "query": "drink" } } }
            ],
            "minimum_should_match": 1
          }
        },
        {
          "bool": {
            "should": [
              { "match": { "brand": { "boost": 4, "query": "1l" } } },
              { "match": { "description": { "boost": 1, "query": "1l" } } },
              { "match": { "size": { "boost": 2, "query": "1l" } } }
            ],
            "minimum_should_match": 1
          }
        }
      ]
    }
  }
}
The problem with the above approach is that the boost is thrown off. The brand "Coca Cola" consists of two words, so it is boosted with a value of 8 instead of 4.
Method 2
I tried using a multi-match query (with a minimum should match of "100%") with individual boosts for every property.
I feel like this is a little better, but I can't specify an analyzer to use for each individual property.
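A query of this kind might look roughly like the sketch below. It is only an illustration under assumptions: the boosts are written with the ^ suffix on each field name, and the multi_match type is left at its default because the type actually used is not stated here.

```json
{
  "query": {
    "multi_match": {
      "query": "Coca Cola drink 1l",
      "fields": ["brand^4", "description^1", "size^2"],
      "minimum_should_match": "100%"
    }
  }
}
```

Note that multi_match accepts a single optional analyzer parameter that applies to the whole query text, not one per field.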
Method 3
I tried using the whole sentence for each individual property. I feel like this is the right way to go, but then a search for "foobar" still yields 1 result (for some reason).
{
  "query": {
    "bool": {
      "should": [
        { "match": { "brand": { "minimum_should_match": 1, "boost": 4, "query": "Coca Cola drink 1l" } } },
        { "match": { "description": { "minimum_should_match": 1, "boost": 1, "query": "Coca Cola drink 1l" } } },
        { "match": { "size": { "minimum_should_match": 1, "boost": 2, "query": "Coca Cola drink 1l" } } }
      ],
      "minimum_should_match": 1
    }
  }
}