I have an index containing 3 documents.
{
"firstname": "Anne",
"lastname": "Borg",
}
{
"firstname": "Leanne",
"lastname": "Ray"
},
{
"firstname": "Anne",
"middlename": "M",
"lastname": "Stone"
}
When I search for "Anne", I would like elastic to return all 3 of these documents (because they all match the term "Anne" to a degree). BUT, I would like Leanne Ray to have a lower score (relevance ranking) because the search term "Anne" appears at a later position in this document than the term appears in the other two documents.
Initially, I was using an ngram tokenizer. I also have a generated field in my index's mapping called "full_name" that contains the firstname, middlename and lastname strings. When I searched for "Anne", all 3 documents are in the result set. However, Anne M Stone has the same score as Leanne Ray. Anne M Stone should have a higher score than Leanne.
To address this, I changed my ngram tokenizer to an edge_ngram tokenizer. This had the effect of completely leaving out Leanne Ray from the result set. We would like to keep this result in the result set - because it still contains the query string - but with a lower score than the other two better matches.
I read somewhere that it may be possible to use the edge ngram filter alongside an ngram filter in the same index. If so, how should I recreate my index to do so? Is there a better solution?
Here are the initial index settings.
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"token_chars": [
"letter",
"digit",
"custom"
],
"custom_token_chars": "'-",
"min_gram": "3",
"type": "ngram",
"max_gram": "4"
}
}
}
},
"mappings": {
"properties": {
"contact_id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"firstname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"copy_to": [
"full_name"
]
},
"lastname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"copy_to": [
"full_name"
]
},
"middlename": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"copy_to": [
"full_name"
]
},
"full_name": {
"type": "text",
"analyzer": "my_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
And here is my query
{
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "Anne",
"fields": [
"full_name"
]
}
}
]
}
}
}
This brought back these results
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 0.59604377,
"hits": [
{
"_index": "contacts_15",
"_type": "_doc",
"_id": "3",
"_score": 0.59604377,
"_source": {
"firstname": "Anne",
"lastname": "Borg"
}
},
{
"_index": "contacts_15",
"_type": "_doc",
"_id": "1",
"_score": 0.5592099,
"_source": {
"firstname": "Anne",
"middlename": "M",
"lastname": "Stone"
}
},
{
"_index": "contacts_15",
"_type": "_doc",
"_id": "2",
"_score": 0.5592099,
"_source": {
"firstname": "Leanne",
"lastname": "Ray"
}
}
]
}
If I instead use an edge ngram tokenizer, this is what the index's settings look like...
{
"settings": {
"max_ngram_diff": "10",
"analysis": {
"analyzer": {
"my_analyzer": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": ["edge_ngram_tokenizer"]
}
},
"tokenizer": {
"edge_ngram_tokenizer": {
"token_chars": [
"letter",
"digit",
"custom"
],
"custom_token_chars": "'-",
"min_gram": "2",
"type": "edge_ngram",
"max_gram": "10"
}
}
}
},
"mappings": {
"properties": {
"contact_id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"firstname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"copy_to": [
"full_name"
]
},
"lastname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"copy_to": [
"full_name"
]
},
"middlename": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"copy_to": [
"full_name"
]
},
"full_name": {
"type": "text",
"analyzer": "my_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
and that same query brings back this new result set...
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1.5131824,
"hits": [
{
"_index": "contacts_16",
"_type": "_doc",
"_id": "3",
"_score": 1.5131824,
"_source": {
"firstname": "Anne",
"middlename": "M",
"lastname": "Stone"
}
},
{
"_index": "contacts_16",
"_type": "_doc",
"_id": "1",
"_score": 1.4100108,
"_source": {
"firstname": "Anne",
"lastname": "Borg"
}
}
]
}