I'm building a search database. Most entries are proper nouns (names and street addresses). I set up an ngram tokenizer to support fast fuzzy searching, and it works well. However, if I search for "John Allen", the results include "John Allen" and "Allen John" with the same score (i.e. the same relevance ranking). How can I tune the index settings or query syntax so that Elasticsearch still returns both documents when I search for "John Allen", but assigns a higher score to "John Allen" than to "Allen John"?
Here are the index settings...
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "filter": [
            "lowercase"
          ],
          "type": "custom",
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "token_chars": [
            "letter",
            "digit",
            "custom"
          ],
          "custom_token_chars": "'-",
          "min_gram": "3",
          "type": "ngram",
          "max_gram": "4"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "full_name": {
        "type": "text",
        "analyzer": "my_analyzer",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
and here is a sample query...
{
  "query": {
    "query_string": {
      "query": "Allen John",
      "fields": [
        "full_name"
      ]
    }
  }
}
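To illustrate why I think the scores tie, here is a rough Python approximation of the analyzer above. It is not Elasticsearch's actual implementation: the function name `ngram_tokens` is my own, and `str.isalnum` only approximates the "letter" and "digit" character classes. As far as I understand, the query matches on a bag of ngram terms, and both names produce exactly the same terms, just in a different order.

```python
from collections import Counter


def ngram_tokens(text, min_gram=3, max_gram=4, extra_chars="'-"):
    """Approximate the ngram tokenizer + lowercase filter from the settings.

    Hypothetical helper for illustration only, not an Elasticsearch API.
    """
    # Split the text into words made of letters, digits, or the custom chars.
    words, current = [], []
    for ch in text:
        if ch.isalnum() or ch in extra_chars:
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))

    # Emit every ngram of length min_gram..max_gram from each word, lowercased.
    tokens = []
    for word in words:
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(word):
                    tokens.append(word[start:start + size].lower())
    return tokens


a = ngram_tokens("John Allen")
b = ngram_tokens("Allen John")

same_terms = Counter(a) == Counter(b)  # same ngrams with the same counts
same_order = a == b                    # but emitted in a different order

print(same_terms, same_order)  # prints: True False
```

So the only thing that distinguishes the two documents is token order (position), which is what I would like the score to take into account.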
Other notes:
- We are not using wildcards because they slow down queries.
- Our typical index will have 10 million documents or fewer.
- Speed is critical, just as it is in most Elasticsearch applications.
- From what I've read so far, the answer (or hints toward it) may lie in Elasticsearch's edge n-gram tokenizer or its completion suggester. Or maybe not.
I have also tried the following query (after reading "ElasticSearch taking word order into account in match query"). It did not help with my issue.
{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "Bill",
          "fields": [
            "full_name"
          ]
        }
      },
      "should": {
        "span_near": {
          "clauses": [
            {
              "span_term": {
                "full_name": "Bill Tim"
              }
            }
          ],
          "slop": 5
        }
      }
    }
  }
}