Suppose that my index has two documents:
- "get my money"
- "my money get here"
When I do a regular match query for "get my money", both documents match correctly but they get equal scores. However, I want the order of words to be significant during scoring. In other words, I want "get my money" to have a higher score.
So I put my match query inside the must clause of a bool query and added a match_phrase query (with the same query string) to the should clause. This seems to score hits correctly until I search for "how do I get my money". In that case, the match_phrase query doesn't seem to match, and the hits are returned with equal scores again.
How can I construct my index/query so that it takes word order into account but does not require all searched words to exist in the document?
Index mapping with test data
PUT test-index
{
  "mappings": {
    "properties" : {
      "keyword" : {
        "type" : "text",
        "similarity": "boolean"
      }
    }
  }
}
POST test-index/_doc/
{
  "keyword" : "get my money"
}
POST test-index/_doc/
{
  "keyword" : "my money get here"
}
Query "How do I get my money" - Doesn't work as needed
GET /test-index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "keyword": "how do i get my money"
          }
        }
      ],
      "should": [
        {
          "match_phrase": {
            "keyword": {
              "query": "how do i get my money"
            }
          }
        }
      ]
    }
  }
}
Results (both documents get the same score)
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 3.0,
    "hits" : [
      {
        "_index" : "test-index",
        "_type" : "_doc",
        "_id" : "6Xy8wXIB3NtI_ttPGBoV",
        "_score" : 3.0,
        "_source" : {
          "keyword" : "get my money"
        }
      },
      {
        "_index" : "test-index",
        "_type" : "_doc",
        "_id" : "6ny8wXIB3NtI_ttPGBpV",
        "_score" : 3.0,
        "_source" : {
          "keyword" : "my money get here"
        }
      }
    ]
  }
}
Edit 1
As @gibbs suggested, let's remove "similarity": "boolean". Below is a simplified, more focused version of the problem we are trying to solve.
Mapping with "similarity": "boolean" removed
PUT test-index
{
  "mappings": {
    "properties" : {
      "keyword" : {
        "type" : "text"
      }
    }
  }
}
POST test-index/_doc/
{
  "keyword": "get my money"
}
POST test-index/_doc/
{
  "keyword": "my money get here"
}
How can I make this query return results? Currently it returns none. Is it possible for match_phrase to return hits when a document does not contain all of the searched words?
GET /test-index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "keyword": {
              "query": "how do I get my money"
            }
          }
        }
      ]
    }
  }
}
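For reference, my understanding of why this returns nothing: with the default slop of 0, match_phrase needs every analyzed query term to appear in the field, at consecutive positions and in the same order. Below is a simplified Python model of that rule. It is only an illustration, not how Elasticsearch implements phrase queries; the tokenize and phrase_matches functions are hypothetical helpers, and the regex tokenizer is a rough stand-in for the standard analyzer.

```python
import re

def tokenize(text):
    # Rough stand-in for the standard analyzer (assumption):
    # lowercase the text and split on non-word characters.
    return re.findall(r"\w+", text.lower())

def phrase_matches(query, field_value):
    # True only if ALL query tokens occur in the field as one
    # consecutive run, in the same order (models slop = 0).
    q = tokenize(query)
    doc = tokenize(field_value)
    if not q:
        return False
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))

docs = ["get my money", "my money get here"]
print([phrase_matches("get my money", d) for d in docs])           # [True, False]
print([phrase_matches("how do I get my money", d) for d in docs])  # [False, False]
```

Under this model the longer query can never match, because "how", "do" and "i" are not in either document, which is consistent with the empty result I am seeing.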
Edit 2
In our use case, we can't use BM25 (TF/IDF-style scoring) because it skews the ranking of our results, as the following example shows.
POST test-index/_doc
{
  "keyword": "get my money, claim, distribution, getting started"
}
POST test-index/_doc
{
  "keyword": "my money get here"
}
GET /test-index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "keyword": "how do I get my money"
          }
        }
      ]
    }
  }
}
Results
{
  "took" : 16,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.6156533,
    "hits" : [
      {
        "_index" : "test-index",
        "_type" : "_doc",
        "_id" : "JnxCw3IB3NtI_ttPBjQv",
        "_score" : 0.6156533,
        "_source" : {
          "keyword" : "my money get here"
        }
      },
      {
        "_index" : "test-index",
        "_type" : "_doc",
        "_id" : "x3xSw3IB3NtI_ttP1DUi",
        "_score" : 0.49206492,
        "_source" : {
          "keyword" : "get my money, claim, distribution, getting started"
        }
      }
    ]
  }
}
In this scenario, "my money get here" scores higher than the intended "get my money, claim, distribution, getting started" because of TF/IDF-style scoring. So we can't use a similarity where the score depends on the number of matching documents, the length of the field, etc.
Sorry for the very long question. So, back to my original question: how can I construct my index/query so that it takes word order into account but does not require all searched words to exist in the document?
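To illustrate the field-length effect I mean, here is a rough sketch of a BM25-style score for a single term. It is a simplified model and not Lucene's exact scoring. k1 = 1.2 and b = 0.75 are the usual defaults; the function name, the idf value and the lengths are made-up inputs, so this will not reproduce the exact scores above.

```python
def bm25_term_score(freq, doc_len, avg_doc_len, idf, k1=1.2, b=0.75):
    # Length normalization: a field longer than average increases
    # the denominator and therefore lowers the score.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * freq / (freq + norm)

# Hypothetical inputs: same term frequency and idf, different field lengths.
avg_len = 5.5
short_field = bm25_term_score(freq=1, doc_len=4, avg_doc_len=avg_len, idf=1.0)
long_field = bm25_term_score(freq=1, doc_len=7, avg_doc_len=avg_len, idf=1.0)
print(short_field > long_field)  # True: the shorter field scores higher
```

With everything else equal, the 4-token field beats the 7-token field, which is the kind of behavior we want to avoid.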