I'm running function_score queries in Elasticsearch.
The boost weights of the query are determined ad-hoc (and differ between users). Also, the terms that are queried will differ between users depending on context. An example query might look like this:
{
  "query": {
    "function_score": {
      "filter": {
        "term": { "in_stock": true },
        ... more filters ...
      },
      "functions": [
        {
          "filter": { "term": { "color": "red" }},
          "weight": 2
        },
        {
          "filter": { "term": { "style": "elegant" }},
          "weight": 1
        },
        {
          "filter": { "term": { "length": "long" }},
          "weight": 3
        }
      ],
      "score_mode": "sum"
    }
  }
}
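Since the weights and terms change per user, a query like the one above would typically be assembled in code. The following is a minimal Python sketch of that assembly, not the actual implementation: `build_query`, `hard_filters`, and `weighted_terms` are hypothetical names. It assumes a recent Elasticsearch version, so the mandatory filters go under `query` / `bool` / `filter` instead of a top-level `filter`, and it sets `boost_mode` to `replace` on the assumption that the filter-only base query scores 0 and should not be multiplied into the result.

```python
import json


def build_query(hard_filters, weighted_terms):
    """Assemble a function_score request body from per-user inputs.

    hard_filters:   dict of field -> value that every hit must match
    weighted_terms: list of (field, value, weight) tuples, one per function
    """
    functions = [
        {"filter": {"term": {field: value}}, "weight": weight}
        for field, value, weight in weighted_terms
    ]
    return {
        "query": {
            "function_score": {
                # Mandatory, non-scoring filters (assumed layout, see lead-in).
                "query": {
                    "bool": {
                        "filter": [
                            {"term": {field: value}}
                            for field, value in hard_filters.items()
                        ]
                    }
                },
                "functions": functions,
                "score_mode": "sum",
                # Assumption: use only the summed weights as the final score.
                "boost_mode": "replace",
            }
        }
    }


# Same terms and weights as the example query above.
body = build_query(
    {"in_stock": True},
    [("color", "red", 2), ("style", "elegant", 1), ("length", "long", 3)],
)
print(json.dumps(body, indent=2))
```

Keeping the weighted terms as plain tuples makes it easy to swap in a different user's weights without touching the rest of the request.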
The documents are simple and look something like this:
{
  "product_id" : "abc",
  "name" : "blah blah",
  "price" : 10,
  "in_stock" : true,
  "color" : "red",
  "style" : "elegant",
  "length" : "long",
  ... more attributes...
}
The filtered fields are mapped as keyword and boolean. There is no free-text search anywhere.
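For reference, a mapping along those lines could be expressed as below. This is only a sketch: `FILTER_FIELD_TYPES` and `build_mapping` are hypothetical names, the typeless `mappings` / `properties` layout assumes Elasticsearch 7 or later, and only the four fields used in the example query are listed because the types of the remaining attributes are not stated.

```python
# Hypothetical field -> type table covering only the fields used in filters.
FILTER_FIELD_TYPES = {
    "in_stock": "boolean",
    "color": "keyword",
    "style": "keyword",
    "length": "keyword",
}


def build_mapping(field_types):
    """Build an index-creation body that maps each field to its type."""
    return {
        "mappings": {
            "properties": {
                name: {"type": es_type} for name, es_type in field_types.items()
            }
        }
    }


mapping = build_mapping(FILTER_FIELD_TYPES)
print(mapping)
```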
Query performance is reasonable until the index becomes large (around 1 million documents). At that point a query takes multiple seconds to complete.
Index configuration:
I've played around with limiting shard size. Currently the shards are capped at 1 million documents each, because beyond that performance seems to get even worse. Replication is at 5. The index is read-only.
Since the weights and the terms will differ between queries, I'm not sure if it is possible to pre-sort the index in such a way that will speed up the query.
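To make the pre-sorting concern concrete, here is a small pure-Python model of the scoring. It is a simplification (summing the weights of matching terms, as `score_mode: sum` suggests), not Elasticsearch's actual scoring code. The two documents, the two users' weights, and the helper names `score` and `ranking` are made up for illustration.

```python
def score(doc, weighted_terms):
    """Sum the weights of every (field, value) pair the document matches."""
    return sum(
        weight for field, value, weight in weighted_terms if doc.get(field) == value
    )


def ranking(docs, weighted_terms):
    """Return product ids ordered by descending score."""
    ordered = sorted(docs, key=lambda d: score(d, weighted_terms), reverse=True)
    return [d["product_id"] for d in ordered]


# Two hypothetical documents that each match one of the weighted terms.
docs = [
    {"product_id": "a", "color": "red", "style": "casual", "length": "short"},
    {"product_id": "b", "color": "blue", "style": "elegant", "length": "long"},
]

# Same terms, different per-user weights.
user_1 = [("color", "red", 5), ("length", "long", 1)]
user_2 = [("color", "red", 1), ("length", "long", 5)]

print(ranking(docs, user_1))  # ['a', 'b']
print(ranking(docs, user_2))  # ['b', 'a']
```

The same two documents come out in opposite order for the two users, so no single static sort order can match every user's ranking.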
I'm also not sure whether, or how, Elasticsearch can cache results, scores, and ordering for weighted queries.