There are a couple of ways to do it: a cleaner approach that uses several queries (via the Multi Search API), and a more sophisticated approach that uses a single function_score query. Let me explain both.
Cleaner approach using _msearch
Simply put, _msearch lets you send several Elasticsearch queries in one HTTP request. I would advise splitting the initial query into several queries and sorting each of them by date. This approach is simpler because, as I will show later, fitting everything into one query requires modifying the scoring, which is not easy to get right.
You can also send several separate requests without _msearch, whichever you see fit.
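A minimal sketch of what such an _msearch body could look like, assuming one match query per field and an ascending sort on updated_time in each. The per-field split, the priority order of the fields, and the helper name build_msearch_body are my assumptions for illustration. The body is newline-delimited JSON: for each search, a header line followed by the query line.

```python
import json

# Field names and index are taken from the examples in this answer;
# the priority order (field_A first) is an assumption.
FIELDS = ["field_A", "filed_B", "Field_C"]


def build_msearch_body(index, term, fields=FIELDS, order="asc"):
    """Build the NDJSON body for POST /_msearch: one search per field."""
    lines = []
    for field in fields:
        header = {"index": index}
        search = {
            "query": {"match": {field: term}},
            "sort": [{"updated_time": {"order": order}}],
        }
        lines.append(json.dumps(header))
        lines.append(json.dumps(search))
    # The Multi Search API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_msearch_body("myscores", "beyond"), end="")
```

Send the result with Content-Type: application/x-ndjson. The responses come back in the same order as the searches, so concatenating them gives the field_A matches first, with each group ordered by date. A document that matches in more than one field will show up in more than one group, so you may need to drop duplicates on the client side.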
Why didn't the other approaches work?
You already know about simple score tuning by boosting some fields over others, as in this example multi_match query:
POST /myscores/_search
{
"query": {
"multi_match": {
"query": "beyond",
"fields": ["field_A^3", "filed_B^2", "Field_C^1"]
}
}
}
This simply multiplies the score of a match by 3 if it matched in field_A, by 2 if it matched in filed_B, and so on.
Now, the score is just a positive real number, and it has to represent where in the list of matched results a particular document should be placed.
As you already found, if you ask Elasticsearch to sort by updated_time, it ignores the score from matching, which is not what you want.
Gibbs's suggestion did not seem to work either, because sorting by _score and then by updated_time (or vice versa) effectively disregarded one or the other.
Is there a way to unite _score and updated_time?
There is. Let's try function_score:
POST /myscores/_search
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "beyond",
"fields": [
"field_A^3",
"filed_B^2",
"Field_C"
]
}
},
"score_mode": "max",
"boost_mode": "multiply", <=== 2
"field_value_factor": { <=== 1
"field": "updated_time",
"factor": 0.00000000001,
"missing": 1
}
}
}
}
function_score lets you fine-tune the score of a query.
We take the multi_match query that we are already familiar with from the section above and modify it.
First, we know that we want it to take updated_time into account. We use field_value_factor as the function that modifies the score (point 1 in the query above).
Then we tell it to multiply the value derived from updated_time by the score of the query, by setting boost_mode to multiply (point 2).
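To make the arithmetic concrete, here is a small sketch of what boost_mode multiply does together with field_value_factor. It assumes the date value is read as milliseconds since the epoch; the base score of 2.0 is a made-up value for illustration, and the function names are mine.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
FACTOR = 0.00000000001  # same factor as in the query above


def to_epoch_millis(iso_utc):
    """Convert a timestamp like 2020-02-04T01:00:06.870000Z to epoch milliseconds."""
    dt = datetime.strptime(iso_utc, "%Y-%m-%dT%H:%M:%S.%fZ")
    dt = dt.replace(tzinfo=timezone.utc)
    return (dt - EPOCH) // timedelta(milliseconds=1)


def combined_score(query_score, iso_utc, factor=FACTOR):
    """boost_mode=multiply: query score times (factor * field value)."""
    return query_score * factor * to_epoch_millis(iso_utc)


if __name__ == "__main__":
    base = 2.0  # hypothetical multi_match score, identical for both documents
    feb = combined_score(base, "2020-02-04T01:00:06.870000Z")
    jan = combined_score(base, "2020-01-04T01:00:06.870000Z")
    print(feb, jan, feb > jan)
```

With equal query scores the more recent document ends up slightly higher, while a larger difference in query score (for example from the field boosts) still dominates the result.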
Executing this query will produce something like this:
"hits": [
{
...
"_score": 43.121338,
"_source": {
"updated_time": "2020-02-04T01:00:06.870000Z",
"field_A": "**beyond** filed_A",
"filed_B": "The properties in Japan is high",
"Field_C": "Test description for filed_A"
}
},
{
...
"_score": 43.048275,
"_source": {
"updated_time": "2020-01-04T01:00:06.870000Z",
"field_A": "Slovakia beyond",
"filed_B": "The properties in Slovakia are beyound...",
"Field_C": "Once you fix the relevance then sorting should work correctly."
}
},
{
...
"_score": 29.028637,
"_source": {
"updated_time": "2020-01-04T01:00:06.870000Z",
"field_A": "Test filed_B",
"filed_B": "**beyond** is search term in filed_B",
"Field_C": "Test description for filed_B"
}
},
{
...
"_score": 24.44329,
"_source": {
"updated_time": "2020-02-04T01:00:06.870000Z",
"field_A": "Test filed_B",
"filed_B": "**beyond** is search term in filed_B Test for Feb",
"Field_C": "Test description for filed_B test for Feb"
}
},
{
...
"_score": 23.517717,
"_source": {
"updated_time": "2020-02-04T01:00:06.870000Z",
"field_A": "Search Term filed_C",
"filed_B": " is the search term for lowest column",
"Field_C": "**beyond** Test description for filed_C "
}
}
]
Notice that the scores of the field_A matches are close to each other, but some distance away from those of the filed_B matches.
Also notice that among the field_A matches the most recent document comes first; we will address the reverse order now.
How to use updated_time to sort in reverse order?
field_value_factor lets you multiply the original value of the field by some factor.
Internally, Elasticsearch stores dates as a long number of milliseconds since the Unix epoch. For current dates that is a 13-digit integer, roughly 12 orders of magnitude bigger than the scores Elasticsearch returned to me. So I chose a factor that brings the two to a comparable order of magnitude:
"field_value_factor": {
"field": "updated_time",
"factor": 0.00000000001,
"missing": 1
}
Now, this gives us an equivalent of ORDER BY updated_time DESC:
Feb 2020
Jan 2020
But what if we need ORDER BY updated_time ASC?
Jan 2020
Feb 2020
We cannot multiply by a negative factor because scores in Elasticsearch must not be negative.
What we can do instead is transform the original value with 1/x, like this:
"field_value_factor": {
"field": "updated_time",
"factor": 0.00000000001,
"missing": 1,
"modifier": "reciprocal" <=== 1/x
}
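The effect of the reciprocal modifier can be checked with a few lines of arithmetic. This is only a sketch: the millisecond timestamps correspond to the two updated_time values in the sample documents, the base score of 2.0 is hypothetical, and the helper below mimics only the two modifiers discussed here.

```python
FACTOR = 0.00000000001

# Epoch milliseconds for the two sample dates used in this answer.
JAN_MS = 1578099606870  # 2020-01-04T01:00:06.870Z
FEB_MS = 1580778006870  # 2020-02-04T01:00:06.870Z


def field_value_factor(value, factor=FACTOR, modifier="none"):
    """Mimic field_value_factor for the 'none' and 'reciprocal' modifiers."""
    scaled = factor * value
    if modifier == "reciprocal":
        return 1.0 / scaled
    if modifier == "none":
        return scaled
    raise ValueError(f"unsupported modifier: {modifier}")


if __name__ == "__main__":
    base = 2.0  # hypothetical query score, the same for both documents
    for modifier in ("none", "reciprocal"):
        jan = base * field_value_factor(JAN_MS, modifier=modifier)
        feb = base * field_value_factor(FEB_MS, modifier=modifier)
        first = "Jan" if jan > feb else "Feb"
        print(modifier, round(jan, 5), round(feb, 5), "first:", first)
```

Without a modifier the February document scores higher; with reciprocal the January document does, which corresponds to the ascending order.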
This will finally give us the order you asked for in the question:
"hits": [
{
...
"_score": 0.17285699,
"_source": {
"updated_time": "2020-01-04T01:00:06.870000Z",
"field_A": "Slovakia beyond",
"filed_B": "The properties in Slovakia are beyound...",
"Field_C": "Once you fix the relevance then sorting should work correctly."
}
},
{
...
"_score": 0.1725641,
"_source": {
"updated_time": "2020-02-04T01:00:06.870000Z",
"field_A": "**beyond** filed_A",
"filed_B": "The properties in Japan is high",
"Field_C": "Test description for filed_A"
}
},
{
...
"_score": 0.116562225,
"_source": {
"updated_time": "2020-01-04T01:00:06.870000Z",
"field_A": "Test filed_B",
"filed_B": "**beyond** is search term in filed_B",
"Field_C": "Test description for filed_B"
}
},
{
...
"_score": 0.0978178,
"_source": {
"updated_time": "2020-02-04T01:00:06.870000Z",
"field_A": "Test filed_B",
"filed_B": "**beyond** is search term in filed_B Test for Feb",
"Field_C": "Test description for filed_B test for Feb"
}
},
{
...
"_score": 0.09411382,
"_source": {
"updated_time": "2020-02-04T01:00:06.870000Z",
"field_A": "Search Term filed_C",
"filed_B": " is the search term for lowest column",
"Field_C": "**beyond** Test description for filed_C "
}
}
]
How to do it in Java?
Although I can't provide you with ready-made code, I believe you can start from FunctionScoreQueryBuilder and integrate it with your existing code.
Hope this helps!