ElasticSearch Boost field and sort by date

Question

I am trying to boost a query by field and then sort the results by date:

        multiMatchQuery.fields(columnSortOrder());
        searchSourceBuilder.trackScores(true);
        searchSourceBuilder.sort(new ScoreSortBuilder().order(SortOrder.DESC));
        searchSourceBuilder.sort("updated_time",SortOrder.DESC);

When I execute it, the results come back ordered by the boosted fields only. I want a combination of both: results grouped by the highest-priority matching field, and sorted by date within each group.

Boost Order

Field_A^3
Field_B^2
Field_C^1

sample data:

{
  "_source": {
    "updated_time": "2020-01-04T01:00:06.870000Z",
    "field_A": "Slovakia beyond",
    "filed_B": "The properties in Slovakia are beyound...",
    "Field_C": "Once you fix the relevance then sorting should work correctly."
  }
}

{
  "_source": {
    "updated_time": "2020-02-04T01:00:06.870000Z",
    "field_A": "**beyond** filed_A",
    "filed_B": "The properties in Japan is high",
    "Field_C": "Test description for filed_A"
  }
}

{
  "_score": 2.56865,
  "_source": {
    "updated_time": "2020-01-04T01:00:06.870000Z",
    "field_A": "Test filed_B",
    "filed_B": "**beyond** is search  term in filed_B",
    "Field_C": "Test description for filed_B"
  }
}

{
  "_source": {
    "updated_time": "2020-02-04T01:00:06.870000Z",
    "field_A": "Test filed_B",
    "filed_B": "**beyond** is search  term in filed_B Test for Feb",
    "Field_C": "Test description for filed_B test for Feb"
  }
}

{
  "_source": {
    "updated_time": "2020-02-04T01:00:06.870000Z",
    "field_A": "Search Term filed_C",
    "filed_B": " is the search term for lowest column",
    "Field_C": "**beyond** Test description for filed_C "
  }
}

Suppose the search term is "beyond". If the search term is found in [field_A, field_B, filed_C], the expected result is:

[first priority Field_A sort by date]

Slovakia beyond Jan 2020
beyond filed_A Feb 2020

[second priority Field_B sort by date]

beyond is search term in filed_B Jan 2020
beyond is search term in filed_B Test for Feb 2020

[Third priority Field_C sort by date]

beyond Test description for filed_C Feb 2020

Could you please update the answer with an example of queries for Field_A, Field_B, etc.? Is it a full-text match, or an exact match? Could you also tell: if there is a match in Field_A but its most recent `document_date` is 2018, should it be higher in the result set than `Field_B, 2020`? – Nikolay Vasiliev, May 17 '20 at 09:28
@NikolayVasiliev This is not an exact match, and yes, you are right: if a match is found in Field_A it should be higher regardless of dates. But if more than one match is found in Field_A, with dates such as (2018, 2020, 2019), then the Field_A matches should be sorted by date accordingly. I have added some sample data to the question, with the expected result. Thanks – Talha Bin Shakir, May 18 '20 at 17:34

Gibbs · Answer 1 · 2020-05-16T09:37:12.723

0

This could be the reason:

When sorting on a field, scores are not computed. By setting track_scores to true, scores will still be computed and tracked.

So enable track_scores for your query.

Java API: use the parameterised variant of trackScores.

Also, when I tried this with sample data, an explicit sort by score was required as well:

    {
      "_score": {
        "order": "desc"
      }
    }

Add this as the first sort, and then sort by date in DESC order. It then works as follows.

If the search term is part of more than one field [field1, field2, field3], a combined score is calculated.

edited May 16 '20 at 09:37

answered May 16 '20 at 08:25


@Val Would you please check this? – Gibbs May 16 '20 at 14:13
Yes, I tried that, but it still prioritises by score and not by date. What I actually want is this: if the search term is found in [field1, field2, field3], the [field1] matches should come first, sorted by date, then the [field2] matches sorted by date, and then the [field3] matches sorted by date, following the field boosts I defined [Field_A^3 Field_B^2 Field_C^1]. I do not want to sort the results by a combined score. – Talha Bin Shakir May 17 '20 at 07:56
Could you please provide some example data where you are facing the problem? – Gibbs May 17 '20 at 10:56
I have added some sample data with the expected result. The criterion is to boost by column and then sort by date. All matches found in field A should come first, but among each other they should be sorted by date as well. – Talha Bin Shakir May 18 '20 at 17:36
Could you please add mapping for the fields also? – Gibbs May 19 '20 at 09:48

score 0 · Answer 2 · answered May 19 '20 at 20:15

There are a couple of ways to do it. There's a cleaner approach with several queries (using the Multi Search API), and a more sophisticated approach with a single query (using a function_score query). Let me explain how.

Cleaner approach using `_msearch`

Simply put, _msearch lets you make one HTTP request with several Elasticsearch queries in it. I would advise splitting the initial query into several queries, one per field, and sorting each of them by date. This approach is simpler because, as I will show later, fitting everything into one query requires modifying the scoring, which is not an easy thing to do.

You can also make several separate requests without using _msearch, whichever you see fit.
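As a rough illustration of the _msearch approach, here is a minimal Java sketch (standard library only) that builds one sub-search per field, each sorted by updated_time. The class and method names are hypothetical, the asc order is an assumption taken from the expected output in the question, and the search term is not JSON-escaped, so real code should use a JSON library. Since a document may match in more than one field, the sketch also merges the per-field id lists and keeps each id only at its highest-priority position.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class MsearchSketch {
    /** Builds an NDJSON body for _msearch: one header line plus one query line per field. */
    static String buildBody(String term, List<String> fieldsByPriority, String dateOrder) {
        StringBuilder body = new StringBuilder();
        for (String field : fieldsByPriority) {
            body.append("{}\n"); // empty header: the index is taken from the request URL
            body.append("{\"query\":{\"match\":{\"").append(field).append("\":\"").append(term)
                .append("\"}},\"sort\":[{\"updated_time\":{\"order\":\"").append(dateOrder)
                .append("\"}}]}\n"); // every line, including the last, ends with a newline
        }
        return body.toString();
    }

    /** Concatenates per-field id lists in priority order, keeping the first occurrence of each id. */
    static List<String> mergeByPriority(List<List<String>> idsPerField) {
        Set<String> seen = new LinkedHashSet<>();
        for (List<String> ids : idsPerField) {
            seen.addAll(ids); // LinkedHashSet keeps insertion order and ignores repeats
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        System.out.print(buildBody("beyond", List.of("field_A", "filed_B", "Field_C"), "asc"));
        // Hypothetical ids: document "2" matched in both the first and the second field.
        System.out.println(mergeByPriority(
            List.of(List.of("1", "2"), List.of("3", "2", "4"), List.of("5"))));
    }
}
```

The resulting body would be sent as POST /myscores/_msearch with the Content-Type application/x-ndjson, and the three responses come back in the same order as the sub-searches.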

Why other approaches didn't work?

You already know about simple score tuning via boosting some fields over others, like in this example multi_match query:

POST /myscores/_search
{
    "query": {
        "multi_match": {
            "query": "beyond",
            "fields": ["field_A^3", "filed_B^2", "Field_C^1"]
        }
    }
}

This will simply take the score of the match times 3 if it is matched to field_A, times 2 if filed_B, etc.
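To make the effect of the boosts concrete, here is a small Java sketch of how the default multi_match type (best_fields, with tie_breaker left at 0) combines per-field scores: each field score is multiplied by its boost and the largest product wins. The class name and the raw per-field scores are made up for illustration; real scores come from Elasticsearch's relevance model.

```java
final class BestFieldsSketch {
    /** Largest boosted per-field score, as in best_fields with tie_breaker = 0. */
    static double bestFieldsScore(double[] fieldScores, double[] boosts) {
        double best = 0.0;
        for (int i = 0; i < fieldScores.length; i++) {
            best = Math.max(best, fieldScores[i] * boosts[i]);
        }
        return best;
    }

    public static void main(String[] args) {
        double[] boosts = {3, 2, 1}; // field_A^3, filed_B^2, Field_C^1

        // Hypothetical raw scores: a weak match in field_A only.
        double weakA = bestFieldsScore(new double[] {0.5, 0.0, 0.0}, boosts);
        // Hypothetical raw scores: a strong match in filed_B only.
        double strongB = bestFieldsScore(new double[] {0.0, 1.2, 0.0}, boosts);

        System.out.println("weak field_A match:   " + weakA);
        System.out.println("strong filed_B match: " + strongB);
    }
}
```

With these invented numbers the filed_B document outranks the field_A document, which shows that boosts only shift scores and do not by themselves guarantee strict per-field tiers.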

Now, the score is just a positive real number, and it needs to represent where in the list of matched results we should place a particular document.

As you already tried, if you ask Elasticsearch to use updated_time as the sort key, it will ignore the score from matching, which is not desired.

Gibbs's suggestion also didn't seem to work, because sorting by _score and then by updated_time (or vice versa) effectively disregards one or the other: the second sort key is only consulted when two documents tie exactly on the first.
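The behaviour of a two-key sort can be reproduced outside Elasticsearch. The following Java sketch (hypothetical class names, ids and scores) sorts hits by score descending and then by date descending, and shows that the date only matters when two scores are exactly equal.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ScoreThenDateSketch {
    static final class Hit {
        final String id;
        final double score;
        final Instant updatedTime;

        Hit(String id, double score, String updatedTime) {
            this.id = id;
            this.score = score;
            this.updatedTime = Instant.parse(updatedTime);
        }
    }

    /** Sorts by score descending, then by date descending, and returns the ids. */
    static List<String> sortedIds(List<Hit> hits) {
        Comparator<Hit> byScoreDesc = Comparator.comparingDouble((Hit h) -> h.score).reversed();
        Comparator<Hit> byDateDesc = Comparator.comparing((Hit h) -> h.updatedTime).reversed();
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(byScoreDesc.thenComparing(byDateDesc));
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
            new Hit("A-jan", 2.80, "2020-01-04T01:00:06.870Z"), // slightly higher score, older
            new Hit("A-feb", 2.79, "2020-02-04T01:00:06.870Z"), // slightly lower score, newer
            new Hit("B-jan", 1.50, "2020-01-04T01:00:06.870Z"), // exact tie with B-feb
            new Hit("B-feb", 1.50, "2020-02-04T01:00:06.870Z"));
        // prints [A-jan, A-feb, B-feb, B-jan]: the date decides only for the tied pair
        System.out.println(sortedIds(hits));
    }
}
```

Because relevance scores of different documents are almost never exactly equal, the date key rarely gets a chance to reorder anything.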

Is there a way to unite `_score` and `updated_time`?

There is. Let's try function_score:

POST /myscores/_search
{
    "query": {
        "function_score": {
            "query": {
                "multi_match": {
                    "query": "beyond",
                    "fields": [
                        "field_A^3",
                        "filed_B^2",
                        "Field_C"
                    ]
                }
            },
            "score_mode": "max",     
            "boost_mode": "multiply", <=== 2
            "field_value_factor": {   <=== 1
                "field": "updated_time",   
                "factor": 0.00000000001,
                "missing": 1
            }
        }
    }
}

function_score allows you to fine tune the score of the query.

We take multi_match query that we are already familiar with from the section above, and try to modify it.

First, we know that we want it to take into account updated_time. We use field_value_factor as function to modify the score (point 1 in the query above).

Now, we tell it to multiply the value of updated_time and the score of the query - via setting boost_mode to multiply (point 2).
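The arithmetic behind points 1 and 2 can be sketched in plain Java. This is only an approximation of what Elasticsearch does internally (it works with single-precision scores), and the class name is hypothetical. The base score of 2.7279 is an assumption, chosen as a plausible boosted multi_match score for a field_A match.

```java
import java.time.Instant;

final class FieldValueFactorSketch {
    /** boost_mode "multiply": query score times (factor * field value in epoch milliseconds). */
    static double boosted(double queryScore, String updatedTime, double factor) {
        long millis = Instant.parse(updatedTime).toEpochMilli();
        return queryScore * (factor * millis);
    }

    public static void main(String[] args) {
        double factor = 0.00000000001; // same factor as in the query above

        // With a query score of 1.0 the result is just the scaled date value.
        System.out.println(boosted(1.0, "2020-02-04T01:00:06.870000Z", factor));

        // Assumed base score for a field_A match, multiplied by the scaled date.
        System.out.println(boosted(2.7279, "2020-02-04T01:00:06.870000Z", factor));
        System.out.println(boosted(2.7279, "2020-01-04T01:00:06.870000Z", factor));
    }
}
```

With equal base scores, the February document gets a slightly larger product than the January one, so the newer document ranks first.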

Executing the function_score query above will produce something like this:

"hits": [
  {
    ...
    "_score": 43.121338,
    "_source": {
      "updated_time": "2020-02-04T01:00:06.870000Z",
      "field_A": "**beyond** filed_A",
      "filed_B": "The properties in Japan is high",
      "Field_C": "Test description for filed_A"
    }
  },
  {
    ...
    "_score": 43.048275,
    "_source": {
      "updated_time": "2020-01-04T01:00:06.870000Z",
      "field_A": "Slovakia beyond",
      "filed_B": "The properties in Slovakia are beyound...",
      "Field_C": "Once you fix the relevance then sorting should work correctly."
    }
  },
  {
    ...
    "_score": 29.028637,
    "_source": {
      "updated_time": "2020-01-04T01:00:06.870000Z",
      "field_A": "Test filed_B",
      "filed_B": "**beyond** is search  term in filed_B",
      "Field_C": "Test description for filed_B"
    }
  },
  {
    ...
    "_score": 24.44329,
    "_source": {
      "updated_time": "2020-02-04T01:00:06.870000Z",
      "field_A": "Test filed_B",
      "filed_B": "**beyond** is search  term in filed_B Test for Feb",
      "Field_C": "Test description for filed_B test for Feb"
    }
  },
  {
    ...
    "_score": 23.517717,
    "_source": {
      "updated_time": "2020-02-04T01:00:06.870000Z",
      "field_A": "Search Term filed_C",
      "filed_B": " is the search term for lowest column",
      "Field_C": "**beyond** Test description for filed_C "
    }
  }
]

Notice that scores of field_A matches are close to each other, but a little bit away from those of filed_B.

Also notice that, among the field_A matches, the more recent document comes first; we will address the reverse order now.

How to use `updated_time` to sort in reverse order?

field_value_factor lets you multiply the original value from the field by some factor.

Internally, Elasticsearch stores dates as milliseconds since the Unix epoch. For dates in 2020 that is a 13-digit integer, which is roughly 12 orders of magnitude bigger than the score ES returned to me. So I chose a factor that brings them to a comparable order:

            "field_value_factor": {
                "field": "updated_time",   
                "factor": 0.00000000001,
                "missing": 1
            }

Now, this gives us an equivalent of ORDER BY updated_time DESC:

Feb 2020
Jan 2020

But what if we need it to be ORDER BY updated_time ASC?

Jan 2020
Feb 2020

We cannot multiply by a negative factor because scores in Elasticsearch have to be positive real numbers.

What we can do instead is to modify the original value with 1/x, like here:

        "field_value_factor": {
            "field": "updated_time",
            "factor": 0.00000000001,
            "missing": 1,
            "modifier": "reciprocal"  <=== 1/x
        }
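To see what the reciprocal modifier does to the two dates from the sample data, here is a short Java sketch. The class and method names are hypothetical; the computation assumes the factor is applied first and the modifier second, i.e. 1 / (factor * value).

```java
import java.time.Instant;

final class ReciprocalSketch {
    /** Scaled date value, optionally passed through the "reciprocal" modifier (1/x). */
    static double dateFactor(String updatedTime, double factor, boolean reciprocal) {
        double scaled = factor * Instant.parse(updatedTime).toEpochMilli();
        return reciprocal ? 1.0 / scaled : scaled;
    }

    public static void main(String[] args) {
        double factor = 0.00000000001;
        String jan = "2020-01-04T01:00:06.870000Z";
        String feb = "2020-02-04T01:00:06.870000Z";

        // Without the modifier the newer date yields the larger multiplier.
        System.out.println(dateFactor(jan, factor, false) + " < " + dateFactor(feb, factor, false));

        // With the modifier the older date yields the larger multiplier.
        double janR = dateFactor(jan, factor, true);
        double febR = dateFactor(feb, factor, true);
        System.out.println(janR + " > " + febR);

        // Ratio between the two multipliers: how much of a relevance gap one month can offset.
        System.out.println("ratio jan/feb = " + (janR / febR));
    }
}
```

The ratio is only about 1.0017, so a one-month difference shifts the final score by roughly 0.17%. Dates will therefore reorder documents only when their relevance scores are nearly identical, which is worth checking against your own data.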

With the reciprocal modifier, the query finally gives us the order you asked for in the question:

"hits": [
  {
    ...
    "_score": 0.17285699,
    "_source": {
      "updated_time": "2020-01-04T01:00:06.870000Z",
      "field_A": "Slovakia beyond",
      "filed_B": "The properties in Slovakia are beyound...",
      "Field_C": "Once you fix the relevance then sorting should work correctly."
    }
  },
  {
    ...
    "_score": 0.1725641,
    "_source": {
      "updated_time": "2020-02-04T01:00:06.870000Z",
      "field_A": "**beyond** filed_A",
      "filed_B": "The properties in Japan is high",
      "Field_C": "Test description for filed_A"
    }
  },
  {
    ...
    "_score": 0.116562225,
    "_source": {
      "updated_time": "2020-01-04T01:00:06.870000Z",
      "field_A": "Test filed_B",
      "filed_B": "**beyond** is search  term in filed_B",
      "Field_C": "Test description for filed_B"
    }
  },
  {
    ...
    "_score": 0.0978178,
    "_source": {
      "updated_time": "2020-02-04T01:00:06.870000Z",
      "field_A": "Test filed_B",
      "filed_B": "**beyond** is search  term in filed_B Test for Feb",
      "Field_C": "Test description for filed_B test for Feb"
    }
  },
  {
    ...
    "_score": 0.09411382,
    "_source": {
      "updated_time": "2020-02-04T01:00:06.870000Z",
      "field_A": "Search Term filed_C",
      "filed_B": " is the search term for lowest column",
      "Field_C": "**beyond** Test description for filed_C "
    }
  }
]

How to do it in Java?

Although I can't provide ready-made code, I believe you can start from FunctionScoreQueryBuilder and try to integrate it with your existing code.

Hope this helps!
