
I'm trying to boost relevance based on how often a field value occurs: the rarer the value, the more relevant the document should be.

For example, I have 1001 documents. 1000 documents are written by John, and only one is written by Joe.

// 1000 documents by John
{"title": "abc 1", "author": "John"}
{"title": "abc 2", "author": "John"}
// ...
{"title": "abc 1000", "author": "John"}

// 1 document by Joe
{"title": "abc 1", "author": "Joe"}

I get 1001 documents when I search for "abc" against the title field. These documents should have very similar, if not identical, relevance scores. The field value "John" occurs 1000 times and the field value "Joe" occurs once. I'd like to boost the relevance of the document {"title": "abc 1", "author": "Joe"}; otherwise it would be really hard to spot the document written by Joe.

Thank you!

johanzhou
  • Can't you just sort by ascending doc count? Negative boosting is not supported – sramalingam24 May 12 '18 at 14:57
  • After aggregating by author followed by top_hits aggregation – sramalingam24 May 12 '18 at 15:01
  • @sramalingam24 Thanks for your suggestion. I don't think sorting by ascending doc count meets my requirement. For example, I don't want to see Joe's document at the very top when the relevance score of John's document is dramatically higher than Joe's. In my example above, they have very similar scores when searching for "abc", so I'd love to see Joe's document. – johanzhou May 13 '18 at 16:39

1 Answer


In case someone runs into the same use case, here is my workaround using a Function Score Query. It requires at least two calls to the Elasticsearch server.

  1. Get the document count for each author (for example, with an aggregation). In our example, that is 1000 for John and 1 for Joe.
  2. Generate a weight from each count: the higher the count, the lower the weight. Something like 1 + sqrt(1/1000) for John and 1 + sqrt(1/1) for Joe.
  3. Use the weight in a script to calculate the score according to the author value (the script can be made much better):

    {
    "query": {
        "function_score": {
            "query": {
                "match": { "title": "abc" }
            },
            "script_score" : {
                "script" : {
                  "source": "if (doc['author'].value == 'John') { return (1 + Math.sqrt(1.0 / 1000)) * _score; } return (1 + Math.sqrt(1.0 / 1)) * _score;"
                }
            }
        }
    }
    }
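
The three steps above can be sketched in Python so the weights are not hard-coded per author. This is only an illustration and has not been run against a cluster: the helper names (`counts_from_buckets`, `author_weights`, `build_query`), the aggregation name `authors`, the `default_weight` fallback, and passing the weights through script `params` are my own assumptions, not part of the original workaround. It also assumes `author` is mapped so that `doc['author'].value` is readable in a script (for example, a keyword field).

```python
import math

# Step 1 (request body): a terms aggregation that returns one bucket per author.
# The aggregation name "authors" and the bucket limit are illustrative choices.
AUTHOR_COUNT_REQUEST = {
    "size": 0,
    "aggs": {"authors": {"terms": {"field": "author", "size": 1000}}},
}


def counts_from_buckets(buckets):
    """Turn terms-aggregation buckets ({"key", "doc_count"}) into {author: count}."""
    return {b["key"]: b["doc_count"] for b in buckets}


def author_weights(counts):
    """Step 2: weight = 1 + sqrt(1 / count), so rarer authors get a larger weight."""
    return {a: 1 + math.sqrt(1.0 / n) for a, n in counts.items() if n > 0}


def build_query(term, weights, default_weight=1.0):
    """Step 3: function_score body that multiplies _score by the author's weight."""
    script = (
        "def a = doc['author'].value; "
        "double w = params.weights.containsKey(a) "
        "? params.weights.get(a) : params.default_weight; "
        "return w * _score;"
    )
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": term}},
                "script_score": {
                    "script": {
                        "source": script,
                        # Weights travel as params, so the script text stays constant.
                        "params": {
                            "weights": weights,
                            "default_weight": default_weight,
                        },
                    }
                },
            }
        }
    }


# Example with the counts from the question.
buckets = [{"key": "John", "doc_count": 1000}, {"key": "Joe", "doc_count": 1}]
weights = author_weights(counts_from_buckets(buckets))
body = build_query("abc", weights)
```

The first call sends `AUTHOR_COUNT_REQUEST` and the second sends `body`. Keeping the script text fixed and varying only `params` should also avoid recompiling the script for every new set of counts.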
    
johanzhou