How to normalize score across multiple search results

Question

I need help normalizing scores in Elasticsearch. I am using n-grams, fuzziness, custom queries, and phonetic search.

In the database : Alice, Bob, Cathy

Search query 1 : Alice

   Results are : Max Score(500), Alice(500)[100%], Cathy(300)[60%], Bob(200)[40%]

However, Search query 2 : Both

   Results are : Max Score(200), Bob(200)[100%], Alice(100)[50%], Cathy(50)[25%]

What I want the results to look like :

   Results are : Max Score(500), Bob(200)[40%], Alice(100)[20%], Cathy(50)[10%]

I want a standard max score or a way to quantify the top results of any/multiple queries.

I want the score to show

'how similar the result is to the query'

not

'how the result ranks in comparison to the other results.'

Alkis Kalogeris · Accepted Answer · 2020-04-01T14:23:15.147

https://www.elastic.co/guide/en/elasticsearch/reference/7.x/query-dsl-rank-feature-query.html

That said, I think approaching this from the other direction would be easier. What do you want to achieve by normalizing the score? What do you ultimately want to calculate? One approach could go like this: although the highest score alone cannot tell you whether a result matched perfectly, you can estimate how relevant it is by checking how much it deviates from the rest. For example:

Input: Alice
Output: Alice (100), Alicia (90), Alkis (50), Alex (48) etc

The deviation here is apparent: the first results (before the major drop) are most probably very relevant, not merely matching. So you could assume that Alice is 100%.

Input: Alice
Output: Alexander (100), Alkis (95), Alter (90) etc

There is no such deviation here. There is no major drop, so the results could all be very relevant, or none of them could be. So you cannot assume that Alexander is 100%, but does it really matter?
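The "major drop" check described above can be sketched in a few lines of client-side Python. This is only an illustration, not an Elasticsearch feature: the function name `find_major_drop` and the threshold `min_relative_drop` are hypothetical, and the right threshold would depend on your data.

```python
from typing import List, Optional


def find_major_drop(scores: List[float],
                    min_relative_drop: float = 0.3) -> Optional[int]:
    """Return the index of the last result before the largest drop, or None.

    `scores` must be sorted in descending order, as search hits usually are.
    `min_relative_drop` is an assumed threshold: a drop only counts if the
    gap between two neighbours is at least this fraction of the higher one.
    """
    best_index = None
    best_drop = min_relative_drop
    for i in range(len(scores) - 1):
        if scores[i] <= 0:
            break  # relative drops are meaningless for non-positive scores
        drop = (scores[i] - scores[i + 1]) / scores[i]
        if drop >= best_drop:
            best_index, best_drop = i, drop
    return best_index


# Scores taken from the two examples above.
print(find_major_drop([100, 90, 50, 48]))  # drop after index 1 (Alicia)
print(find_major_drop([100, 95, 90]))      # no major drop
```

For the first example the function returns 1, so Alice and Alicia fall before the drop; for the second it returns None, which matches the case where nothing can be concluded from the scores alone.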

Basically, you rely on having enough data to detect the major change (a basic normalization by a sigmoid function over a sample whose size you determine to be sufficient).
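One possible reading of the sigmoid normalization mentioned above: standardize each score against the mean and standard deviation of a sample of scores, then squash the result into the range 0 to 1 with the logistic function. The sketch below assumes that reading; the helper names `sigmoid` and `sigmoid_normalize` are hypothetical, and the sample is whatever set of scores you consider sufficient.

```python
import math
from statistics import mean, pstdev
from typing import List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_normalize(scores: List[float], sample: List[float]) -> List[float]:
    """Map raw scores into (0, 1) relative to a sample of scores.

    Each score is turned into a z-score using the sample's mean and
    population standard deviation, then passed through the sigmoid.
    If the sample has no spread, every score maps to 0.5.
    """
    mu = mean(sample)
    sigma = pstdev(sample)
    if sigma == 0:
        return [0.5 for _ in scores]
    return [sigmoid((s - mu) / sigma) for s in scores]


# Scores from the first example above, used as both input and sample.
hits = [100, 90, 50, 48]
for raw, norm in zip(hits, sigmoid_normalize(hits, hits)):
    print(raw, round(norm, 3))
```

A score equal to the sample mean maps to 0.5, and scores well above the rest approach 1. Note that the output is still relative to the sample, so it depends on how representative that sample is.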

But again, you had a problem, found a solution, and are now trying to make that solution fit in Elasticsearch. Describing the exact problem and the desired outcome might reveal an easier solution.

I got your point. I have made the question more specific now. Kindly consider reading it again; I have now stated clearly what I want. — Abhinav Keshri, Apr 01 '20 at 12:32
@AbhinavKeshri maybe this answers your question https://discuss.elastic.co/t/custom-score-for-fuzzy-matching-based-on-levenshtein-distance-score/125544/3 — Alkis Kalogeris, Apr 01 '20 at 14:18
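
As the title of the linked thread suggests, one way to get a score that reflects "how similar the result is to the query" is to base it on Levenshtein (edit) distance instead of the relevance score. The sketch below computes such a percentage on the client side in plain Python; it is not taken from the linked thread, and `similarity_percent` is a hypothetical helper name. Lowercasing both strings is an assumption about what "similar" should mean.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity_percent(query: str, candidate: str) -> float:
    """Absolute similarity in [0, 100], independent of the other results."""
    q, c = query.lower(), candidate.lower()
    longest = max(len(q), len(c))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(q, c) / longest)


for name in ["Alice", "Bob", "Cathy"]:
    print(name, similarity_percent("Both", name))
```

Because the percentage depends only on the query and the candidate, an exact match always gets 100 and a weak match stays low no matter what else is in the result set.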
