
I'm using ES for searching a huge list of human names employing fuzzy search techniques.

TF is useful for scoring here, but IDF is not needed in my case and is diluting the scores. I still want TF and the field norm to be applied to the score.

How do I disable/suppress IDF for my queries, but keep TF and Field Norm?

I came across the Disable IDF calculation thread, but it did not help me. It also seems like the constant score query would not help me in this case.

Community
user1189332
  • you probably would have to write a custom similarity plugin similar to [this](http://stackoverflow.com/questions/32725263/when-rewriting-multiterm-query-add-constant-score-to-every-term-not-to-the-who) – keety Oct 19 '15 at 19:31
  • I tried this. But the `public float idf(long docFreq, long numDocs) {` never gets called. I have provided the custom class in my index settings both during search and index. – user1189332 Oct 24 '15 at 08:54
  • can you post the `mapping` and index `settings` ? Also which version of elasticsearch ? – keety Oct 24 '15 at 11:39
  • Fragment from my mapping json: `"City_ng": { "type": "string", "analyzer": "n-gram-analyser", "similarity": "my_similarity" }` Fragment from the settings json (Directly at the root level): `"my_similarity": { "index": { "type": "com.concorde.extensions.score.IDFIgnoredSimilarityProvider" }, "search": { "type": "com.concorde.extensions.score.IDFIgnoredSimilarityProvider" } }` **Elasticsearch version is 1.7.3** – user1189332 Oct 25 '15 at 17:47
  • This solved the problem: https://groups.google.com/forum/#!msg/elasticsearch/TAXsDi8JKbs/vIDVinDzckIJ – user1189332 Oct 27 '15 at 07:49
  • Nice. You should probably update the answer with the steps; it would be helpful to others in future. – keety Oct 27 '15 at 13:32
  • Sure yes. :-) I had simply added this property in my elasticsearch.yml. `index.similarity.default.type: com.sai.extensions.score.IDFIgnoredSimilarityProvider `. I didn't need to touch the settings or mapping file. Obviously, in my case, I'm happy with a global similarity algorithm for all the fields. And I had to drop in the jar file (containing my similarity provider and similarity) in lib directory of elasticsearch. – user1189332 Oct 27 '15 at 21:49
  • It is very curious that you need TF on the other hand ... So, for a query "Carl" a person called "Carl Carl" is more relevant for you ? – Alessandro Benedetti Jul 25 '17 at 14:56

1 Answer


When creating the index, you can define your own similarity in the index settings. If you only need to disable IDF and keep everything else as in the default, a simple script is enough, such as:

"script": {"source": "double tf = Math.sqrt(doc.freq); double idf = 1.0; double norm = 1/Math.sqrt(doc.length); return query.boost * tf * idf * norm;"}

This is shown here.
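For context, here is a rough sketch of where that script might sit in the index settings, assuming an Elasticsearch version that supports the `scripted` similarity type and typeless mappings. The similarity name `no_idf` and the field name `name` are made up for illustration, and `score_without_idf` is only a plain-Python mirror of the script's arithmetic, not anything Elasticsearch provides.

```python
import json
import math

# The script from the answer: TF and field norm are kept, IDF is pinned to 1.0.
NO_IDF_SCRIPT = (
    "double tf = Math.sqrt(doc.freq); "
    "double idf = 1.0; "
    "double norm = 1/Math.sqrt(doc.length); "
    "return query.boost * tf * idf * norm;"
)


def build_index_body(field_name="name", similarity_name="no_idf"):
    """Build a create-index request body (hypothetical field and similarity names)."""
    return {
        "settings": {
            "similarity": {
                similarity_name: {
                    "type": "scripted",
                    "script": {"source": NO_IDF_SCRIPT},
                }
            }
        },
        "mappings": {
            "properties": {
                # Assumption: typeless mappings; older versions nest this under a type name.
                field_name: {"type": "text", "similarity": similarity_name}
            }
        },
    }


def score_without_idf(term_freq, field_length, boost=1.0):
    """Mirror the script's arithmetic in Python to reason about scores offline."""
    tf = math.sqrt(term_freq)
    idf = 1.0  # IDF disabled
    norm = 1.0 / math.sqrt(field_length)
    return boost * tf * idf * norm


if __name__ == "__main__":
    # Print the JSON body that would be sent when creating the index.
    print(json.dumps(build_index_body(), indent=2))
```

With IDF fixed at 1.0, the score of a single-term match depends only on how often the term occurs in the field and how long the field is, so a rare name and a common name with the same frequency and field length score the same.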

IKavanagh
even