I have documents like this:

{
title:'...',
body: '...'
}

I want to get documents that are more than 90% similar to a specific document. I have used this query:

query = {
    "query": {
        "more_like_this" : {
            "fields" : ["title", "body"],
            "like" : "body of another document",
            "min_term_freq" : 1,
            "max_query_terms" : 12
        }
    }
}

How can I change this query to require 90% similarity with the specified document?

ehsan shirzadi
Your question sounds pretty much exactly like an example in the docs: "A more complicated use case consists of mixing texts with documents already existing in the index. In this case, the syntax to specify a document is similar to the one used in the Multi GET API." Link: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html – ryanlutgen Dec 30 '17 at 12:11
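
The docs example mentioned in this comment passes an already indexed document in `like` instead of raw text. A minimal sketch of that form, assuming a hypothetical index named `articles` and a hypothetical document ID `1` (neither appears in the question):

```python
def mlt_like_document(index, doc_id, fields=("title", "body")):
    """Build a more_like_this query that uses an indexed document as the example.

    `index` and `doc_id` are placeholders for illustration; substitute your own.
    """
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                # Reference the document by index and ID, as in the Multi GET API.
                "like": [{"_index": index, "_id": doc_id}],
                "min_term_freq": 1,
                "max_query_terms": 12,
            }
        }
    }


query = mlt_like_document("articles", "1")
```

Older Elasticsearch versions may also expect a `_type` entry next to `_index` and `_id`.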

2 Answers

Take a look at the query formation parameter minimum_should_match.

Frank

You should specify minimum_should_match. From the docs:

minimum_should_match

After the disjunctive query has been formed, this parameter controls the number of terms that must match. The syntax is the same as the minimum should match. (Defaults to "30%").
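
As a sketch, the query from the question with this parameter raised from its default to 90% might look like the following. The helper name `build_mlt_query` is made up for illustration. Note that this requires 90% of the selected query terms (at most `max_query_terms` of them) to match, which approximates similarity rather than measuring it exactly:

```python
def build_mlt_query(like_text, minimum_should_match="90%"):
    """Return the question's more_like_this query with a stricter match threshold."""
    return {
        "query": {
            "more_like_this": {
                "fields": ["title", "body"],
                "like": like_text,
                "min_term_freq": 1,
                "max_query_terms": 12,
                # Share of the selected terms that a hit must contain.
                "minimum_should_match": minimum_should_match,
            }
        }
    }


query = build_mlt_query("body of another document")
```

With 12 query terms, 90% works out to 10.8, and the minimum_should_match documentation states that percentages are rounded down, so a hit would need at least 10 of the 12 terms.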

The query itself is formed like this:

The MLT query simply extracts the text from the input document, analyzes it, usually using the same analyzer at the field, then selects the top K terms with the highest tf-idf to form a disjunctive query of these terms
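
To make that description concrete, here is a toy sketch of the term selection step in plain Python. It is not Elasticsearch's or Lucene's actual implementation: the whitespace tokenizer, the smoothed idf formula, and the function name `top_k_terms` are all simplifying assumptions.

```python
import math
from collections import Counter


def top_k_terms(doc, corpus, k, min_term_freq=1):
    """Pick the k terms of `doc` with the highest tf-idf against `corpus`."""
    term_freq = Counter(doc.lower().split())          # naive "analyzer"
    tokenized = [set(d.lower().split()) for d in corpus]
    n_docs = len(corpus)

    scores = {}
    for term, freq in term_freq.items():
        if freq < min_term_freq:
            continue  # mirrors the role of min_term_freq in the query
        doc_freq = sum(1 for tokens in tokenized if term in tokens)
        idf = math.log((n_docs + 1) / (doc_freq + 1)) + 1  # smoothed idf (assumption)
        scores[term] = freq * idf

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


corpus = ["the cat sat", "the dog ran", "the bird flew"]
terms = top_k_terms("the cat cat", corpus, k=1)
```

Here "cat" is selected ahead of "the" because it is frequent in the input document but rare in the corpus. The selected terms then become the optional clauses that minimum_should_match is applied to.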

So if you want matches in the title to count for more, boost your title field. A title that contains most of the terms selected by term frequency / inverse document frequency is a strong sign of relevance, so those results should rank higher. You can boost your title field by 1.5, for example.
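
One possible way to express such a boost, not necessarily what this answer had in mind, is to run a separate more_like_this clause per field inside a `bool` query and set the `boost` parameter on the title clause. The helper name `boosted_mlt_query` is hypothetical:

```python
def boosted_mlt_query(like_text, title_boost=1.5, minimum_should_match="90%"):
    """Combine per-field more_like_this clauses, weighting the title clause higher."""

    def mlt(field, boost):
        return {
            "more_like_this": {
                "fields": [field],
                "like": like_text,
                "min_term_freq": 1,
                "max_query_terms": 12,
                "minimum_should_match": minimum_should_match,
                "boost": boost,  # scales the score contributed by this clause
            }
        }

    # A document matching either clause is returned; title matches score higher.
    return {"query": {"bool": {"should": [mlt("title", title_boost), mlt("body", 1.0)]}}}


query = boosted_mlt_query("body of another document")
```

Because each clause extracts and matches terms per field, the results can differ from a single query over both fields, so it is worth comparing the two on real data.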

Refer to the documentation for more on the more_like_this query.

Rahul Sharma