
My sample index and document structure look like this:

 http://localhost:9200/testindex/
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "autocomplete": {
              "tokenizer": "whitespace",
              "filter": [
                "lowercase",
                "autocomplete"
              ]
            },
            "autocomplete_search": {
              "tokenizer": "whitespace",
              "filter": [
                "lowercase"
              ]
            }
          },
          "filter": {
            "autocomplete": {
              "type": "nGram",
              "min_gram": 2,
              "max_gram": 40
            }
          }
        }
      },
      "mappings": {
        "table1": {
          "properties": {
            "title": {
              "type": "string",
              "index": "not_analyzed"
            },
            "type": {
              "type": "string",
              "index": "not_analyzed"
            },
            "type1": {
              "type": "string",
              "index": "not_analyzed"
            },
            "id": {
              "type": "string",
              "analyzer": "autocomplete",
              "search_analyzer": "autocomplete_search"
            }
          }
        }
      }
    }



http://localhost:9200/testindex/table1/1
{
  "title": "mumbai",
  "type": "efg",
  "type1": "efg1",
  "id": "Caryle management"
}


http://localhost:9200/testindex/table1/2
{
  "title": "canada",
  "type": "abc",
  "type1": "abc1",
  "id": "labson series 2014"
}



http://localhost:9200/testindex/table1/3/
{
  "title": "ny",
  "type": "abc",
  "type1": "abc1",
  "id": "labson series 2012"
}


http://localhost:9200/testindex/table1/4/
{
  "title": "pune",
  "type": "abc",
  "type1": "abc1",
  "id": "hybrid management"
}




Query used to get all documents where type is "abc" or "efg" and id matches labson or management:


 {
      "query": {
        "bool": {
          "filter": {
            "query": {
              "terms": {
                "type": [
                  "abc",
                  "efg"
                ]
              }
            }
          },
          "minimum_should_match": 1,
          "should": [
            {
              "query": {
                "bool": {
                  "must": [
                    {
                      "term": {
                        "_type": "table1"
                      }
                    },
                    {
                      "bool": {
                        "should": [
                          {
                            "match": {
                              "id": {
                                "query": "labson ",
                                "operator": "and"
                              }
                            }
                          },
                          {
                            "match": {
                              "id": {
                                "query": "management",
                                "operator": "and"
                              }
                            }
                          }
                        ]
                      }
                    }
                  ]
                }
              }
            }
          ]
        }
      }
    }






    "hits": [
    {
    "_index": "testindex",
    "_type": "table1",
    "_id": "2",
    "_score": 1,
    "_source": {
    "title": "canada",
    "type": "abc",
    "type1": "abc1",
    "id": "labson series 2014"
    }
    }
    ,
    {
    "_index": "testindex",
    "_type": "table1",
    "_id": "4",
    "_score": 1,
    "_source": {
    "title": "pune",
    "type": "abc",
    "type1": "abc1",
    "id": "hybrid management"
    }
    }
    ,
    {
    "_index": "testindex",
    "_type": "table1",
    "_id": "1",
    "_score": 1,
    "_source": {
    "title": "mumbai",
    "type": "efg",
    "type1": "efg1",
    "id": "Caryle management"
    }
    }
    ,
    {
    "_index": "testindex",
    "_type": "table1",
    "_id": "3",
    "_score": 1,
    "_source": {
    "title": "ny",
    "type": "abc",
    "type1": "abc1",
    "id": "labson series 2012"
    }
    }
    ]

I need help with the following issues in this output.

  1. Why is labson series 2012 the last document in the result, although my search criteria should look for labson first and then management? How can I add a boost or weight to the labson keyword over management? The output should give me all documents that match labson first and then those that match management, based on the order of input in the match clauses.
  2. How can I add a filter at the top that reads: give me all documents which have type in ("abc", "efg") and type1 in ("abc")? Right now I am only filtering on type in ("abc", "efg"). How can I modify the query to include the IN clause for the type1 field?

Please provide some pseudo code for the above 2 query solutions, as I am new to ES. That would help me out immensely.

Thanks in advance

baiduXiu

1 Answer

Let me first clarify this point: "Although my search criteria wants to first look into labson and then management". Elasticsearch doesn't consider the order of query clauses when generating a score. Each sub-query clause produces its score independently of its position, and those scores are then combined into the final score.
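As a minimal sketch of this point, assuming Elasticsearch 2.x and the `id` field from the question: ranking is steered by giving one clause a higher `boost`, not by listing it first. The boost value below is only illustrative, and with default TF/IDF scoring the exact ordering can still depend on term statistics.

```json
{
  "query": {
    "bool": {
      "should": [
        { "match": { "id": { "query": "management", "operator": "and" } } },
        { "match": { "id": { "query": "labson", "operator": "and", "boost": 2 } } }
      ]
    }
  }
}
```

Here the labson clause is deliberately listed second; swapping the two clauses should not change the scores.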

Please refer to the following query for your use case. For score calculation, you can add a boost parameter to the match query options to increase a document's score when that clause matches. I have used a constant_score query to ignore TF/IDF. To keep norms from affecting scoring, you can disable them on the field at index time. Please use the following mapping, which turns norms off for the id field.

 {
        "settings": {
            "analysis": {
                "analyzer": {
                    "autocomplete": {
                        "tokenizer": "whitespace",
                        "filter": [
                            "lowercase",
                            "autocomplete"
                        ]
                    },
                    "autocomplete_search": {
                        "tokenizer": "whitespace",
                        "filter": [
                            "lowercase"
                        ]
                    }
                },
                "filter": {
                    "autocomplete": {
                        "type": "nGram",
                        "min_gram": 2,
                        "max_gram": 40
                    }
                }
            }
        },
        "mappings": {
            "table1": {
                "properties": {
                    "title": {
                        "type": "string",
                        "index": "not_analyzed"
                    },
                    "type": {
                        "type": "string",
                        "index": "not_analyzed"
                    },
                    "type1": {
                        "type": "string",
                        "index": "not_analyzed"
                    },
                    "id": {
                        "type": "string",
                        "analyzer": "autocomplete",
                        "search_analyzer": "autocomplete_search",
                        "norms": {
                            "enabled": false
                        }
                    }
                }
            }
        }
    }

A few discussion threads on similar scoring use cases.

GitHub issue for query norm.

You also mentioned that you want a filter on top: type in ("abc", "efg") and type1 in ("abc"). To support this I added a must clause with two sub-clauses, a term and a terms query.

{
    "query": {
        "filtered": {
            "query": {
                "bool": {
                    "should": [{
                        "constant_score": {
                            "query": {
                                "match": {
                                    "id": {
                                        "query": "management",
                                        "operator": "and"
                                    }
                                }
                            },
                            "boost": 1
                        }
                    }, {
                        "constant_score": {
                            "query": {
                                "match": {
                                    "id": {
                                        "query": "labson",
                                        "operator": "and"
                                    }
                                }
                            },
                            "boost": 2
                        }
                    }],
                    "must": [{
                        "term": {
                            "type1": {
                                "value": "abc"
                            }
                        }
                    }, {
                        "terms": {
                            "type": [
                                "abc",
                                "efg"
                            ]
                        }
                    }]
                }
            }
        }
    }
}

Given your requirement for this filter, type in ("abc", "efg") and type1 in ("abc"), no document actually matches these criteria (the indexed type1 values are "abc1" and "efg1", and the field is not_analyzed), so you will get 0 hits if you run this query against the 4 documents above. If you want an OR between the two conditions instead of an AND, you can change the query accordingly.
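The switch from AND to OR mentioned above could look roughly like this. It is only a sketch, assuming Elasticsearch 2.x (where `bool` accepts a `filter` clause, as in the question's own query): the two conditions move into a nested `bool` with `should`, so a document passes if either one matches. The scoring clauses are left out to keep the example short.

```json
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "should": [
            { "term": { "type1": "abc" } },
            { "terms": { "type": ["abc", "efg"] } }
          ]
        }
      }
    }
  }
}
```

Against the four sample documents this would be expected to match all of them, because every document has type "abc" or "efg" even though none has type1 equal to "abc".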

Further, you can experiment with scoring by giving different boost parameters to more than one match query; the final score is evaluated by combining the scores of the individual match queries.

Hope this works for you. Thanks

user3775217
  • So if I want the boost to decrease from top to bottom, I should give the first match the maximum value and then go in descending order down to the last match, with a minimum value of 1? – baiduXiu Jan 18 '17 at 08:17
  • In the same query, if I have to specify type not in ("abc","efg") instead of in ("abc","efg"), how do we do that in ES? – baiduXiu Jan 18 '17 at 08:19
  • Also, what is the reason behind the boost of 4? I tried 2 and 3 and that didn't work, but it worked with 4. So how do we decide on the boost factor? – baiduXiu Jan 18 '17 at 08:26
  • It should work with 2 and 3 as well. I chose 4 randomly; you can come up with your own scoring algorithm according to your use case and choose these values. These values can also be tuned over time. Let me check with 2 and 3 and get back to you. – user3775217 Jan 18 '17 at 08:41
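
For the "not in" case raised in the comments, one possible approach (not confirmed in this thread, and assuming the same not_analyzed `type` field) is to put the `terms` query under `must_not` in a `bool` query, which excludes every document whose type is one of the listed values:

```json
{
  "query": {
    "bool": {
      "must_not": {
        "terms": { "type": ["abc", "efg"] }
      }
    }
  }
}
```

The same `must_not` clause can sit alongside the `should` and `must` clauses of the query in the answer.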