I want to change the scoring in Elasticsearch so that multiple appearances of a term are not counted. For example, I want:

"texas texas texas"

and

"texas"

to come out with the same score. I found this mapping, which the Elasticsearch documentation says disables term frequency counting, but my searches still do not produce the same score:

{
  "mappings": {
    "business": {
      "properties": {
        "name": {
          "type": "string",
          "index_options": "docs",
          "norms": { "enabled": false }
        }
      }
    }
  }
}

Any help would be appreciated; I have not been able to find much information on this.

I am adding my search code and what gets returned when I use explain.

My search code:

Settings settings = ImmutableSettings.settingsBuilder().put("cluster.name", "escluster").build();
Client client = new TransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress("127.0.0.1", 9300));

SearchRequest request = Requests.searchRequest("businesses")
        .source(SearchSourceBuilder.searchSource()
                .query(QueryBuilders.boolQuery()
                        .should(QueryBuilders.matchQuery("name", "Texas")
                                .minimumShouldMatch("1"))))
        .searchType(SearchType.DFS_QUERY_THEN_FETCH);

ExplainRequest request2 = client.prepareIndex("businesses", "business")

and when I search with explain I get:

  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_shard" : 1,
      "_node" : "BTqBPVDET5Kr83r-CYPqfA",
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9U5KBks4zEorv9YI4n",
      "_score" : 1.0,
      "_source":{
"name" : "texas"
}
,
      "_explanation" : {
        "value" : 1.0,
        "description" : "weight(_all:texas in 0) [PerFieldSimilarity], result of:",
        "details" : [ {
          "value" : 1.0,
          "description" : "fieldWeight in 0, product of:",
          "details" : [ {
            "value" : 1.0,
            "description" : "tf(freq=1.0), with freq of:",
            "details" : [ {
              "value" : 1.0,
              "description" : "termFreq=1.0"
            } ]
          }, {
            "value" : 1.0,
            "description" : "idf(docFreq=2, maxDocs=3)"
          }, {
            "value" : 1.0,
            "description" : "fieldNorm(doc=0)"
          } ]
        } ]
      }
    }, {
      "_shard" : 1,
      "_node" : "BTqBPVDET5Kr83r-CYPqfA",
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9U5K6Ks4zEorv9YI4o",
      "_score" : 0.8660254,
      "_source":{
"name" : "texas texas texas"
}
,
      "_explanation" : {
        "value" : 0.8660254,
        "description" : "weight(_all:texas in 0) [PerFieldSimilarity], result of:",
        "details" : [ {
          "value" : 0.8660254,
          "description" : "fieldWeight in 0, product of:",
          "details" : [ {
            "value" : 1.7320508,
            "description" : "tf(freq=3.0), with freq of:",
            "details" : [ {
              "value" : 3.0,
              "description" : "termFreq=3.0"
            } ]
          }, {
            "value" : 1.0,
            "description" : "idf(docFreq=2, maxDocs=3)"
          }, {
            "value" : 0.5,
            "description" : "fieldNorm(doc=0)"
          } ]
        } ]
      }
    } ]
  }
    

It looks like the scoring still takes term frequency and document frequency into account. Any ideas? Sorry for the bad formatting; I don't know why it looks so rough.

The response from the browser search http://localhost:9200/businesses/business/_search?pretty=true&qname=texas is:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9YcCKjKvtg8NgyozGK",
      "_score" : 1.0,
      "_source":{"business" : {
"name" : "texas texas texas texas" }
}
    }, {
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9YateBKvtg8Ngyoy-p",
      "_score" : 1.0,
      "_source":{
"name" : "texas" }

    }, {
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9YavVnKvtg8Ngyoy-4",
      "_score" : 1.0,
      "_source":{
"name" : "texas texas texas" }

    }, {
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9Yb7NgKvtg8NgyozFf",
      "_score" : 1.0,
      "_source":{"business" : {
"name" : "texas texas texas" }
}
    } ]
  }
}

It finds all 4 documents I have in the index and gives them all the same score. When I run my Java API search with explain, I get:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.287682,
    "hits" : [ {
      "_shard" : 1,
      "_node" : "BTqBPVDET5Kr83r-CYPqfA",
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9YateBKvtg8Ngyoy-p",
      "_score" : 1.287682,
      "_source":{
"name" : "texas" }
,
      "_explanation" : {
        "value" : 1.287682,
        "description" : "weight(name:texas in 0) [PerFieldSimilarity], result of:",
        "details" : [ {
          "value" : 1.287682,
          "description" : "fieldWeight in 0, product of:",
          "details" : [ {
            "value" : 1.0,
            "description" : "tf(freq=1.0), with freq of:",
            "details" : [ {
              "value" : 1.0,
              "description" : "termFreq=1.0"
            } ]
          }, {
            "value" : 1.287682,
            "description" : "idf(docFreq=2, maxDocs=4)"
          }, {
            "value" : 1.0,
            "description" : "fieldNorm(doc=0)"
          } ]
        } ]
      }
    }, {
      "_shard" : 1,
      "_node" : "BTqBPVDET5Kr83r-CYPqfA",
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9YavVnKvtg8Ngyoy-4",
      "_score" : 1.1151654,
      "_source":{
"name" : "texas texas texas" }
,
      "_explanation" : {
        "value" : 1.1151654,
        "description" : "weight(name:texas in 0) [PerFieldSimilarity], result of:",
        "details" : [ {
          "value" : 1.1151654,
          "description" : "fieldWeight in 0, product of:",
          "details" : [ {
            "value" : 1.7320508,
            "description" : "tf(freq=3.0), with freq of:",
            "details" : [ {
              "value" : 3.0,
              "description" : "termFreq=3.0"
            } ]
          }, {
            "value" : 1.287682,
            "description" : "idf(docFreq=2, maxDocs=4)"
          }, {
            "value" : 0.5,
            "description" : "fieldNorm(doc=0)"
          } ]
        } ]
      }
    } ]
  }
}
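
For reference, the numbers in the explain output above can be reproduced by hand. The sketch below assumes the default TF/IDF similarity of Elasticsearch 1.x (Lucene's DefaultSimilarity), where tf is the square root of the term frequency and idf is 1 + ln(maxDocs / (docFreq + 1)). The class and method names are made up for illustration, and the fieldNorm values are copied from the explain output rather than recomputed.

```java
// Illustrative only: re-derives the fieldWeight values shown in the explain output.
final class ClassicScoreCheck {

    // tf(freq) = sqrt(freq), e.g. termFreq=3.0 gives the 1.7320508 seen above.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // idf = 1 + ln(maxDocs / (docFreq + 1))
    static float idf(long docFreq, long maxDocs) {
        return (float) (Math.log(maxDocs / (double) (docFreq + 1)) + 1.0);
    }

    // fieldWeight is the product of tf, idf and fieldNorm.
    static float fieldWeight(float freq, long docFreq, long maxDocs, float fieldNorm) {
        return tf(freq) * idf(docFreq, maxDocs) * fieldNorm;
    }

    public static void main(String[] args) {
        // "texas": termFreq=1, fieldNorm=1.0 (values from the second explain output)
        System.out.println("texas             : " + fieldWeight(1.0f, 2, 4, 1.0f));
        // "texas texas texas": termFreq=3, fieldNorm=0.5 (values from the second explain output)
        System.out.println("texas texas texas : " + fieldWeight(3.0f, 2, 4, 0.5f));
        // What both documents would get if freq were always 1 and norms were disabled
        System.out.println("tf and norms off  : " + fieldWeight(1.0f, 2, 4, 1.0f));
    }
}
```

The second line comes out lower than the first only because of the tf and fieldNorm factors, which are exactly the two things `index_options: docs` and disabled norms are meant to neutralise. Seeing them in the explain output suggests the mapping is not in effect for the `name` field.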
Chenmunka
Chadvador
  • The mismatch probably has more to do with `doc frequency` than with `term frequency`. Are you using [search_type=dfs_query_then_fetch](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-type.html#query-then-fetch?q=query_then_fech)? If that doesn't help, try setting `explain=true` in the query to see the breakdown of the scoring. – keety Aug 25 '15 at 03:47
  • I switched it to dfs_query_then_fetch but that didn't work. I will post my code and explain results in a second – Chadvador Aug 25 '15 at 14:07
  • Could you post the query too? – keety Aug 25 '15 at 14:16
  • I'm sorry, what do you mean? I just execute the SearchRequest from above with: ActionFuture af = client.search(request); – Chadvador Aug 25 '15 at 14:20
  • And thank you for the formatting edit! – Chadvador Aug 25 '15 at 14:21
  • Oh, my bad, I did not realise the query is in the code snippet. Could you print the actual query DSL the code generates? The `explain` output seems to suggest the query is running against the `_all` field. – keety Aug 25 '15 at 14:26

2 Answers

5

It looks like you cannot override the `index_options` of a field once the field has been set in the mapping.

Example:

PUT test
PUT test/business/_mapping
{
   "properties": {
      "name": {
         "type": "string",
         "index_options": "freqs",
         "norms": {
            "enabled": false
         }
      }
   }
}
PUT test/business/_mapping
{
   "properties": {
      "name": {
         "type": "string",
         "index_options": "docs",
         "norms": {
            "enabled": false
         }
      }
   }
}
GET test/business/_mapping

{
   "test": {
      "mappings": {
         "business": {
            "properties": {
               "name": {
                  "type": "string",
                  "norms": {
                     "enabled": false
                  },
                  "index_options": "freqs"
               }
            }
         }
      }
   }
}

You have to recreate the index for the new mapping to take effect.
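
A minimal sketch of recreating the index with the mapping supplied at creation time, using plain `HttpURLConnection` against the REST API. The index and type names come from the question; the class name, the helper names, and the base URL argument are assumptions for illustration. Deleting the index discards its documents, so they have to be indexed again afterwards.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Illustrative only: drops and recreates the "businesses" index with the mapping in place.
final class RecreateIndex {

    // Body for "PUT /businesses": the question's mapping, passed when the index is created.
    static String createIndexBody() {
        return "{\"mappings\":{\"business\":{\"properties\":{\"name\":{"
                + "\"type\":\"string\",\"index_options\":\"docs\","
                + "\"norms\":{\"enabled\":false}}}}}}";
    }

    // Sends one request (with an optional JSON body) and returns the HTTP status code.
    static int send(String method, String url, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod(method);
        if (body != null) {
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/json");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // Dry run: only show what would be sent.
            System.out.println("DELETE /businesses");
            System.out.println("PUT /businesses " + createIndexBody());
            return;
        }
        String indexUrl = args[0] + "/businesses"; // e.g. args[0] = http://localhost:9200
        System.out.println("DELETE -> " + send("DELETE", indexUrl, null));
        System.out.println("PUT    -> " + send("PUT", indexUrl, createIndexBody()));
    }
}
```

Run without arguments, it only prints the two requests; pass a base URL such as http://localhost:9200 to actually send them. Checking `GET businesses/business/_mapping` afterwards should confirm whether `index_options` is now `docs`.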

keety
  • Well, this is embarrassing; that was my own stupidity. I was testing in my browser with the command http://localhost:9200/businesses/_search?pretty=true&explain=true&q=texas. After I changed it to "qname=texas" it works, and the scores are the same. So why doesn't it work with my Java API search, where it seems like I am searching the name field? – Chadvador Aug 25 '15 at 14:34
  • Could you paste the whole snippet or, better, the response with explain set in the Java client? – keety Aug 25 '15 at 15:39
  • I'm sorry, I am not sure how to set it in the Java API; it doesn't seem to be an option with SearchRequest. I will update my OP with the code. – Chadvador Aug 25 '15 at 16:18
  • I changed to SearchResponse to be able to use explain; I am updating the OP again and overwriting the previous edit. It looks like when I'm using the Java API it's not hitting the settings that should ignore the frequencies. – Chadvador Aug 25 '15 at 16:36
  • Strange. Could you try `http://localhost:9200/businesses/business/_search?pretty=true&q=name:texas&search_type=dfs_query_then_fetch&explain=true` in the browser and see if you still get the same score? I have a feeling the mapping wasn't applied, or was applied after the documents were indexed. – keety Aug 25 '15 at 17:42
  • That new search gives me the same results as my Java API. And regarding the mappings, why would it be working for one search but not the other when it is on the same documents? I set the mapping before indexing anything. – Chadvador Aug 25 '15 at 17:57
  • The previous `http://localhost:9200/businesses/business/_search?pretty=true&qname=texas` has the wrong syntax, and Elasticsearch unfortunately ignores the wrong URL params instead of throwing an error. It defaults to `match_all`. This is the reason all documents have the same score. You can try `http://localhost:9200/businesses/business/_search?pretty=true&qname=thiscannotbeinthedocument` and you should get the same result as before. It looks very likely that the mapping wasn't applied correctly; try `http://localhost:9200/businesses/business/_mapping`. – keety Aug 25 '15 at 18:08
  • Wow, you're right on all counts, it looks like... same results, and the current mapping is not what I put in; it looks like the default mapping that Elasticsearch assigns. When I submit the mapping it gives me an all-good response; I don't remember exactly what it is, but it's something like acknowledged: true. Maybe I am putting it in the wrong place? – Chadvador Aug 25 '15 at 18:21
  • You are on to something. I updated the answer: it actually looks like once the index has been created and the field specified in the mapping, you cannot override it with a mapping call. I don't think this is mentioned in the documentation, though, so you could probably raise an issue with [elasticsearch](https://github.com/elastic/elasticsearch/issues?utf8=%E2%9C%93&q=index_options), since it should at least raise an error rather than fail silently. – keety Aug 25 '15 at 18:43
  • I am just using it on a test elasticsearch right now, so I am deleting the index, adding the mapping to "businesses" and then adding little test objects. Is there something different I can be doing when adding the mapping initially? – Chadvador Aug 25 '15 at 18:50
  • You were right, I was using the wrong way to map it. I'll update my post above with my working mapping, thank you so much!! – Chadvador Aug 25 '15 at 19:18
  • Is there a way to add "index_options" : "freqs" to all fields, not just the "name" field? I'm looking for something like "*" instead of "name" – user3071643 Oct 03 '16 at 20:49
  • You should be able to achieve it using [dynamic templates](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html#match-mapping-type) – keety Oct 03 '16 at 21:08
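
As a footnote to the `qname=texas` comment above: a field-scoped URI search puts the field inside the `q` parameter, as in `q=name:texas`. Below is a small sketch of building such a URL in Java. The class and method names are hypothetical, and the base URL is the one used in this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative only: builds a URI search that targets a single field.
final class SearchUrl {

    // Produces ".../_search?...&q=<field>:<term>" with the query value URL-encoded.
    static String fieldSearchUrl(String base, String field, String term)
            throws UnsupportedEncodingException {
        String q = URLEncoder.encode(field + ":" + term, "UTF-8");
        return base + "/_search?pretty=true&explain=true&q=" + q;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(
                fieldSearchUrl("http://localhost:9200/businesses/business", "name", "texas"));
    }
}
```

The colon is percent-encoded as `%3A`, which the server decodes back to `name:texas`. An unknown parameter such as `qname` is not a query at all, which is why every document came back with the same score.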
0

Your field type must be `text`.

You must also reindex: create a new index with this mapping.

"mappings": {
  "properties": {
    "text": {
      "type": "text",
      "index_options": "docs"
    }
  }
}

https://www.elastic.co/guide/en/elasticsearch/reference/current/index-options.html

Milija B