
I have implemented a function_score query in my Document model, which has a clicks field that keeps track of the number of views per document. Now I want documents with more clicks to get higher priority and appear at the top of the search results.

My document.rb code

require 'elasticsearch/model'

class Document < ApplicationRecord  # assumed base class; adjust to your app
  include Elasticsearch::Model

  def self.search(query)
    __elasticsearch__.search(
      {
        query: {
          function_score: {
            query: {
              multi_match: {
                query: query,
                fields: ['name', 'service'],
                fuzziness: "AUTO"
              }
            },
            # Multiplies the relevance score by log10(1 + 2 * clicks)
            field_value_factor: {
              field: 'clicks',
              modifier: 'log1p',
              factor: 2
            }
          }
        }
      }
    )
  end

  settings index: {
    number_of_shards: 1,
    analysis: {
      analyzer: {
        edge_ngram_analyzer: {
          type: "custom",
          tokenizer: "standard",
          filter: ["lowercase", "edge_ngram_filter", "stop", "kstem"]
        }
      },
      # Custom filters must sit inside "analysis" so the analyzer can find them
      filter: {
        ascii_folding: { type: 'asciifolding', preserve_original: true },
        edge_ngram_filter: { type: "edgeNGram", min_gram: 3, max_gram: 20 }
      }
    }
  } do
    mapping do
      # "string" is deprecated since Elasticsearch 5.0; newer versions expect "text"
      indexes :name, type: "string", analyzer: "edge_ngram_analyzer",
                     term_vector: "with_positions"
      indexes :service, type: "string", analyzer: "edge_ngram_analyzer",
                        term_vector: "with_positions"
    end
  end
end

Here is the search view:

<h1>Document Search</h1>

<%= form_tag search_path, method: :get do %>
  <p>
    <%= label_tag :query, "Search for" %>
    <%= text_field_tag :query, params[:query] %>
    <%= submit_tag "Go", name: nil %>
  </p>
<% end %>

<% if @documents.present? %>
  <ul class="search_results">
    <% @documents.each do |document| %>
      <li>
        <h3>
          <%= link_to document.name, controller: "documents", action: "show", id: document._id %>
        </h3>
      </li>
    <% end %>
  </ul>
<% else %>
  <p>Your search did not match any documents.</p>
<% end %>
<br/>

When I search for "Estamp", I get the results in the following order:

 Franking and Estamp  # clicks: 5
 Notary and Estamp    # clicks: 8

So even though Notary and Estamp has more clicks, it does not come to the top of the results. How can I achieve this?

This is what I get when I run the query in the console:

POST _search

"hits": {
    "total": 2,
    "max_score": 1.322861,
    "hits": [
        {
            "_index": "documents",
            "_type": "document",
            "_id": "13",
            "_score": 1.322861,
            "_source": {
                "id": 13,
                "name": "Franking and Estamp",
                "service": "Estamp",
                "user_id": 1,
                "clicks": 7
            }
        },
        {
            "_index": "documents",
            "_type": "document",
            "_id": "14",
            "_score": 0.29015404,
            "_source": {
                "id": 14,
                "name": "Notary and Estamp",
                "service": "Notary",
                "user_id": 1,
                "clicks": 12
            }
        }
    ]
}

Here the scores of the documents do not seem to reflect the number of clicks.

Mahesh Mesta

1 Answer


Without seeing your indexed data it's not easy to answer. But looking at the query, one thing comes to mind. I'll show it with a short example:

Example 1:

I've indexed the following documents:

{"name":"Franking and Estampy", "service" :"text", "clicks": 5}
{"name":"Notary and Estamp", "service" :"text", "clicks": 8}

Running the same query you provided gave this result:

"hits": {
    "total": 2,
    "max_score": 4.333119,
    "hits": [
        {
            "_index": "script",
            "_type": "test",
            "_id": "AV2iwkems7jEvHyvnccV",
            "_score": 4.333119,
            "_source": {
                "name": "Notary and Estamp",
                "service": "text",
                "clicks": 8
            }
        },
        {
            "_index": "script",
            "_type": "test",
            "_id": "AV2iwo6ds7jEvHyvnccW",
            "_score": 3.6673431,
            "_source": {
                "name": "Franking and Estampy",
                "service": "text",
                "clicks": 5
            }
        }
    ]
}

So everything is fine: the document with 8 clicks got the higher score (the _score field value) and the order is correct.

Example 2:

I noticed that in your query the name field is boosted with a high factor. So what would happen if I had the following data indexed?

{"name":"Franking and Estampy", "service" :"text", "clicks": 5}
{"name":"text", "service" :"Notary and Estamp", "clicks": 8}

And the result:

"hits": {
    "total": 2,
    "max_score": 13.647502,
    "hits": [
        {
            "_index": "script",
            "_type": "test",
            "_id": "AV2iwo6ds7jEvHyvnccW",
            "_score": 13.647502,
            "_source": {
                "name": "Franking and Estampy",
                "service": "text",
                "clicks": 5
            }
        },
        {
            "_index": "script",
            "_type": "test",
            "_id": "AV2iwkems7jEvHyvnccV",
            "_score": 1.5597181,
            "_source": {
                "name": "text",
                "service": "Notary and Estamp",
                "clicks": 8
            }
        }
    ]
}

Although Franking and Estampy has only 5 clicks, it scores much higher than the second document, which has more clicks.

So the point is that in your query the number of clicks is not the only factor that affects the score and the final order of documents. Without the real data this is only a guess on my part. You can run the query yourself with a REST client and check the scores, fields and matching terms.
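One way to see why a document ranks where it does is to set "explain": true in the search body; Elasticsearch then attaches an "_explanation" tree to every hit, showing how the text match and the field_value_factor contributed to _score. The sketch below only builds and prints that body, so it runs without a cluster. The helper name explain_body is hypothetical, and the index name documents is taken from the search output in the question. You could paste the printed JSON into a REST client (for example Kibana Console, curl or Postman) as GET documents/_search, or pass the same hash to Document.__elasticsearch__.search.

```ruby
require 'json'

# Hypothetical helper: the question's function_score query plus "explain",
# which asks Elasticsearch to return a per-hit breakdown of _score.
def explain_body(query)
  {
    explain: true, # each hit gets an "_explanation" tree
    query: {
      function_score: {
        query: {
          multi_match: {
            query: query,
            fields: ['name', 'service'],
            fuzziness: "AUTO"
          }
        },
        field_value_factor: {
          field: 'clicks',
          modifier: 'log1p',
          factor: 2
        }
      }
    }
  }
end

# Print the body so it can be pasted into a REST client as
#   GET documents/_search
puts JSON.pretty_generate(explain_body("Estamp"))
```

In the explanation you should be able to see the text-relevance part and the function part listed separately, which makes it easier to tell whether clicks or term matches dominate.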

Update

Based on your search result, you can see that the document with id=13 has the term Estamp in both fields (name and service). That is why this document got the higher score: in the scoring algorithm, having the term in both fields counts for more here than having a higher number of clicks. If you want the clicks field to have a bigger impact on the scoring, try experimenting with factor (it should probably be higher) and modifier ("modifier": "square" could work in your case). You can check the possible values in the field_value_factor section of the Elasticsearch function_score documentation.

Try, for example, this combination:

{
  "query": {
    "function_score": {
      "query": { ... },  // same as before
      "field_value_factor": {
        "field": "clicks",
        "modifier": "square",
        "factor": 3
      }
    }
  }
}
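To get a feel for why changing modifier and factor can flip the order, here is a simplified model of the combination step only. It assumes a single field_value_factor function and the default "boost_mode": "multiply"; the modifier is applied to factor * clicks, and "log1p" uses the base-10 logarithm as in the Elasticsearch docs. The text-relevance scores (1.2 and 0.6) are made-up numbers, not taken from a real index; real Lucene scores depend on your data and analyzers.

```ruby
# Simplified model of how function_score combines a text-relevance score with a
# single field_value_factor function. Query scores below are hypothetical.

# field_value_factor applies the modifier to (factor * field value);
# "log1p" uses the common (base-10) logarithm.
def field_value_factor(value, factor:, modifier:)
  x = factor * value
  case modifier
  when "none"   then x
  when "log1p"  then Math.log10(x + 1)
  when "square" then x**2
  else raise ArgumentError, "modifier not modelled here: #{modifier}"
  end
end

# boost_mode decides how the query score and the function result are combined.
def combine(query_score, function_score, boost_mode:)
  case boost_mode
  when "multiply" then query_score * function_score # Elasticsearch's default
  when "replace"  then function_score               # query score ignored
  else raise ArgumentError, "boost_mode not modelled here: #{boost_mode}"
  end
end

# Hypothetical relevance scores: the first document matches "Estamp" in both
# fields, so the text query alone rates it higher.
DOCS = [
  { name: "Franking and Estamp", query_score: 1.2, clicks: 7 },
  { name: "Notary and Estamp",   query_score: 0.6, clicks: 12 }
].freeze

# Returns document names ordered by final score, highest first.
def rank(docs, factor:, modifier:, boost_mode:)
  docs.sort_by do |doc|
    fn = field_value_factor(doc[:clicks], factor: factor, modifier: modifier)
    -combine(doc[:query_score], fn, boost_mode: boost_mode)
  end.map { |doc| doc[:name] }
end

p rank(DOCS, factor: 2, modifier: "log1p",  boost_mode: "multiply")
# → ["Franking and Estamp", "Notary and Estamp"]
p rank(DOCS, factor: 3, modifier: "square", boost_mode: "multiply")
# → ["Notary and Estamp", "Franking and Estamp"]
p rank(DOCS, factor: 1, modifier: "none",   boost_mode: "replace")
# → ["Notary and Estamp", "Franking and Estamp"]
```

With these made-up numbers "square" amplifies clicks enough to overcome the relevance gap while "log1p" does not. A wider relevance gap could still defeat "square", and only "replace" removes the text score from the picture entirely.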

Update 2: scoring based only on the number of clicks

If the value of the clicks field should be the only thing that affects scoring, you can try "boost_mode": "replace". In this case only the function score is used and the query score is ignored (the query still decides which documents match), so how often Estamp appears in the name and service fields has no impact on the scoring. Try this query:

{
  "query": {
    "function_score": { 
      "query": { 
        "multi_match": {
          "query":    "Estamp",
          "fields": [ "name", "service"],
          "fuzziness": "AUTO"
        }
      },
      "field_value_factor": { 
        "field": "clicks",
        "factor": 1
      },
      "boost_mode": "replace"
    }
  }
}

It gave me:

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 5,
        "hits": [
            {
                "_index": "script",
                "_type": "test",
                "_id": "AV2nI0HkJPYn0YKQxRvd",
                "_score": 5,
                "_source": {
                    "name": "Notary and Estamp",
                    "service": "Notary",
                    "clicks": 5
                }
            },
            {
                "_index": "script",
                "_type": "test",
                "_id": "AV2nIwKvJPYn0YKQxRvc",
                "_score": 4,
                "_source": {
                    "name": "Franking and Estamp",
                    "service": "Estamp",
                    "clicks": 4
                }
            }
        ]
    }
}

This may be what you are looking for (note that the values "_score": 5 and "_score": 4 match the number of clicks).
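In the Rails model, the Update 2 query could be expressed as a Ruby hash and passed to __elasticsearch__.search. A minimal sketch: the module name ClickRankedSearch is hypothetical, and the body simply mirrors the JSON query above. Keeping the body in a plain method makes it easy to inspect or print without a running cluster.

```ruby
# Hypothetical module that builds the Update 2 query body: multi_match decides
# which documents match, while "replace" makes _score equal clicks * factor.
module ClickRankedSearch
  def self.body(query)
    {
      query: {
        function_score: {
          query: {
            multi_match: {
              query: query,
              fields: ['name', 'service'],
              fuzziness: "AUTO"
            }
          },
          field_value_factor: {
            field: 'clicks',
            factor: 1
          },
          boost_mode: "replace" # ignore the text-relevance score entirely
        }
      }
    }
  end
end

# In the Document model (assumed), the search method would then become:
#
#   def self.search(query)
#     __elasticsearch__.search(ClickRankedSearch.body(query))
#   end
```

Because relevance is ignored, two documents with the same number of clicks get the same score, so their relative order is not driven by how well they match.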

Joanna Mamczynska
  • I'm rather confused about how exactly I'm supposed to run the query with a REST client to check scoring, since I'm rather new to Elasticsearch. Also, what additional data would be needed to interpret the factors affecting a document's score? For example, you mentioned the indexed data; how can I get it? – Mahesh Mesta Aug 03 '17 at 06:21
  • I have added the indexed data to the question, with the scores. As you can see there, the score for Notary and Estamp is lower even though it has more clicks than Franking and Estamp. – Mahesh Mesta Aug 03 '17 at 07:15
  • Thanks for the search result, that's it and it explains a lot. I updated my answer, hope it helps. – Joanna Mamczynska Aug 03 '17 at 07:31
  • Will it also consider the unique id in the table for comparison? – Mahesh Mesta Aug 03 '17 at 09:17
  • If Franking and Estamp has 3 clicks and Notary and Estamp has 5 clicks, the latter appears at the top. However, if Franking and Estamp goes up to 4 clicks, it appears at the top even though Notary and Estamp still has 5 clicks. So I'm confused about why it still gets more priority. I've changed the service to Estamp for both. – Mahesh Mesta Aug 03 '17 at 09:24
  • I've updated the answer again - it should be the right solution if the only factor that should matter for scoring is the number of clicks. – Joanna Mamczynska Aug 03 '17 at 09:54