I have a Solr integration over billions of records and search is working, but we need to improve the mix of results that the search returns.

We need a balance between popular, old, and new data, and a lower ratio of results from the same lot in the output.

For example:

  1. Popularity: 20%
  2. Fresh data (taken in the last 6 months): 30%
  3. Old data (older than 6 months): 30%
  4. Same shoot (repeats): 10%

I have tried using the "boost" parameter, but it only returns new data:

product(recip(ms(NOW,active_date),6.33e-11,1,1),log(popularity_score),image_boost_score,collection_boost_score,clusterboost)

solrconfig.xml, query handler:

```xml
<requestHandler name="/query" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="indent">true</str>

    <!-- Query settings -->
    <str name="defType">edismax</str>
    <str name="qf">
      title^50 title_no_stemmer^50 keywords^40 title_stemmer^30 title_porter_stemmer^25 photographer_name^20 photographer_name_no_stemmer keywords_stemmer^4 keywords_porter_stemmer^3 file_name_phrase file_name media_id
    </str>

    <!-- Multiplicative boost applied to the edismax score -->
    <str name="boost">
      product(recip(ms(NOW,active_date),6.33e-11,1,1),log(popularity_score),image_boost_score,collection_boost_score,clusterboost)
    </str>

    <str name="mm">100%</str>
    <str name="q.alt">*:*</str>
    <str name="rows">20</str>
    <str name="fl">*,score</str>

    <!-- Faceting defaults -->
    <str name="facet">on</str>
    <str name="facet.missing">false</str>
    <str name="facet.mincount">1</str>
    <str name="facet.field">media_type</str>
    <str name="facet.field">orientation</str>
    <str name="facet.field">people</str>
    <str name="facet.field">style</str>
    <str name="facet.field">collection_id</str>
    <str name="facet.field">location</str>
  </lst>
  <arr name="last-components">
    <str>elevator</str>
  </arr>
</requestHandler>
```
  • Are there any date fields for the documents, like createdDate or updatedDate? If not, it would be good to add them in order to identify the old ones and new ones – Abhijit Bashetti Apr 05 '22 at 06:52
  • Yes, there is a field, 'active_date' – Priyanka Agrawal Apr 05 '22 at 07:59
  • Then you can use the active_date field to find the new/latest documents, either by sorting or by creating a facet of the date range – Abhijit Bashetti Apr 05 '22 at 08:00
  • Yes, I have done that, as you can see in the "boost" code, and it always shows new data on the first page. I need a mix of new and old data – Priyanka Agrawal Apr 05 '22 at 10:07
  • There is no need to use `product` between each of the terms; this leads to every term affecting the score disproportionately and is hard to tune. Instead, use the `^` syntax for each one and look at the debug output to see how much each field / function affects the score, then start tuning the necessary values to get the priorities you like. – MatsLindh Apr 05 '22 at 12:46
  • @MatsLindh can you share some code snippets or links? – Priyanka Agrawal Apr 05 '22 at 13:08
  • There are examples of applying multiple boosts in the edismax and dismax reference guide pages: https://solr.apache.org/guide/8_11/the-extended-dismax-query-parser.html https://solr.apache.org/guide/8_11/the-dismax-query-parser.html#bf-boost-functions-parameter - append `&debug=all` to your URL to get debug output for each hit. – MatsLindh Apr 05 '22 at 13:10
  • @MatsLindh I have tried the solution, but it is not showing a mix of old and new data. It only shows new data on the first page – Priyanka Agrawal Apr 06 '22 at 17:11
  • @MatsLindh please help me – Priyanka Agrawal Apr 12 '22 at 17:18
  • There really isn't much more to say; use the debug=all functionality to see how each document gets scored, then tweak your boosts to get the behavior you're looking for (i.e. how much oldness for a document counts, how much the other terms count, etc.) – MatsLindh Apr 12 '22 at 19:58
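
To illustrate the suggestion in the comments above (separate, individually weighted terms instead of one `product(...)`), here is a minimal sketch of handler defaults using the additive `bf` parameter that edismax inherits from dismax. The weights `^2` and `^0.5`, the `NOW/DAY` rounding, and the `sum(popularity_score,1)` guard are illustrative assumptions, not values from the original configuration. Nothing here is guaranteed to produce the percentage mix asked for in the question; the numbers would still have to be tuned against the `debug=all` output.

```xml
<!-- Sketch only: these entries would stand in for the single "boost" entry
     inside <lst name="defaults">. The weights after ^ are hypothetical
     starting points, to be tuned using the debug=all output. -->

<!-- Recency: additive boost that decays with document age. -->
<str name="bf">recip(ms(NOW/DAY,active_date),6.33e-11,1,1)^2</str>

<!-- Popularity: the +1 is an assumed guard so that a score of 0 does not yield log(0). -->
<str name="bf">log(sum(popularity_score,1))^0.5</str>
```

Because `bf` values are added to the relevance score rather than multiplied into it, an old but popular document is not scaled down by the recency term the way it is inside a product. The remaining fields from the original boost (`image_boost_score`, `collection_boost_score`, `clusterboost`) could be added as further `bf` entries in the same way, each with its own weight.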
