
I'm using ES 7.3. I have a product index with titles, part numbers, model numbers, etc. I am having issues with keyword stuffing because some products contain the same word, part number, or model number multiple times throughout the document.

For example, the model number may appear twice in the title and again in the model number field. Other products include the model number only in the title and not in the model number field. Those products struggle to rank because documents that repeat the term score higher. How can I prevent this type of keyword stuffing? Here is my code.

Fields:

fields = [
  'name^10', 'name.ngram',
  'part_number^10',
  'mod_name^5',
  'model_number^5',
  'brand^10',
  'category^5',
  'product_type^5',
  'search_variations^1'
]

fuzzy_fields = [
  'name',
  'part_number',
  'mod_name',
  'model_number',
  'brand',
  'category',
  'product_type',
  'search_variations'
]

Query:

{
  explain: true,
  query: {
    function_score: {
      query: {
        bool: {
          should: [
            {
              multi_match: {
                fields: fields,
                type: "most_fields",
                query: "#{query}"
              }
            },
            {
              multi_match: {
                fields: fuzzy_fields,
                type: "most_fields",
                fuzziness: "AUTO",
                query: "#{query}"
              }
            }
          ],
          filter: {
            bool: {
              must: filters
            }
          }
        }
      },
      field_value_factor: {
        field: "popularity",
        modifier: "log1p",
        factor: 5
      },
      boost_mode: "sum"
    }
  },
  highlight: {
    fields: {
      :"*" => {}
    }
  },
  aggs: { categories: { terms: { field: "category.raw" } } }
}

UPDATE

Adding the unique token filter to my mapping prevents duplicate values from matching within the same field, but it does not help across multiple fields, which is what I need.

Cannon Moyer

1 Answer


I think what you want is the best_fields type on the multi_match query:

best_fields type uses the score of the single best matching field

So only the best-matching field counts, and a document that matches in several fields won't have its score inflated by the extra matches.
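As a rough sketch of that change, here is the query from the question with both multi_match clauses switched from most_fields to best_fields. The helper name build_search_body, the shortened field lists, and the sample query string are assumptions made for illustration; the explain, highlight, and aggs sections are left out to keep it short.

```ruby
require 'json'

# Hypothetical helper (not from the question): builds the search body with
# both multi_match clauses using "best_fields" instead of "most_fields".
def build_search_body(query, fields, fuzzy_fields, filters = [])
  {
    query: {
      function_score: {
        query: {
          bool: {
            should: [
              # Exact clause: scored by the single best matching field.
              { multi_match: { fields: fields, type: 'best_fields', query: query } },
              # Fuzzy clause: same type, with fuzziness as in the question.
              { multi_match: { fields: fuzzy_fields, type: 'best_fields',
                               fuzziness: 'AUTO', query: query } }
            ],
            filter: { bool: { must: filters } }
          }
        },
        field_value_factor: { field: 'popularity', modifier: 'log1p', factor: 5 },
        boost_mode: 'sum'
      }
    }
  }
end

# Shortened field lists and a made-up query string, for illustration only.
fields = ['name^10', 'name.ngram', 'model_number^5']
fuzzy_fields = ['name', 'model_number']

body = build_search_body('XJ-900', fields, fuzzy_fields)
puts JSON.pretty_generate(body)
```

With best_fields, tie_breaker defaults to 0.0, so the other matching fields add nothing to the score. This only addresses repetition across fields; a term repeated inside one field still raises that field's score unless something like the unique token filter from the update is applied.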

xeraa