
I am upgrading my Elasticsearch server from version 1.6.0 to 7.12.1, which has forced me to rewrite every query I had.

These queries retrieve materials identified by 3 fields: nature.idCat, nature.idNat and marque.idMrq (category ID, nature ID and brand ID).

My application has a search field for finding specific materials, so if the user enters "photoc", the query sent to my Elasticsearch server looks like this:

{
    "sort": [
        "_score"
    ],
    "query": {
        "bool": {
            "must": [
                {
                    "query_string": {
                        "default_field": "search",
                        "query": "*photoc*",
                        "boost": 10
                    }
                },
                [...] // Some more irrelevant conditions for this question like 
                      // if nature.idCat = 26 then idNat must be in some range and idMrq in some other range
            ]
        }
    }
}

Here are 2 example "hits" returned by this query:

"hits": [
    {
      "_index": "ref_biens",
      "_type": "_doc",
      "_id": "T3RrpXsBz_TibRxz0akC",
      "_score": 13.0,
      "_source": {
        "search": "Photocopieur GENERIQUE",
        "nature": {
          "idCat": 26,
          "idNat": 665,
          "libelle": "Photocopieur",
          "ekip": "U03C",
          "codeINSEE": 300121,
          "noteMaterielArrondi": 5
        },
        "marque": {
          "idMrq": 16,
          "libelle": "GENERIQUE",
          "ekip": "Z999",
          "idVRDuree": 808
        }
      }
    },
    {
      "_index": "ref_biens",
      "_type": "_doc",
      "_id": "UHRrpXsBz_TibRxz0akC",
      "_score": 13.0,
      "_source": {
        "search": "Photocopieur INFOTEC",
        "nature": {
          "idCat": 26,
          "idNat": 665,
          "libelle": "Photocopieur",
          "ekip": "U03C",
          "codeINSEE": 300121,
          "noteMaterielArrondi": 5
        },
        "marque": {
          "idMrq": 1244,
          "libelle": "INFOTEC",
          "ekip": "I091",
          "idVRDuree": 808
        }
      }
    }
]

This works perfectly!

My problem appears when the user types more than one word. For example, if they search specifically for "Photocopieur PANASONIC", the query returns the right material first with a _score of 23, but every other match has the same _score of 13. As a result, totally different materials (matching only on the brand name, for example) can come next, even though I want the other "Photocopieur" results to be displayed first.

My idea is to add "score points" to the results that have the most in common with the best match: for instance, a 6-point boost for the same nature.idCat, 4 points for the same nature.idNat and finally 2 points for the same marque.idMrq.
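To make the idea concrete, here is a minimal sketch of what such a query body could look like, written in Python only to build the JSON. It assumes a two-step approach that is not part of my current code: a first query returns the best match, and its IDs are then injected into a second query as optional `should` clauses wrapped in `constant_score`, so that each shared ID adds a fixed number of points. The function name `build_boosted_query` and its parameters are hypothetical, and the other conditions of my real query are left out.

```python
import json


def build_boosted_query(wildcard_query, best_match):
    """Build a search body that adds fixed score points for each ID
    shared with the best match (hypothetical helper, for illustration).

    wildcard_query: the query_string expression, e.g. "*photoc*"
    best_match: the "_source" of the top hit from a first query (assumption)
    """

    def points(field, value, boost):
        # constant_score gives every document matching the filter
        # a score equal to the boost, regardless of term frequency.
        return {
            "constant_score": {
                "filter": {"term": {field: value}},
                "boost": boost,
            }
        }

    return {
        "sort": ["_score"],
        "query": {
            "bool": {
                # Same mandatory text condition as in the original query.
                "must": [
                    {
                        "query_string": {
                            "default_field": "search",
                            "query": wildcard_query,
                            "boost": 10,
                        }
                    }
                ],
                # Optional clauses: they never exclude a document,
                # they only add points when they match.
                "should": [
                    points("nature.idCat", best_match["nature"]["idCat"], 6),
                    points("nature.idNat", best_match["nature"]["idNat"], 4),
                    points("marque.idMrq", best_match["marque"]["idMrq"], 2),
                ],
            }
        },
    }


# Example using the IDs of the first hit shown above.
best = {"nature": {"idCat": 26, "idNat": 665}, "marque": {"idMrq": 16}}
body = build_boosted_query("*photoc*", best)
print(json.dumps(body, indent=2))
```

With this shape, a document sharing all three IDs with the best match would get 12 extra points on top of its text score, and one sharing only the category would get 6. Whether this is better than a `function_score` query or a custom sort is exactly what I am unsure about.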

Any idea how I can achieve that? Is this the correct approach to my problem?

Natty
  • you can add a sort on additional fields "sort": [ { "_score": { "order": "desc" } }, { "nature.idCat": { "order": "desc" } } ]. This will sort first by score then by idCat – jaspreet chahal Sep 04 '21 at 14:50
  • @jaspreetchahal That's what I'm doing right now, but the issue is that the first result is a `nature.idCat: 26` (which are the "Photocopieur") and the next results are ordered by `nature.idCat` descending. I need to get every `nature.idCat: 26` before any other – Natty Sep 16 '21 at 07:58

0 Answers