
How can I improve recall in this setting? Any suggestions are welcome. I want to build an index of 39 million passages, each containing at least four English sentences. My queries are short interrogative sentences. As far as I know, a language model with Dirichlet smoothing, stop word removal, and stemming works best for this setting. How should I index under these conditions? I have already indexed with the configuration below, but the results are no different from the default BM25.

My index:

{
  "settings": {
    "index": {
      "similarity": {
        "my_similarity": {
          "type": "LMDirichlet",
          "mu": 2000
        }
      },
      "analysis": {
        "filter": {
          "english_stop": {
            "type": "stop",
            "stopwords": "_english_"
          },
          "my_stemmer": {
            "type": "stemmer",
            "name": "english"
          }
        },
        "analyzer": {
          "my_custom_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "english_stop",
              "my_stemmer"
            ]
          }
        }
      }
    },
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "similarity": "my_similarity",
        "analyzer": "my_custom_analyzer"
      }
    }
  }
}
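
For context, the `mu` setting above is the Dirichlet prior of the smoothed language model. The textbook form of the estimate is sketched below; the symbols are notation introduced here for illustration, and Lucene's actual scoring expression may be arranged differently.

```latex
p_\mu(w \mid d) = \frac{c(w; d) + \mu \, p(w \mid C)}{|d| + \mu}
```

Here $c(w; d)$ is the count of term $w$ in passage $d$, $|d|$ is the passage length in tokens, and $p(w \mid C)$ is the term's probability in the whole collection. A larger $\mu$ pulls the estimate for short passages closer to the collection statistics.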

And for searching, my Python code is:

query = " (" + prevTurn + ")^1 (" + currentTurn + ")^2"

search_param = {
    "query": {
        "query_string": {
            "query": query,
            "analyzer": "my_stop_analyzer",
            "default_field": "doc.content"
        }
    }
}
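
One thing that may be worth checking is whether the search body refers to names that the index actually defines. The following is only a sketch: the helper names (`defined_analyzers`, `mapped_fields`, `check_search_against_index`) and the `BUILT_IN_ANALYZERS` set are hypothetical, the index body is trimmed to the parts that matter here, and no Elasticsearch connection is involved.

```python
# Sketch only: compare the names used in a search body with the names
# declared in an index body. All helper names here are hypothetical.

# A few analyzer names that Elasticsearch ships with (not an exhaustive list).
BUILT_IN_ANALYZERS = {"standard", "simple", "whitespace", "stop", "keyword", "english"}


def defined_analyzers(index_body):
    """Return custom analyzer names under settings.index.analysis.analyzer."""
    analysis = index_body.get("settings", {}).get("index", {}).get("analysis", {})
    return set(analysis.get("analyzer", {}))


def mapped_fields(properties, prefix=""):
    """Flatten mappings.properties into a set of dotted field paths."""
    fields = set()
    for name, spec in properties.items():
        path = prefix + name
        fields.add(path)
        fields |= mapped_fields(spec.get("properties", {}), path + ".")
    return fields


def check_search_against_index(index_body, search_body):
    """Return a list of warnings; an empty list means no mismatch was found."""
    warnings = []
    qs = search_body["query"]["query_string"]

    analyzer = qs.get("analyzer")
    known = defined_analyzers(index_body) | BUILT_IN_ANALYZERS
    if analyzer and analyzer not in known:
        warnings.append(f"analyzer '{analyzer}' is not defined in the index settings")

    field = qs.get("default_field")
    fields = mapped_fields(index_body.get("mappings", {}).get("properties", {}))
    if field and field not in fields:
        warnings.append(f"field '{field}' is not in the explicit mapping")
    return warnings


# Trimmed copy of the index body from this post (filters and similarity omitted).
index_body = {
    "settings": {"index": {"analysis": {"analyzer": {"my_custom_analyzer": {"type": "custom"}}}}},
    "mappings": {"properties": {"content": {"type": "text", "analyzer": "my_custom_analyzer"}}},
}

# Search body from this post, with a placeholder query string.
search_body = {
    "query": {
        "query_string": {
            "query": "(previous turn)^1 (current turn)^2",
            "analyzer": "my_stop_analyzer",
            "default_field": "doc.content",
        }
    }
}

for warning in check_search_against_index(index_body, search_body):
    print(warning)
```

With the bodies as posted, this flags both `my_stop_analyzer` and `doc.content`. If the passages really are stored under a dynamically mapped `doc.content` field, that field would not pick up `my_similarity` or `my_custom_analyzer`, which might explain scores that look like default BM25. That is a guess from the snippets alone, not something confirmed against the cluster.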

One sample topic with its turns:

Title: The Neolithic Revolution
Description: The neolithic revolution and technology used within it and when it emerged in the british isles.  Also, the transition to the bronze age and its significance.
1   What was the neolithic revolution?
2   When did it start and end?
3   Why did it start?
4   What did the neolithic invent?
5   What tools were used?
6   When was it brought to the british isles?
Omk

1 Answer


You can try setting the similarity in the query.