How can I improve recall in this setting? Any suggestions are welcome. I want to build an index of 39 million passages, each containing at least four English sentences. My queries are short interrogative sentences. I know that a language model with Dirichlet smoothing, combined with stop-word removal and stemming, works best in this setting. How do I configure an index this way? (I've already indexed with the configuration below, but the results are no different from the default BM25.)
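For context, Dirichlet-smoothed query likelihood scores a document d for a query q as the sum over query terms w of log((tf(w, d) + mu * p(w|C)) / (|d| + mu)). Here p(w|C) is the term's relative frequency in the whole collection, and mu (2000 in the settings below) controls how strongly a document's own term counts are pulled toward collection statistics. The sketch below computes this textbook formula on toy, already-stemmed tokens. The function name and data are illustrative. Lucene's LMDirichlet similarity scores each matched term with a variant of this formula, so its absolute scores will not match this sketch.

```python
import math
from collections import Counter

def dirichlet_score(query_terms, doc_terms, collection_tf, collection_len, mu=2000.0):
    """Log query likelihood of doc_terms under Dirichlet smoothing.

    p(w|d) = (tf(w, d) + mu * p(w|C)) / (|d| + mu)
    """
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for w in query_terms:
        p_coll = collection_tf.get(w, 0) / collection_len  # p(w|C)
        if p_coll == 0.0:
            continue  # term never occurs in the collection: skip to avoid log(0)
        score += math.log((tf[w] + mu * p_coll) / (doc_len + mu))
    return score

# Toy collection of stemmed, stop-word-free passages (hypothetical data).
docs = {
    "d1": ["neolith", "revolut", "farm", "tool"],
    "d2": ["bronz", "age", "metal", "tool"],
}
collection_tf = Counter(t for terms in docs.values() for t in terms)
collection_len = sum(collection_tf.values())

query = ["neolith", "revolut"]
for name, terms in docs.items():
    print(name, dirichlet_score(query, terms, collection_tf, collection_len))
```

The formula shows that when mu is much larger than the passage length, the smoothed probability is dominated by the collection term mu * p(w|C). With passages of only a few sentences, it may therefore be worth trying values of mu smaller than 2000.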
My index:
{
  "settings": {
    "index": {
      "similarity": {
        "my_similarity": {
          "type": "LMDirichlet",
          "mu": 2000
        }
      },
      "analysis": {
        "filter": {
          "english_stop": {
            "type": "stop",
            "stopwords": "_english_"
          },
          "my_stemmer": {
            "type": "stemmer",
            "name": "english"
          }
        },
        "analyzer": {
          "my_custom_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "english_stop",
              "my_stemmer"
            ]
          }
        }
      }
    },
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "content": {
        "similarity": "my_similarity",
        "analyzer": "my_custom_analyzer",
        "type": "text"
      }
    }
  }
}
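This configuration only takes effect when a search targets the `content` field, and either omits the query's `analyzer` parameter (the query then defaults to the field's analyzer) or names an analyzer defined here. If the documents were actually indexed under a different path, for example as `{"doc": {"content": ...}}`, Elasticsearch's default dynamic mapping would have created that field as plain text with the standard analyzer and BM25. That could explain results identical to the default. Below is a hypothetical offline helper (not part of any Elasticsearch client) that checks a `query_string` request body against this index definition. The list of built-in analyzer names is partial, by assumption.

```python
# Partial list of built-in analyzer names (assumption; extend as needed).
BUILT_IN_ANALYZERS = {"standard", "simple", "whitespace", "stop", "keyword",
                      "pattern", "fingerprint", "english"}

def defined_analyzers(index_body: dict) -> set:
    """Names of custom analyzers declared in the index settings."""
    settings = index_body.get("settings", {})
    # "analysis" may sit directly under settings or under settings.index
    analysis = settings.get("analysis") or settings.get("index", {}).get("analysis", {})
    return set(analysis.get("analyzer", {}))

def field_mapping(index_body: dict, dotted: str):
    """Return the explicit mapping of a (possibly dotted) field, or None."""
    props = index_body.get("mappings", {}).get("properties", {})
    node = None
    for part in dotted.split("."):
        node = props.get(part)
        if node is None:
            return None
        props = node.get("properties", {})
    return node

def check_query_string(index_body: dict, search_body: dict) -> list:
    """List mismatches between a query_string request and the index definition."""
    qs = search_body["query"]["query_string"]
    problems = []
    analyzer = qs.get("analyzer")
    if analyzer and analyzer not in defined_analyzers(index_body) | BUILT_IN_ANALYZERS:
        problems.append(f"analyzer {analyzer!r} is not defined in the index")
    field = qs.get("default_field")
    if field:
        mapping = field_mapping(index_body, field)
        if mapping is None:
            problems.append(f"field {field!r} is not in the explicit mapping")
        elif "similarity" not in mapping:
            problems.append(f"field {field!r} uses the default similarity (BM25)")
    return problems

# Trimmed copy of the index definition above (token filters omitted).
INDEX_BODY = {
    "settings": {"index": {
        "similarity": {"my_similarity": {"type": "LMDirichlet", "mu": 2000}},
        "analysis": {"analyzer": {"my_custom_analyzer": {
            "type": "custom", "tokenizer": "standard",
            "filter": ["lowercase", "english_stop", "my_stemmer"]}}},
    }},
    "mappings": {"properties": {"content": {
        "similarity": "my_similarity", "analyzer": "my_custom_analyzer", "type": "text"}}},
}

# An undeclared analyzer name and a dotted path that is not mapped explicitly.
mismatched = {"query": {"query_string": {
    "query": "(a)^1 (b)^2", "analyzer": "my_stop_analyzer", "default_field": "doc.content"}}}
consistent = {"query": {"query_string": {"query": "(a)^1 (b)^2", "default_field": "content"}}}
print(check_query_string(INDEX_BODY, mismatched))
print(check_query_string(INDEX_BODY, consistent))
```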
For searching, my Python code is:
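# Current turn weighted twice as heavily as the previous turn
query = "(" + prevTurn + ")^1 (" + currentTurn + ")^2"
search_param = {
    "query": {
        "query_string": {
            "query": query,
            "analyzer": "my_custom_analyzer",  # must be an analyzer defined in the index settings
            "default_field": "content"         # must match the field name in the mapping
        }
    }
}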
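One detail worth checking here: `query_string` parses its input with Lucene query syntax. The trailing `?` in a question such as "What tools were used?" is therefore read as a single-character wildcard, and characters such as `/` or `:` can cause parse errors. Wildcard terms are not analyzed by default (`analyze_wildcard` is false), so a term like `revolution?` is never stemmed and cannot match the indexed stem. The sketch below escapes the characters listed as reserved in the Elasticsearch `query_string` documentation before building the same request. `escape_query_string` and `build_search_param` are hypothetical helper names. The field name `content` is assumed from the mapping. The `analyzer` parameter is omitted, so the field's own analyzer is used.

```python
# Characters listed as reserved in the Elasticsearch query_string documentation.
# '<' and '>' cannot be escaped at all, so they are removed instead.
RESERVED = set('+-=&|!(){}[]^"~*?:\\/')

def escape_query_string(text: str) -> str:
    """Backslash-escape query_string syntax so user text is searched literally."""
    out = []
    for ch in text:
        if ch in "<>":
            continue  # cannot be escaped; drop it
        if ch in RESERVED:
            out.append("\\")
        out.append(ch)
    return "".join(out)

def build_search_param(prev_turn: str, current_turn: str) -> dict:
    """Previous turn boosted 1x, current turn 2x, both escaped."""
    query = f"({escape_query_string(prev_turn)})^1 ({escape_query_string(current_turn)})^2"
    return {
        "query": {
            "query_string": {
                "query": query,
                "default_field": "content",  # assumed to match the mapping
            }
        }
    }

print(build_search_param("What was the neolithic revolution?", "When did it start and end?"))
```

Escaping does not cover words: uppercase AND, OR and NOT are still parsed as operators. A `match` query does no query parsing at all and avoids this.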
A sample conversation (topic title, description, and its turns):
Title: The Neolithic Revolution
Description: The neolithic revolution and technology used within it and when it emerged in the british isles. Also, the transition to the bronze age and its significance.
1 What was the neolithic revolution?
2 When did it start and end?
3 Why did it start?
4 What did the neolithic invent?
5 What tools were used?
6 When was it brought to the british isles?
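Applied to the sample conversation, the same weighting (previous turn 1x, current turn 2x) can be expressed without query syntax as a `bool` query with two boosted `match` clauses. A `match` query analyzes its text with the field's search analyzer and never treats `?` or other characters as operators. This is a sketch that assumes the field is `content` and that the first turn is queried on its own. `turn_queries` is a hypothetical helper name.

```python
from collections.abc import Iterator

def turn_queries(turns: list[str], field: str = "content") -> Iterator[dict]:
    """Yield one search body per turn: current turn boosted 2x, previous turn 1x."""
    prev = None
    for current in turns:
        should = [{"match": {field: {"query": current, "boost": 2}}}]
        if prev is not None:
            should.append({"match": {field: {"query": prev, "boost": 1}}})
        yield {"query": {"bool": {"should": should}}}
        prev = current

turns = [
    "What was the neolithic revolution?",
    "When did it start and end?",
    "Why did it start?",
    "What did the neolithic invent?",
    "What tools were used?",
    "When was it brought to the british isles?",
]
for body in turn_queries(turns):
    print(body)
```

With the 8.x Python client, each body's `query` part could be passed as `es.search(index="passages", query=body["query"])`, where `passages` is a hypothetical index name.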