We are building a search tool on Elasticsearch, and our query needs to match the value nearest to each user input. For example, if the user inputs [1,10,100,1000,10000], the query should return, for each element of the array, the closest value available in our index.
Right now we use the following query to retrieve one value at a time, looping over the user's input array and sending one request per element, which is really slow.
{
  "query": {
    "term": {"CHR": "chr1"}
  },
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "params": {
          "factor": 10000
        },
        "inline": "def cur = 0; cur = (params.factor - doc['START'].value); if (cur < 0) { cur = cur * -1 } else { cur = cur}"
      },
      "order": "asc"
    }
  }
}
Our requirement is that factor take an array of integers rather than a single value, and that the query return, for each element, the first closest value it finds in our index.
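To make the expected result concrete, here is a minimal pure-Python sketch of the semantics we are after. It does not touch Elasticsearch; the function name closest_values and the sample starts list are hypothetical stand-ins for the START values in the index, which is assumed to be non-empty.

```python
from bisect import bisect_left

def closest_values(targets, indexed):
    """For each target, return the indexed value with the smallest absolute difference."""
    ordered = sorted(indexed)
    result = []
    for t in targets:
        pos = bisect_left(ordered, t)
        # Only the neighbours around the insertion point can be the nearest value.
        candidates = ordered[max(pos - 1, 0):pos + 1]
        result.append(min(candidates, key=lambda v: abs(v - t)))
    return result

# Hypothetical START values standing in for the index contents.
starts = [3, 12, 95, 1200, 9000]
print(closest_values([1, 10, 100, 1000, 10000], starts))
```

With this sample data, each input element maps to exactly one indexed value, which is the shape of result we would like back from a single query. On a tie, the sketch picks the lower value.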
The complete Python function is posted below:
import requests

def gene_peek(coordinate, chr):
    peek_liver = []
    for i in range(0, len(coordinate)):
        a = int(coordinate[i])
        # One search request per coordinate: sort hits by distance from START, keep the top one.
        res = requests.post("http://localhost:9200/lab/peek_liver/_search?pretty=true&scroll=10m&size=1", json={
            "query": {
                "term": {"CHR": chr[i]}
            },
            "sort": {
                "_script": {
                    "type": "number",
                    "script": {
                        "lang": "painless",
                        "params": {
                            "factor": a
                        },
                        "inline": "def cur = 0; cur = (params.factor - doc['START'].value); if (cur < 0) { cur = cur * -1 } else { cur = cur}"
                    },
                    "order": "asc"
                }
            }
        })
        data = res.json()
        # Closest hit for this coordinate.
        peek_liver.append(data["hits"]["hits"][0]["_source"])
    return peek_liver
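One possible way to avoid an HTTP round trip per value is Elasticsearch's Multi Search API (`_msearch`), which accepts many searches in one newline-delimited JSON request. The following is only a sketch under assumptions and has not been run against the index above: the names build_msearch_body and gene_peek_batched are hypothetical, the index and type are taken from the URL so each header line is empty, and the sort script is reused unchanged. It still runs one scripted sort per input value on the server, so it reduces request overhead rather than the sorting cost.

```python
import json

# Same Painless sort script as in the query above.
SCRIPT = ("def cur = 0; cur = (params.factor - doc['START'].value); "
          "if (cur < 0) { cur = cur * -1 } else { cur = cur}")

def build_msearch_body(coordinates, chromosomes):
    """Build the NDJSON body: one empty header line plus one search line per coordinate."""
    lines = []
    for coord, chrom in zip(coordinates, chromosomes):
        lines.append(json.dumps({}))  # empty header: index and type come from the URL
        lines.append(json.dumps({
            "size": 1,
            "query": {"term": {"CHR": chrom}},
            "sort": {
                "_script": {
                    "type": "number",
                    "script": {
                        "lang": "painless",
                        "params": {"factor": int(coord)},
                        "inline": SCRIPT,
                    },
                    "order": "asc",
                }
            },
        }))
    # The Multi Search API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

def gene_peek_batched(coordinates, chromosomes,
                      url="http://localhost:9200/lab/peek_liver/_msearch"):
    """Hypothetical batched variant of gene_peek: one request for all coordinates."""
    import requests  # imported here so the body builder works without requests installed
    body = build_msearch_body(coordinates, chromosomes)
    res = requests.post(url, data=body,
                        headers={"Content-Type": "application/x-ndjson"})
    res.raise_for_status()
    # Responses come back in the same order as the searches were sent.
    return [r["hits"]["hits"][0]["_source"] for r in res.json()["responses"]]
```

As in the original function, a search with no hits would raise an IndexError here, so empty results would need separate handling.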
Any help would be greatly appreciated. Thanks.