I'm using Elasticsearch's approximate k-NN feature.
The problem is that when I repeat the same query against the same index, the ID of the nearest-neighbor document is sometimes different. From what I've read about Lucene's implementation, approximate k-NN search appears to use a deterministic algorithm, so I'm curious why this happens.
Here's an example query:
GET /my-index/_search
{
  "size": 30,
  "knn": {
    "field": "knnVector",
    "k": 30,
    "num_candidates": 300,
    "query_vector": [
      // some 1024-dimensional vector
    ]
  }
}
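To check whether two runs of the query above really return different rankings, the responses can be compared rank by rank. The following is only a minimal sketch, assuming each response has already been parsed into a Python dict with the usual `hits.hits[]._id` layout. The helper names `hit_ids` and `first_divergence` are hypothetical and not part of any Elasticsearch client, and the document IDs in the sample data are made up.

```python
from typing import Any, Optional


def hit_ids(response: dict[str, Any]) -> list[str]:
    """Return the document IDs of a search response in ranked order."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


def first_divergence(a: list[str], b: list[str]) -> Optional[int]:
    """Return the first rank (0-based) at which two ID lists differ.

    Returns None when both lists are identical.
    """
    for rank, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return rank
    # One list is a strict prefix of the other.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Illustrative responses with made-up IDs, trimmed to the fields used here.
run_1 = {"hits": {"hits": [{"_id": "a"}, {"_id": "b"}, {"_id": "c"}]}}
run_2 = {"hits": {"hits": [{"_id": "a"}, {"_id": "c"}, {"_id": "b"}]}}

print(first_divergence(hit_ids(run_1), hit_ids(run_2)))  # prints 1
```

A result of `None` means the two runs agree on every rank; any integer is the first position where the rankings diverge.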
Here are the details of my-index:
GET /_cat/shards/my-index
my-index 0 p STARTED 586611 4.5gb 10.46.32.153 instance-0000000001
my-index 0 r STARTED 586611 4.2gb 10.46.32.91 instance-0000000002
GET /my-index/_mapping
{
  "my-index": {
    "mappings": {
      "dynamic": "false",
      "date_detection": false,
      "numeric_detection": false,
      "properties": {
        "knnVector": {
          "type": "dense_vector",
          "dims": 1024,
          "index": true,
          "similarity": "dot_product"
        }
      }
    }
  }
}
If anyone knows why this behavior occurs and how to make it deterministic, please let me know.