Occasional variation in nearest neighbor document ID with Elasticsearch's approximate k-NN feature

Question

I'm using Elasticsearch's approximate k-NN feature.

The problem is that when I repeat the same query against the same index, the top-ranked (nearest neighbor) document ID sometimes differs between runs. From what I've read about Lucene's implementation, approximate k-NN search appears to use a deterministic algorithm, so I'd like to understand why this happens.

Here's an example query:

GET /my-index/_search
{
  "size": 30,
  "knn": {
    "field": "knnVector",
    "k": 30,
    "num_candidates": 300,
    "query_vector": [
      // some 1024-dimensional vector
    ]
  }
}

The details of my-index are as follows:

GET /_cat/shards/my-index
my-index 0 p STARTED 586611 4.5gb 10.46.32.153 instance-0000000001
my-index 0 r STARTED 586611 4.2gb 10.46.32.91  instance-0000000002

GET /my-index/_mapping
{
  "my-index": {
    "mappings": {
      "dynamic": "false",
      "date_detection": false,
      "numeric_detection": false,
      "properties": {
        "knnVector": {
          "type": "dense_vector",
          "dims": 1024,
          "index": true,
          "similarity": "dot_product"
        }
      }
    }
  }
}
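
To make the variation easier to reproduce and quantify, here is a minimal sketch that repeats a query and counts how often each document ID comes back as the top hit. It assumes a hypothetical `run_query` callable (not part of any Elasticsearch client) that sends the search request shown earlier and returns the parsed JSON response; the only thing it relies on from Elasticsearch is the standard `hits.hits[]._id` response shape. The demo at the bottom uses fake responses in place of a real cluster.

```python
from collections import Counter
from itertools import cycle
from typing import Any, Callable, Dict, Optional


def top_hit_id(response: Dict[str, Any]) -> Optional[str]:
    """Return the _id of the first hit in a search response, or None if there are no hits."""
    hits = response["hits"]["hits"]
    return hits[0]["_id"] if hits else None


def tally_top_hits(run_query: Callable[[], Dict[str, Any]], repeats: int = 20) -> Counter:
    """Call run_query `repeats` times and count how often each _id is the top hit.

    `run_query` is a hypothetical helper: it should send the same k-NN search
    request each time and return the parsed JSON response as a dict.
    """
    return Counter(top_hit_id(run_query()) for _ in range(repeats))


if __name__ == "__main__":
    # Fake responses standing in for a real cluster: the top hit alternates
    # between two document IDs, imitating the behavior described above.
    fake_responses = cycle([
        {"hits": {"hits": [{"_id": "a"}]}},
        {"hits": {"hits": [{"_id": "b"}]}},
    ])
    counts = tally_top_hits(lambda: next(fake_responses), repeats=4)
    print(dict(counts))
```

If the tally contains more than one ID for an unchanged index and query vector, the top hit is not stable across requests.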

If anyone knows why this behavior occurs and how to make it deterministic, please let me know.
