I have the following schema:
schema embeddings {
    document embeddings {
        field id type int {}
        field text_embedding type tensor<double>(d0[960]) {
            indexing: attribute | index
            attribute {
                distance-metric: euclidean
            }
        }
    }
    rank-profile distance {
        num-threads-per-search: 1
        inputs {
            query(query_embedding) tensor<double>(d0[960])
        }
        first-phase {
            expression: distance(field, text_embedding)
        }
    }
}
and the following query body:
body = {
    'yql': 'select * from embeddings where ({approximate:false, targetHits:10} nearestNeighbor(text_embedding, query_embedding));',
    'hits': 10,
    'input': {
        'query(query_embedding)': [...],
    },
    'ranking': {
        'profile': 'distance',
    },
}
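For reference, here is a minimal sketch of how a body like the one above can be POSTed as JSON to Vespa's `/search/` query endpoint using only the Python standard library. The host `http://localhost:8080`, the helper name `build_query_request`, and the all-zeros vector are hypothetical placeholders (the real embedding is elided above); the request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the address of your own Vespa container.
VESPA_SEARCH_URL = "http://localhost:8080/search/"


def build_query_request(query_embedding, target_hits=10, hits=10):
    """Build (but do not send) a POST request carrying the query body."""
    body = {
        "yql": (
            "select * from embeddings where "
            f"({{approximate:false, targetHits:{target_hits}}} "
            "nearestNeighbor(text_embedding, query_embedding))"
        ),
        "hits": hits,
        "input": {"query(query_embedding)": query_embedding},
        "ranking": {"profile": "distance"},
    }
    return urllib.request.Request(
        VESPA_SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical placeholder vector with the schema's 960 dimensions.
req = build_query_request([0.0] * 960, target_hits=200)
# urllib.request.urlopen(req) would send it to a running Vespa instance.
```

Only `target_hits` changes between the two runs described below, which keeps the comparison isolated to that one parameter.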
The problem is that this query returns different results depending on the targetHits parameter. For example, the top-1 distance is 2.847000 with targetHits: 10, but 3.028079 with targetHits: 200.
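One way to reason about how the top-1 value could depend on targetHits is a toy model. It rests on two assumptions worth checking: that exact nearestNeighbor search passes (roughly) the targetHits closest documents on to ranking, and that hits are returned in descending relevance order, where relevance here is the raw value of the first-phase expression `distance(field, text_embedding)`. The function name `top1_relevance` is hypothetical and the distances are made-up numbers, not data from my index.

```python
import random


def top1_relevance(distances, target_hits):
    """Toy model of the query above (assumed behavior, not Vespa's code).

    1. Exact nearest-neighbor search keeps the `target_hits` smallest distances.
    2. first-phase relevance = distance(field, text_embedding), the raw distance.
    3. Hits are returned in descending relevance order, so the largest comes first.
    """
    candidates = sorted(distances)[:target_hits]
    ranked = sorted(candidates, reverse=True)
    return ranked[0]


random.seed(0)
# Made-up distances standing in for 1,000 documents.
fake_distances = [random.uniform(2.0, 6.0) for _ in range(1000)]

for k in (10, 200):
    print(k, round(top1_relevance(fake_distances, k), 6))
```

Under these assumptions the first hit is the *farthest* of the k nearest documents, so its value can only grow as targetHits grows, which would be consistent with the direction of the two numbers above (2.847 < 3.028).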
Moreover, if I run the same query with the Vespa CLI:
vespa query -t http://query "select * from embeddings where ([{\"targetHits\":10}] nearestNeighbor(text_embedding, query_embedding));" \
"approximate=false" \
"ranking.profile=distance" \
"ranking.features.query(query_embedding)=[...]"
I get a third, different result:
{
  "root": {
    "id": "toplevel",
    "relevance": 1.0,
    "fields": {
      "totalCount": 10
    },
    "coverage": {
      "coverage": 100,
      "documents": 1000000,
      "full": true,
      "nodes": 1,
      "results": 1,
      "resultsFull": 1
    },
    "children": [
      {
        "id": "id:embeddings:embeddings::926288",
        "relevance": 0.8158006540357854,
        ...
where, as we can see, the top-1 distance (the relevance) is 0.8158.
So how can I perform an exact (not approximate) nearest neighbor search whose results do not depend on any parameters?
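To know what a truly exact answer should look like, the ground truth can be computed by brute force outside Vespa: measure the Euclidean distance (the `distance-metric` in the schema) from the query to every document and sort ascending. This is a minimal sketch assuming the document vectors fit in memory as a dict from document id to a list of floats; the helper name `exact_knn` and the ids and vectors in the example are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def exact_knn(query: Sequence[float],
              docs: Dict[str, Sequence[float]],
              k: int) -> List[Tuple[str, float]]:
    """Return the k closest (doc_id, distance) pairs, nearest first.

    Every document is scored, so the result depends only on the data
    and k, not on any index or search parameters.
    """
    # math.dist computes the Euclidean distance and raises ValueError
    # if the two vectors have different lengths.
    scored = [(doc_id, math.dist(query, vec)) for doc_id, vec in docs.items()]
    # Sort by distance, breaking ties by id so the order is deterministic.
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:k]


# Hypothetical 3-dimensional toy data (the real vectors have 960 dimensions).
docs = {
    "a": [0.0, 0.0, 0.0],
    "b": [1.0, 0.0, 0.0],
    "c": [0.0, 3.0, 4.0],
}
print(exact_knn([0.0, 0.0, 0.0], docs, k=2))
# → [('a', 0.0), ('b', 1.0)]
```

The ids from such a reference list can then be compared against the ids Vespa returns for each targetHits value.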