Using Elasticsearch, I'm trying to use the minimum_should_match option on a terms query to find documents whose list of longs is X% similar to the list of longs I'm querying with. For example:
{
  "filter": {
    "fquery": {
      "query": {
        "terms": {
          "mynum": [1, 2, 3, 4, 5, 6, 7, 8, 9, 13],
          "minimum_should_match": "90%",
          "disable_coord": false
        }
      }
    }
  }
}
This query matches two documents whose mynum lists are:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
and:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]
This works and is correct: the first document has a 10 at the end where the query has a 13, and the second document has an 11 where, again, the query has a 13.
In other words, in each returned document 1 out of the 10 numbers in my query's list is missing, which is exactly the 90% similarity allowed by the minimum_should_match value in the query.
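To make the behaviour above concrete, here is a small pure-Python model of the matching rule. It is a sketch, not Elasticsearch code: it assumes the documented rule that a positive percentage in minimum_should_match is rounded down, and that the query terms are distinct. The function names are hypothetical. Note that the document's own list length never enters the check, which is why both documents match.

```python
def required_matches(num_terms: int, percent: int) -> int:
    """Clauses required by a positive percentage minimum_should_match.

    Elasticsearch rounds the computed number down, so integer floor
    division models it: 90% of 10 terms -> 9 terms.
    """
    return num_terms * percent // 100


def terms_query_matches(query_terms, doc_terms, percent: int) -> bool:
    """Model of a terms query with minimum_should_match (assumes distinct query terms)."""
    # Count how many query terms appear in the document's field.
    overlap = len(set(query_terms) & set(doc_terms))
    # The threshold depends only on the number of query terms, not on the document.
    return overlap >= required_matches(len(query_terms), percent)


query = [1, 2, 3, 4, 5, 6, 7, 8, 9, 13]
doc1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
doc2 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]

print(required_matches(len(query), 90))      # 9
print(terms_query_matches(query, doc1, 90))  # True
print(terms_query_matches(query, doc2, 90))  # True
```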
The problem is that I would like the behaviour to be different: because the second document is longer and has 11 numbers instead of 10, its difference level should ideally be higher, since it actually contains two values, 11 and 12, that are not in the query's list. For example:
Instead of measuring the overlap of:
(list1) [1, 2, 3, 4, 5, 6, 7, 8, 9, 13]
against:
(list2) [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]
which gives a 10% difference (1 of the 10 values in list1 is missing from list2), it should take into account that list2 is longer than list1 and measure the overlap of:
(list2) [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]
against:
(list1) [1, 2, 3, 4, 5, 6, 7, 8, 9, 13]
which gives roughly an 18% difference (2 of the 11 values in list2 are missing from list1).
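The two measurements can be written as one short function (the name difference is hypothetical): the share of a reference list's values that are missing from the other list. Taking the query (list1) as the reference gives 1/10; taking the longer list (list2) as the reference gives 2/11, about 18.2%.

```python
def difference(reference, other) -> float:
    """Fraction of the values in `reference` that are missing from `other`."""
    ref = set(reference)
    return len(ref - set(other)) / len(ref)


list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 13]
list2 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]

# Relative to the query list: what the terms query effectively measures.
print(round(difference(list1, list2) * 100, 1))  # 10.0

# Length-aware: relative to whichever list is longer.
longer, shorter = (list1, list2) if len(list1) >= len(list2) else (list2, list1)
print(round(difference(longer, shorter) * 100, 1))  # 18.2
```

Because the second measurement changes with each document's list length, a percentage applied only to the query's terms does not capture it.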
- Is this possible?
- If not, how could I factor the length of the list into the match, other than by switching from a sparse vector to a dense one? For example, using:
[1, 2, 3, 4, 5, 6, 7, 8, 9, , , , 13]
rather than:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 13]
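Stated as a formula (a sketch, not an Elasticsearch feature; here Q is the set of query values, D a document's values and p the threshold, 0.9 in this example), the length-aware rule splits into two conditions. The first is what minimum_should_match on the terms query already checks (up to rounding down); only the second involves the document's list length |D|, so whatever approach is used has to take |D| into account.

```latex
\begin{aligned}
\text{length-aware rule:}\quad & |Q \cap D| \ \ge\ p \cdot \max(|Q|,\,|D|) \\
\iff\quad & |Q \cap D| \ \ge\ p\,|Q| \quad\text{and}\quad |Q \cap D| \ \ge\ p\,|D| \\[4pt]
\text{document 1:}\quad & 9 \ge 0.9 \cdot 10 = 9 \ \text{and}\ 9 \ge 0.9 \cdot 10 = 9 \ \Rightarrow\ \text{match} \\
\text{document 2:}\quad & 9 \ge 0.9 \cdot 10 = 9 \ \text{but}\ 9 < 0.9 \cdot 11 = 9.9 \ \Rightarrow\ \text{no match}
\end{aligned}
```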