I want to build an application where a document matches only if every token in the document occurs in the query at least once.
Please note that this is the reverse of the usual setup: documents are fairly small, while queries can be very long. Example:
Document:
"elastic super cool".
A valid matching query would be:
"I like elastic things since elasticsearch is super cool"
I managed to get the number of matched tokens from Elasticsearch (see also https://groups.google.com/forum/?fromgroups=#!topic/elasticsearch/ttJTE52hXf8). So in the example above, 3 matches (= the length of the document) would mean the query matches.
But how can I combine this with synonyms?
Suppose the synonyms for "cool" are "nice", "great" and "good". By using a synonym token filter, I managed to add the synonyms at each position in the document.
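To make the matching rule concrete, here is a minimal plain-Python sketch of what I am after, without synonyms. It does not use Elasticsearch at all; the helper names tokenize and query_covers_document are made up for illustration, and the whitespace tokenizer is only a stand-in for a real analyzer.

```python
def tokenize(text):
    """Lowercase, split on whitespace and strip basic punctuation.

    This is only a stand-in for a real analyzer (assumption).
    """
    tokens = (raw.strip('.,!?"').lower() for raw in text.split())
    return [tok for tok in tokens if tok]


def query_covers_document(document, query):
    """Return True if every distinct document token occurs in the query."""
    doc_tokens = set(tokenize(document))
    query_tokens = set(tokenize(query))
    matched = doc_tokens & query_tokens
    # The query matches when the number of matched tokens equals
    # the number of (distinct) tokens in the document.
    return len(matched) == len(doc_tokens)


document = "elastic super cool"
query = "I like elastic things since elasticsearch is super cool"
print(query_covers_document(document, query))
print(query_covers_document(document, "elastic is cool"))
```

The first call reports a match because all 3 document tokens are found in the query; the second does not because "super" is missing.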
Hence, the following four documents each have 3 token matches for the query above:
"elastic super nice"
"elastic nice cool"
"nice good great"
"good great cool"
But only the first match is a valid match!
How can I avoid counting each synonym match as a separate match when the synonyms all represent the same token in the document?
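Here is how I understand the overcounting, again as a plain-Python sketch rather than Elasticsearch code. It assumes the text "elastic super cool" is indexed with the synonyms stacked at the position of "cool", and that the four strings above are matched against it. The synonym table and the function names (expand_positions, term_matches, covered_positions) are hypothetical. What I would like to get is the second count, where each position counts at most once.

```python
# Hypothetical synonym group taken from the example above.
SYNONYM_GROUPS = [{"cool", "nice", "great", "good"}]


def expand_positions(text):
    """Return one set of terms per token position, the way a synonym
    filter stacks all synonyms at the position of the original token."""
    group_of = {}
    for group in SYNONYM_GROUPS:
        for term in group:
            group_of[term] = group
    return [group_of.get(tok, {tok}) for tok in text.lower().split()]


def term_matches(positions, tokens):
    """Naive count: every distinct incoming token that hits any indexed term."""
    indexed_terms = set().union(*positions)
    return len(set(tokens) & indexed_terms)


def covered_positions(positions, tokens):
    """Desired count: a position counts once, however many of its synonyms hit."""
    incoming = set(tokens)
    return sum(1 for terms in positions if terms & incoming)


positions = expand_positions("elastic super cool")
for text in ["elastic super nice", "elastic nice cool",
             "nice good great", "good great cool"]:
    tokens = text.split()
    naive = term_matches(positions, tokens)
    covered = covered_positions(positions, tokens)
    valid = covered == len(positions)
    print(f"{text!r}: term matches={naive}, positions covered={covered}, valid={valid}")
```

All four strings give 3 term matches, but only "elastic super nice" covers all 3 positions. "elastic nice cool" covers 2, and the last two cover only 1.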
Any ideas how to tackle this problem?
I read that percolators might address this issue, but I am still not sure whether percolators would work with synonyms the way I want.
Ideas?