Using a positional inverted index structure, for example:

```csharp
using System.Collections.Generic;

// term -> (document number -> positions of the term in that document)
var index = new Dictionary<string, Dictionary<int, List<int>>>()
{
    ["bar"] = new Dictionary<int, List<int>>()
    {
        [3] = new List<int>() { 33, 45, 182 },
        [18] = new List<int>() { 611, 794 },
        // ...
    },
    ["foo"] = new Dictionary<int, List<int>>()
    {
        // ...
    }
};
```
which has a `{ term: { docno: [...positions] } }` structure, how can I perform fuzzy lookups for phrase queries?
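For concreteness, an index of this shape could be built from raw text roughly as follows. This is a minimal sketch, assuming a naive lowercase whitespace tokenizer; `BuildIndex` is a hypothetical helper, written as C# top-level statements with a local function.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: builds the { term: { docno: [positions] } } index.
// Tokenization is a naive lowercase whitespace split (an assumption; a real
// analyzer would also strip punctuation, stem, etc.).
static Dictionary<string, Dictionary<int, List<int>>> BuildIndex(
    IDictionary<int, string> documents)
{
    var index = new Dictionary<string, Dictionary<int, List<int>>>();
    foreach (var doc in documents)
    {
        // An empty separator array makes Split use whitespace as the delimiter.
        string[] tokens = doc.Value.ToLowerInvariant()
            .Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries);
        for (int pos = 0; pos < tokens.Length; pos++)
        {
            if (!index.TryGetValue(tokens[pos], out var postings))
                index[tokens[pos]] = postings = new Dictionary<int, List<int>>();
            if (!postings.TryGetValue(doc.Key, out var positions))
                postings[doc.Key] = positions = new List<int>();
            // Positions are visited left to right, so each list stays sorted.
            positions.Add(pos);
        }
    }
    return index;
}
```

Because every position list comes out sorted, phrase and proximity checks can rely on that ordering if they need to.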
Elasticsearch and Lucene both support Levenshtein edit distance, but it seems to work at the character level: `gppgle` matches `google` if the `fuzziness` parameter is 2 (an edit distance of 2).
However, I want to match at the word level: `ten people` should match `ten in people`, and `one two three` should match `one and three` (depending on the "fuzziness" of the search).
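To make "word level" concrete: the same Levenshtein recurrence used for characters can run over token sequences, where each inserted, deleted, or substituted word costs 1. Under that metric `ten people` → `ten in people` is one insertion and `one two three` → `one and three` is one substitution, while at the character level `gppgle` → `google` is two substitutions. The sketch below uses hypothetical helpers `Levenshtein`, `WordDistance` and `Tokenize`, assumes whitespace tokenization, and only computes the distance between two whole sequences.

```csharp
using System;
using System.Collections.Generic;

// Classic Levenshtein distance: the minimum number of insertions, deletions
// and substitutions (each costing 1) that turn sequence a into sequence b.
// Generic over the element type: words (string) or characters (char).
static int Levenshtein<T>(IReadOnlyList<T> a, IReadOnlyList<T> b)
{
    var eq = EqualityComparer<T>.Default;
    var prev = new int[b.Count + 1]; // row i-1 of the DP table
    var curr = new int[b.Count + 1]; // row i
    for (int j = 0; j <= b.Count; j++) prev[j] = j;
    for (int i = 1; i <= a.Count; i++)
    {
        curr[0] = i;
        for (int j = 1; j <= b.Count; j++)
        {
            int cost = eq.Equals(a[i - 1], b[j - 1]) ? 0 : 1;
            curr[j] = Math.Min(Math.Min(prev[j] + 1,   // delete a[i-1]
                                        curr[j - 1] + 1), // insert b[j-1]
                               prev[j - 1] + cost);    // substitute or keep
        }
        (prev, curr) = (curr, prev); // prev now holds row i
    }
    return prev[b.Count];
}

// Word-level distance: tokenize on whitespace (assumed), then compare tokens.
static int WordDistance(string query, string text) =>
    Levenshtein<string>(Tokenize(query), Tokenize(text));

static string[] Tokenize(string s) =>
    s.ToLowerInvariant().Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries);
```

This defines only the metric. It does not address the harder part of the question: using the index to find which windows of a document lie within distance k of the query without scanning every document.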
I'm not sure how to implement this efficiently, given that I have an index at my disposal.
Phrase queries can be implemented by checking, for each position of a query word in a document, whether the next query word appears in the same document exactly one position further along.
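A minimal sketch of that phrase check, assuming the `Dictionary<string, Dictionary<int, List<int>>>` index shape from the question; `PhraseMatch` is a hypothetical name, written as a C# top-level local function. It first requires every term to occur in the document, then treats each position of the first term as a candidate start.

```csharp
using System.Collections.Generic;
using System.Linq;

// Exact phrase query over a { term: { docno: [positions] } } index: returns
// the documents where terms[0], terms[1], ... occur at consecutive positions.
static List<int> PhraseMatch(
    Dictionary<string, Dictionary<int, List<int>>> index, IReadOnlyList<string> terms)
{
    var matches = new List<int>();
    if (terms.Count == 0) return matches;
    var postings = new List<Dictionary<int, List<int>>>();
    foreach (var term in terms)
    {
        // A term that is not indexed at all rules out every document.
        if (!index.TryGetValue(term, out var p)) return matches;
        postings.Add(p);
    }
    foreach (int doc in postings[0].Keys.OrderBy(d => d))
    {
        // Every query term must occur in this document.
        if (!postings.All(p => p.ContainsKey(doc))) continue;
        // Position sets give O(1) "is term k at position x?" checks.
        var positionSets = postings.Select(p => new HashSet<int>(p[doc])).ToList();
        // Each occurrence of the first term is a candidate phrase start;
        // term k must then sit exactly k positions further along.
        bool found = postings[0][doc].Any(start =>
            Enumerable.Range(1, terms.Count - 1)
                      .All(k => positionSets[k].Contains(start + k)));
        if (found) matches.Add(doc);
    }
    return matches;
}
```

For example, with document 1 = `ten people` and document 2 = `ten in people`, the query `ten people` matches only document 1.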
Proximity queries are implemented the same way as phrase queries, but allow the next query word to appear within some distance of the previous one. All terms have to exist in the document for it to match.
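The proximity variant can be sketched the same way. This assumes that terms must occur in query order and that "some distance" means at most `maxDistance` positions after the previous term; both are interpretations, not the behaviour of any specific library. `ProximityMatch` is a hypothetical name, written as a C# top-level local function. Rather than enumerating every chain of positions, it keeps only the set of positions where a partial match can currently end.

```csharp
using System.Collections.Generic;
using System.Linq;

// Proximity query over a { term: { docno: [positions] } } index. Assumption:
// terms appear in query order, each at most maxDistance positions after the
// previous one. maxDistance = 1 is the exact phrase query.
static List<int> ProximityMatch(
    Dictionary<string, Dictionary<int, List<int>>> index,
    IReadOnlyList<string> terms, int maxDistance)
{
    var matches = new List<int>();
    if (terms.Count == 0) return matches;
    var postings = new List<Dictionary<int, List<int>>>();
    foreach (var term in terms)
    {
        // All terms have to exist for any document to match.
        if (!index.TryGetValue(term, out var p)) return matches;
        postings.Add(p);
    }
    foreach (int doc in postings[0].Keys.OrderBy(d => d))
    {
        if (!postings.All(p => p.ContainsKey(doc))) continue;
        // Positions at which a valid partial match of terms[0..k] can end.
        List<int> frontier = postings[0][doc];
        for (int k = 1; k < terms.Count && frontier.Count > 0; k++)
        {
            var previous = frontier;
            // Keep positions of term k that lie 1..maxDistance after some
            // position where the partial match ending at term k-1 stops.
            frontier = postings[k][doc]
                .Where(q => previous.Any(p => q - p >= 1 && q - p <= maxDistance))
                .ToList();
        }
        if (frontier.Count > 0) matches.Add(doc);
    }
    return matches;
}
```

Keeping only the frontier works because the constraint links consecutive terms only, so the work per document is bounded by the sizes of neighbouring position lists rather than by the number of possible chains.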
How can I implement an "Edit distance" query?