Lucene Search based on edit-distance on entire text rather than individual tokens

Question

I am using SpanNearQuery with SpanMultiTermQueryWrapper to match multi-word query text against documents that each contain multiple tokens, allowing an edit distance of 1 or 2.

With this approach I have to specify the edit distance for each individual token in the query text, which works pretty well.

However, is there a way to search the documents with an edit distance of 1 or 2 applied to the entire query text, rather than specifying it for each individual token?

For example, this is the current setup (not the exact query syntax, simplified for illustration):

For the query "bread basket" I currently build "bread~2 : basket~2", but I would like something like "bread basket~2".
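To make the difference between per-token and whole-text edit distance concrete, here is a minimal plain-Java sketch that computes the distance over the entire string, spaces included. It is not a Lucene query and does not use the index. The class and method names (`WholeTextEditDistance`, `matchesWithin`) are hypothetical, and it uses plain Levenshtein distance (no transpositions), which is an assumption rather than what Lucene's fuzzy matching necessarily does. Something like this could only serve as a post-filter over candidate documents retrieved some other way.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: edit distance over the whole text.
final class WholeTextEditDistance {

    // Classic two-row dynamic-programming Levenshtein distance
    // (insertions, deletions, substitutions; no transpositions).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Lowercase, trim, and collapse runs of whitespace to a single space
    // (an assumed normalization, chosen only for this illustration).
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).trim().replaceAll("\\s+", " ");
    }

    // True if the whole query text is within maxEdits of the whole document text.
    static boolean matchesWithin(String query, String docText, int maxEdits) {
        return levenshtein(normalize(query), normalize(docText)) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("the breadbasket", "the bread basket"));
        System.out.println(matchesWithin("bred baskt", "bread basket", 2));
    }
}
```

Under this definition the missing space counts as a single edit, so "the breadbasket" is within distance 1 of "the bread basket", whereas a per-token fuzzy query cannot align one query token with two indexed tokens.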

Indexing method: I am using StandardAnalyzer to index my multi-term documents.

Basically, I am looking to do word segmentation. If the input query is "the breadbasket", it should match the document "the bread basket". Let me know if there is any hack to achieve this.

Any help would be appreciated. Thanks in advance!

Edit distance ([fuzzy searching](https://lucene.apache.org/core/9_5_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Fuzzy_Searches)) only applies to single word terms (as you probably already know) and not to multi-word phrases. Suggestion: for situations like the one mentioned at the end of your question (find `bread basket` by searching for `breadbasket`) you can try using ngrams (see [here](https://stackoverflow.com/a/75311734/12567365)). — andrewJames, Mar 28 '23 at 13:29
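The n-gram idea in the comment above can be illustrated without Lucene. The following is a rough plain-Java sketch, assuming character trigrams generated per whitespace-separated token. The names `NgramOverlap`, `ngramsOfText`, and `coverage` are hypothetical, and a real setup would configure an n-gram tokenizer or filter in the analyzer rather than use code like this.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of character n-gram overlap, not a Lucene API.
final class NgramOverlap {

    // All character n-grams of a single token, in order of first appearance.
    static Set<String> ngrams(String token, int n) {
        Set<String> grams = new LinkedHashSet<>();
        for (int i = 0; i + n <= token.length(); i++) {
            grams.add(token.substring(i, i + n));
        }
        return grams;
    }

    // N-grams of every whitespace-separated token in the text (lowercased).
    static Set<String> ngramsOfText(String text, int n) {
        Set<String> grams = new LinkedHashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            grams.addAll(ngrams(token, n)); // empty tokens contribute nothing
        }
        return grams;
    }

    // Fraction of the indexed text's n-grams that also occur in the query.
    static double coverage(String indexedText, String queryText, int n) {
        Set<String> indexed = ngramsOfText(indexedText, n);
        if (indexed.isEmpty()) {
            return 0.0;
        }
        Set<String> query = ngramsOfText(queryText, n);
        long shared = indexed.stream().filter(query::contains).count();
        return (double) shared / indexed.size();
    }

    public static void main(String[] args) {
        System.out.println(ngramsOfText("bread basket", 3));
        System.out.println(ngramsOfText("breadbasket", 3));
        System.out.println(coverage("bread basket", "breadbasket", 3));
    }
}
```

Every trigram of `bread` and `basket` also occurs in `breadbasket`, which is why an n-gram index can connect the two forms even though no single token of one is within a small edit distance of a token of the other. The query also contributes extra trigrams that span the word boundary (`adb`, `dba`), so scoring and thresholds would need tuning in practice.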


0 Answers