My problem is how to parse wildcard queries with Lucene so that the query term is still passed through a TokenFilter.
I'm using a custom Analyzer with several filters (e.g. ASCIIFoldingFilter, but that's only an example). The issue is that whenever Lucene's QueryParser detects that one of the sub-queries is a WildcardQuery, it ignores the Analyzer by design [1].
This means that a query for über is filtered correctly,
über -> uber
but a query for über* (with a wildcard) is not passed through a filter at all:
über* -> über*
Since all tokens are filtered on the index side, this means there can be no matches for any wildcard query containing ü.
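To make the mismatch concrete, here is a JDK-only approximation of the folding step. It is a sketch, not Lucene's filter: it decomposes text to Unicode NFD and strips combining marks, which turns ü into u much like ASCIIFoldingFilter does, but it is not equivalent (ß and ø, for example, are left unchanged here, whereas ASCIIFoldingFilter maps them to "ss" and "o"). The class name `FoldingSketch` is hypothetical. Because `*` and `?` have no decomposition, the function leaves wildcards intact.

```java
import java.text.Normalizer;

public final class FoldingSketch {
    private FoldingSketch() {}

    // Approximates ASCIIFoldingFilter for accented Latin letters only:
    // decompose to NFD, then strip combining marks (e.g. U+0308 diaeresis).
    // Characters without a decomposition (ß, ø, '*', '?') pass through unchanged.
    public static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("\u00fcber"));  // "über"  -> uber
        System.out.println(fold("\u00fcber*")); // "über*" -> uber*
    }
}
```

As a stopgap, the same function could be applied to the raw query string before `parser.parse(input)`, assuming field names contain no accented characters. It only covers diacritic folding, though, not whatever else a custom Analyzer does.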
Q: How do I force Lucene to filter the query term for wildcard queries too? I'm looking for a way that re-uses at least some of Lucene's codebase ;-)
Note: As input I receive a query string, so building queries programmatically is not an option.
Note: I'm using Lucene 4.5.1.
[1] http://www.gossamer-threads.com/lists/lucene/java-user/14224
Context:
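import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

// CustomAnalyzer (my own class) applies the filters in Analyzer#createComponents(String, Reader)
Analyzer analyzer = new CustomAnalyzer(Version.LUCENE_45);
// I'm using org.apache.lucene.queryparser.classic.MultiFieldQueryParser; fields is a String[]
QueryParser parser = new MultiFieldQueryParser(Version.LUCENE_45, fields, analyzer);
parser.setAllowLeadingWildcard(true);
parser.setMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
// actual parsing of the input query (may throw ParseException)
Query query = parser.parse(input);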
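One way to reuse the Analyzer itself is to subclass the parser and override the protected `getWildcardQuery(String field, String termStr)` hook. As far as I know, `getPrefixQuery(String field, String termStr)` also needs overriding, because the classic grammar routes a term with a single trailing `*` (like `über*`) there instead. Each override would normalize the term and then delegate to `super`. If I remember correctly, `AnalyzingQueryParser` (package `org.apache.lucene.queryparser.analyzing`) takes roughly this approach, though it extends the single-field `QueryParser` rather than `MultiFieldQueryParser`. The sketch below shows only the core step in plain Java: split the term at wildcard characters, normalize each literal run, and reassemble. `WildcardSegments`, `normalizeLiterals`, and the `normalizer` argument are hypothetical. In real code, the `normalizer` would run the text through the Analyzer's TokenStream and would also have to decide what to do when that yields zero or several tokens.

```java
import java.text.Normalizer;
import java.util.function.UnaryOperator;

public final class WildcardSegments {
    private WildcardSegments() {}

    // Hypothetical helper: applies `normalizer` to each literal run of a
    // wildcard term, copying '*', '?' and backslash escapes through verbatim.
    public static String normalizeLiterals(String term, UnaryOperator<String> normalizer) {
        StringBuilder out = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '\\' && i + 1 < term.length()) {
                // Escaped character (e.g. \*): keep escape + char unnormalized.
                flush(literal, out, normalizer);
                out.append(c).append(term.charAt(++i));
            } else if (c == '*' || c == '?') {
                flush(literal, out, normalizer);
                out.append(c);
            } else {
                literal.append(c);
            }
        }
        flush(literal, out, normalizer);
        return out.toString();
    }

    // Normalizes the pending literal run (if any) and appends it to `out`.
    private static void flush(StringBuilder literal, StringBuilder out,
                              UnaryOperator<String> normalizer) {
        if (literal.length() > 0) {
            out.append(normalizer.apply(literal.toString()));
            literal.setLength(0);
        }
    }

    public static void main(String[] args) {
        // Assumed stand-in for "run the text through the Analyzer":
        UnaryOperator<String> fold = s -> Normalizer
                .normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
        System.out.println(normalizeLiterals("\u00fcber*", fold));     // uber*
        System.out.println(normalizeLiterals("*m\u00fc?ller", fold));  // *mu?ller
    }
}
```

Splitting before normalizing matters because a real Analyzer's tokenizer would likely treat `*` and `?` as separators. Normalizing only the literal runs keeps the wildcard structure intact.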