Write your own custom analyzer class similar to SynonymAnalyzer
in Lucene.Net – Custom Synonym Analyzer. Your override of TokenStream
could solve this by pipelining the stream using WhitespaceTokenizer
and LowerCaseFilter
.
Remember that your indexer and searcher need to use the same analyzer.
Update: Handling multiple comma-delimited keywords
If you only need to handle unspaced comma-delimited keywords for searching, not indexing then you could convert the search expression expr
as below.
expr = expr.Replace(',', ' ');
Then pass expr
to the QueryParser
. If you want to support other delimiters like ';' you could do it like this:
var terms = expr.Split(new char[] { ',', ';'} );
expr = String.Join(" ", terms);
But you also need to check for a phrase expression like "sybase,c#,.net,oracle" (expression includes the quote " chars) which should not be converted (the user is looking for an exact match):
expr = expr.Trim();
if (!(expr.StartsWith("\"") && expr.EndsWith("\"")))
{
expr = expr.Replace(',', ' ');
}
The expression might include both a phrase and some keywords, like this:
"sybase,c#,.net,oracle" server,c#,.net,sybase
Then you need to parse and translate the search expression to this:
"sybase,c#,.net,oracle" server c# .net sybase
If you also need to handle unspaced comma-delimited keywords for indexing then you need to parse the text for unspaced comma-delimited keywords and store them in a distinct field eg. Keywords
(which must be associated with your custom analyzer). Then your search handler needs to convert a search expression like this:
server,c#,.net,sybase
to this:
Keywords:server Keywords:c# Keywords:.net, Keywords:sybase
or more simply:
Keywords:(server, c#, .net, sybase)