
I am moving an existing index from Lucene to Solr. In Lucene we apply the following logic to the input text:

  1. convert to lower case
  2. replace dictionary words (replace specific words with other words, e.g. replace "hertz" with "htz")
  3. keep only letters and digits
  4. trim the output string
  5. replace \s+ with a single space
  6. split using the java.lang.String#split method
  7. for each resulting word, divide it by the following pattern: "ABCDEF" => ABC BCD CDE DEF (grams of 3 characters, overlapping by 2)
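Taken together, the steps above could be sketched in plain Java roughly like this (the dictionary contents and the sliding-3-gram reading of step 7 are my assumptions, not the original implementation):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class Preprocess {

    // Hypothetical replacement dictionary for step 2
    static final Map<String, String> DICT = Map.of("hertz", "htz");

    public static List<String> analyze(String input) {
        // 1. lower case
        String s = input.toLowerCase(Locale.ROOT);
        // 2. replace dictionary words
        for (Map.Entry<String, String> e : DICT.entrySet()) {
            s = s.replace(e.getKey(), e.getValue());
        }
        // 3. keep letters and digits only (everything else becomes a space)
        s = s.replaceAll("[^\\p{L}\\p{Nd}]", " ");
        // 4. trim, then 5. collapse runs of whitespace to a single space
        s = s.trim().replaceAll("\\s+", " ");
        List<String> out = new ArrayList<>();
        // 6. split on whitespace
        for (String word : s.split(" ")) {
            // 7. sliding 3-grams: "ABCDEF" -> ABC BCD CDE DEF
            if (word.length() <= 3) {
                out.add(word);
            } else {
                for (int i = 0; i + 3 <= word.length(); i++) {
                    out.add(word.substring(i, i + 3));
                }
            }
        }
        return out;
    }
}
```

For example, `analyze("Hertz ABCDEF")` would yield `[htz, abc, bcd, cde, def]` under these assumptions.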

I don't want to write a Tokenizer that might already exist.

So, I looked at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters but got lost.

halfer
Muhammad Hewedy

2 Answers

  1. LowerCaseFilter,
  2. SynonymFilter,
  3. StandardTokenizer or PatternTokenizer,
  4. TrimFilter,
  5. PatternReplaceFilter,
  6. WordDelimiterFilter?
  7. NGramTokenFilter (you may need to write a factory for this one).

But if you already have an existing Lucene analyzer, you can make Solr use it.
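A Solr field type wiring these factories together might look like the following sketch in schema.xml. The field type name, the `synonyms.txt` file, and the exact patterns are my assumptions, and note that Solr requires the tokenizer to come first, so the order differs slightly from the original Lucene steps:

```xml
<!-- Sketch only: name, patterns, and synonyms.txt are assumptions -->
<fieldType name="text_custom" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- split on whitespace (steps 5-6) -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\s+"/>
    <!-- step 1 -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- step 2: dictionary replacements, e.g. "hertz => htz" in synonyms.txt -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
    <!-- step 3: strip everything except letters and digits -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="[^a-z0-9]" replacement="" replace="all"/>
    <!-- step 4 -->
    <filter class="solr.TrimFilterFactory"/>
    <!-- step 7: 3-character grams -->
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="3"/>
  </analyzer>
</fieldType>
```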

jpountz

Try OpenPipeline. It's designed for preprocessing documents that get fed to search software.

ccleve