Solr Tokenizer Injection

Question

As an example I have a text field that might contain the following string:

"d7199^^==^^81^^==^^A sentence or two!!"

I want to tokenize this data but have each token contain the first part of the string. So, I'd like the tokens to look like this for the example above:

"d7199^^==^^81^^==^^a"

"d7199^^==^^81^^==^^sentence"

"d7199^^==^^81^^==^^or"

"d7199^^==^^81^^==^^two"

How would I go about doing this?

score 1 · Answer 1 · answered Aug 31 '11 at 13:00


You can implement your own custom Tokenizer and add it to the Solr classpath. Then reference it in your Solr schema.xml and solrconfig.xml.


Karl-Bjørnar Øie


After a bit of research this was my most logical conclusion as well. If you can give me some good examples the bounty all be yers! – Jason Palmer Sep 01 '11 at 13:19
How do you know when the first part of the input reaches its end? – jpountz Sep 02 '11 at 15:50
I could either define a different separator or we could just have it end at the last token ^^==^^. Or something else if you have a better suggestion. 3 more days until the bounty expires :( – Jason Palmer Sep 05 '11 at 16:08

It seems obvious that one must subclass a Tokenizer but HOW? – gyozo kudor May 25 '12 at 12:22
Start by extending TokenizerFactory (http://lucene.apache.org/solr/api-3_6_1/org/apache/solr/analysis/TokenizerFactory.html) or TokenFilterFactory (http://lucene.apache.org/solr/api-3_6_1/org/apache/solr/analysis/TokenFilterFactory.html). See also http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters – Karl-Bjørnar Øie Sep 03 '12 at 15:50
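
The comments above suggest treating everything up to the last ^^==^^ as the prefix. Here is a minimal sketch of that token logic in plain Java, with no Lucene or Solr classes involved. It assumes the remainder is split on runs of non-alphanumeric characters and lowercased, as in the question's expected output. The class name PrefixedTokens and the method tokenize are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PrefixedTokens {
    // Separator taken from the sample data in the question.
    static final String SEPARATOR = "^^==^^";

    /**
     * Splits the input at the LAST separator, breaks the remainder into
     * words on runs of non-alphanumeric characters, lowercases each word,
     * and prepends the prefix (including its trailing separator).
     * If no separator is present, the tokens get an empty prefix.
     */
    static List<String> tokenize(String input) {
        int cut = input.lastIndexOf(SEPARATOR);
        int bodyStart = cut < 0 ? 0 : cut + SEPARATOR.length();
        String prefix = input.substring(0, bodyStart);
        String body = input.substring(bodyStart);

        List<String> tokens = new ArrayList<>();
        for (String word : body.split("[^\\p{L}\\p{N}]+")) {
            // split() can yield an empty first element when the body
            // starts with punctuation or whitespace; skip it.
            if (!word.isEmpty()) {
                tokens.add(prefix + word.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("d7199^^==^^81^^==^^A sentence or two!!")) {
            System.out.println(token);
        }
    }
}
```

Running main on the sample string prints the four tokens listed in the question. In an actual Solr setup, the same logic would presumably sit inside a custom Tokenizer's incrementToken() and be exposed through a factory class registered in schema.xml. That wiring is not shown here and depends on the Solr version in use.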
