I want to do some basic Hebrew stemming.
All the examples of custom analyzers I could find merge other analyzers and filters, but none of them do any string-level processing themselves.
For example, what do I have to do to create an analyzer that, for each term in the stream it receives, emits either one or two terms according to the following rules? If the incoming term begins with anything other than "a", it should be passed through as is. If the incoming term begins with "a", two terms should be emitted: the original term, and a second one without the leading "a" and with a lower boost.
So that if the document has "help away" it will return "help", "away", and "way^0.8".
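To make the rule concrete, here is a minimal plain-Java sketch of the behaviour I am after. It does not use any analyzer API; the class name, method names, and constants are all hypothetical, and the whitespace splitting is only a stand-in for a real tokenizer. It also assumes that a term consisting of just "a" is passed through alone, since stripping it would leave an empty term.

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixExpansionSketch {

    // Hypothetical constants, taken from the example in the question.
    static final String PREFIX = "a";
    static final String STRIPPED_BOOST = "0.8";

    /** Expands a single term into one or two output terms. */
    static List<String> expand(String term) {
        List<String> out = new ArrayList<>();
        out.add(term); // the original term is always emitted unchanged
        // Assumption: a term equal to the bare prefix is not stripped,
        // because that would produce an empty term.
        if (term.startsWith(PREFIX) && term.length() > PREFIX.length()) {
            out.add(term.substring(PREFIX.length()) + "^" + STRIPPED_BOOST);
        }
        return out;
    }

    /** Splits on whitespace (a stand-in for a real tokenizer) and expands each term. */
    static List<String> expandAll(String text) {
        List<String> out = new ArrayList<>();
        for (String term : text.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                out.addAll(expand(term));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expandAll("help away")); // prints [help, away, way^0.8]
    }
}
```

What I cannot work out is where logic like `expand` belongs inside an analyzer, and how to attach the lower boost to the second term there.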
Which methods of the analyzer should I override to do this? (A pointer to a similar example would be very helpful.)
Thanks