Can anyone point me in the right direction for implementing a Lucene Tokenizer with lookahead?
I'm using a Snowball stemmer, and I want to recognize multi-word city names and keep them from being stemmed, so that "Los Angeles" is emitted as a single token rather than as two tokens, "Los" and "Angeles".
Tokens that aren't part of any city name should still come out as single-word tokens.
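To make the behaviour I'm after concrete, here is a rough sketch in plain Java. It deliberately uses no Lucene classes, so it only illustrates the lookahead logic and not how it would plug into a Tokenizer or TokenFilter. The class name `CityPhraseMerger`, the hard-coded city set, and the three-word lookahead limit are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java sketch of the desired behaviour; NOT a Lucene Tokenizer.
final class CityPhraseMerger {

    // Hypothetical city list; a real one would come from a file or database.
    static final Set<String> CITIES = new HashSet<>(
            Arrays.asList("los angeles", "new york", "salt lake city"));

    // Longest city name in words, i.e. how far to look ahead (assumption).
    static final int MAX_WORDS = 3;

    // Merges runs of tokens that form a known city name into one token;
    // every other token is passed through unchanged as a single word.
    static List<String> merge(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int consumed = 0;
            // Look ahead, trying the longest candidate phrase first.
            for (int n = Math.min(MAX_WORDS, tokens.size() - i); n >= 2; n--) {
                String candidate = String.join(" ", tokens.subList(i, i + n));
                if (CITIES.contains(candidate.toLowerCase(Locale.ROOT))) {
                    out.add(candidate);
                    consumed = n;
                    break;
                }
            }
            if (consumed == 0) {
                // No city phrase starts here: emit the single word.
                out.add(tokens.get(i));
                consumed = 1;
            }
            i += consumed;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "flights", "to", "Los", "Angeles", "from", "New", "York");
        // prints [flights, to, Los Angeles, from, New York]
        System.out.println(merge(input));
    }
}
```

In the real analyzer the merged city tokens would also have to be flagged somehow so the stemmer skips them, and that is the part I don't know how to do inside Lucene.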
Any ideas?
TIA