Lucene Tokenizer with LookAhead

Question

Can anyone point me in the right direction for implementing a Lucene Tokenizer with LookAhead?

I'm using a Snowball stemmer, and I want to recognize multi-word city names and keep them from being stemmed, so that "Los Angeles" is emitted as a single token rather than as two tokens, "Los" and "Angeles".

Tokens that don't match any city name should still be emitted as single words.

Any ideas?

TIA

score 1 · Accepted Answer · answered Sep 30 '11 at 14:39

Here is a gist of something I wrote which does what you want.
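For readers without access to the gist, the lookahead idea can be sketched independently. This is not the gist's code: it is a minimal, Lucene-free illustration of greedy longest-match phrase merging, and the class name `CityPhraseMerger` and its `merge` method are hypothetical names chosen for this example. It assumes the tokens have already been split on whitespace and that city names are matched case-insensitively.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of Lucene): merges known multi-word
// phrases into single tokens by looking ahead in the token list.
public class CityPhraseMerger {
    private final Set<String> phrases = new HashSet<>();
    private int maxWords = 1; // longest phrase length, bounds the lookahead

    public CityPhraseMerger(List<String> cityNames) {
        for (String name : cityNames) {
            String[] words = name.trim().toLowerCase(Locale.ROOT).split("\\s+");
            phrases.add(String.join(" ", words));
            maxWords = Math.max(maxWords, words.length);
        }
    }

    public List<String> merge(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            int limit = Math.min(maxWords, tokens.size() - i);
            // Try the longest candidate first so "New York City"
            // wins over any shorter phrase starting at the same token.
            for (int n = limit; n >= 2; n--) {
                String candidate = String.join(" ", tokens.subList(i, i + n));
                if (phrases.contains(candidate.toLowerCase(Locale.ROOT))) {
                    matched = n;
                    break;
                }
            }
            if (matched > 0) {
                // Emit the whole phrase as one token and skip its words.
                out.add(String.join(" ", tokens.subList(i, i + matched)));
                i += matched;
            } else {
                // No phrase starts here: pass the single word through.
                out.add(tokens.get(i));
                i++;
            }
        }
        return out;
    }
}
```

In a Lucene pipeline, logic like this would typically live in a custom `TokenFilter` placed before the stemmer, buffering up to `maxWords` tokens. The merged tokens would also need to be marked so the stemmer leaves them alone, for example with `KeywordAttribute`; whether the Snowball filter honors that attribute depends on the Lucene version, so check it against yours.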

Xodarap


**excellent** I ported it to Java and it works like a charm! thank you :) – isapir Oct 01 '11 at 18:52
