I want to use the Lucene API to extract n-grams from sentences, but I have run into a peculiar problem. The JavaDoc lists a class called NGramTokenizer. I have downloaded both the 3.6.1 and 4.0 versions of the API, and I cannot find any trace of this class in either. For example, when I try the following, I get a compiler error stating that the symbol NGramTokenizer cannot be found:
NGramTokenizer myTokenizer;
According to the documentation, NGramTokenizer appears to be at org.apache.lucene.analysis.NGramTokenizer, but I do not see it anywhere on my computer. A corrupted download or similar one-off error seems unlikely, since this happens with both the 3.6.1 and 4.0 versions.
- How can I obtain the NGramTokenizer class?
- So far I have only added lucene-core-3.6.1.jar to my project.