I am trying to use the Carrot2 API to cluster documents written in Japanese. It logs this WARN message:
org.carrot2.text.linguistic.DefaultTokenizerFactory: Tokenizer for Japanese (ja) is not available. This may degrade clustering quality of Japanese content.
As a result, clustering fails and all documents end up in the "other topic" cluster.
Is there a way to solve this problem?
Thanks in advance.