
LibShortText is an open source tool for short-text classification and analysis. http://www.csie.ntu.edu.tw/~cjlin/libshorttext/

I have tried to figure out whether it also works with languages other than English (e.g. German), but I couldn't find any hint in the documentation.

Who knows the answer? Thank you in advance.

NewbieXXL

1 Answer


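I think so, though it may need some extra preprocessing. LIBSVM and LIBLINEAR are both language-agnostic: they operate on numeric feature vectors, not on the text itself. Since LibShortText is built on top of LIBLINEAR, the classifier should work for any language once the text has been turned into features.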

According to this paper, it has internal pre-processing methods to extract features.

libshorttext.converter: For given short texts, LibShortText follows 
the bag-of-word model to generate features. Users apply procedures in
this library to pre-process short texts by tokenization, stemming 
(optional), and stop-word removal (optional). The library also allows 
users to choose between unigram and bigram features.

However, it looks like its stemming and stop-word removal only support English. So if you want better features for non-English text, you might want to do your own preprocessing first, for example with NLTK, and leave LibShortText's optional stemming and stop-word removal turned off.
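
Here is a rough sketch of what such a preprocessing step could look like for German. It is only an illustration, not LibShortText's API: `preprocess_german`, `to_training_line`, and the tiny stop-word set are hypothetical names I made up, and I am assuming the training file takes one `label<TAB>text` instance per line, so check the README for the exact format. It uses NLTK's Snowball stemmer if NLTK is installed and skips stemming otherwise.

```python
import re

try:
    # The Snowball stemmer ships with NLTK and needs no corpus download.
    from nltk.stem.snowball import SnowballStemmer
    _stem = SnowballStemmer("german").stem
except ImportError:
    # Fallback so the sketch still runs without NLTK: no stemming at all.
    def _stem(token):
        return token

# Tiny illustrative stop-word set (an assumption, not a complete list).
# A real list would be much longer, e.g. NLTK's German stop-word corpus.
GERMAN_STOP_WORDS = {"der", "die", "das", "und", "ist", "ein", "eine", "in", "zu", "mit"}


def preprocess_german(text):
    """Lowercase, tokenize, drop stop words, and stem one short text."""
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(_stem(t) for t in tokens if t not in GERMAN_STOP_WORDS)


def to_training_line(label, text):
    """Format one instance as 'label<TAB>text' (assumed input format)."""
    return f"{label}\t{preprocess_german(text)}"


if __name__ == "__main__":
    print(to_training_line("tiere", "Die Katze und der Hund spielen in dem Garten"))
```

You would write the resulting lines to a file and train on that file, so the built-in English stemmer and stop-word list never touch the German text.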

greeness