Does LibShortText work with other languages too?

Question

LibShortText is an open source tool for short-text classification and analysis. http://www.csie.ntu.edu.tw/~cjlin/libshorttext/

I have tried to figure out whether it also works with languages other than English (e.g. German), but I didn't find any hint.

Does anyone know the answer? Thank you in advance.

greeness · Answer 1 · 2016-09-03T02:03:57.083

I think so, although you may need some extra preprocessing. LIBSVM and LIBLINEAR are both language-agnostic, since they only see numeric feature vectors. LibShortText is built on top of LIBLINEAR, so it should work for other languages too.

According to this paper, it has internal pre-processing methods to extract features.

libshorttext.converter: For given short texts, LibShortText follows 
the bag-of-word model to generate features. Users apply procedures in
this library to pre-process short texts by tokenization, stemming 
(optional), and stop-word removal (optional). The library also allows 
users to choose between unigram and bigram features.

However, it looks like its stemming and stop-word removal only support English. So if you want better features extracted from non-English text, you might want to use your own pre-processing, for example with NLTK.
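As a rough idea of what such custom pre-processing could look like for German, here is a minimal sketch using only the Python standard library. The stop-word list is a tiny hand-picked sample, the function names `preprocess` and `to_training_line` are made up for illustration, and the tab-separated `label<TAB>text` output is my assumption about the input format, so check it against the LibShortText README before relying on it.

```python
import re

# Tiny illustrative German stop-word list (an assumption, far from complete).
GERMAN_STOPWORDS = {"der", "die", "das", "und", "ist", "ein", "eine", "in", "zu", "mit"}


def preprocess(text, stopwords=GERMAN_STOPWORDS, stem=None):
    """Lowercase, tokenize, drop stop words, and optionally stem each token."""
    # \w is Unicode-aware for str patterns in Python 3, so umlauts and ß survive.
    tokens = re.findall(r"\w+", text.lower())
    tokens = [t for t in tokens if t not in stopwords]
    if stem is not None:
        # `stem` is any callable str -> str supplied by the caller.
        tokens = [stem(t) for t in tokens]
    return " ".join(tokens)


def to_training_line(label, text, **kwargs):
    """Build one 'label<TAB>text' line (assumed input format, see note above)."""
    return f"{label}\t{preprocess(text, **kwargs)}"


if __name__ == "__main__":
    print(to_training_line("immobilien", "Das ist ein schönes Haus mit Garten"))
```

If NLTK is installed, a real stemmer can be plugged in through the `stem` argument, e.g. `stem=SnowballStemmer("german").stem` after `from nltk.stem.snowball import SnowballStemmer`. Since the text is then already cleaned, you would probably want to leave LibShortText's own English stemming and stop-word removal switched off.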
