I want to do topic modeling on short texts. From what I have read, LDA does not work well on short texts. Which methods work better, and do they have Python implementations?
4 Answers
You can try Short Text Topic Modelling (STTM). It is described at https://www.groundai.com/project/sttm-a-tool-for-short-text-topic-modeling/1 and the code is available at https://github.com/qiang2100/STTM. It combines state-of-the-art short-text algorithms with traditional topic models for long text, all of which can conveniently be run on short text.
For a more specialised library, try lda2vec-tf, which combines word vectors with LDA topic vectors. It is a fork of the original lda2vec that improves on it and gives better results than the original library.

Are they available in python? – Sample Test Jun 04 '20 at 19:12
As far as I can see, STTM is written in Java and has only a Java API. It would be great if somebody made a Python binding for it. – Ilya Palachev Jun 22 '20 at 05:02
The only Python implementation of short text topic modeling that I know of is GSDMM (Gibbs Sampling for the Dirichlet Multinomial Mixture model). Unfortunately, most of the others are written in Java.
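To give an idea of what GSDMM does, here is a minimal pure-Python sketch of a collapsed Gibbs sampler for the Dirichlet Multinomial Mixture model. Each document gets exactly one cluster, which is why it tends to suit short texts. The function name `gsdmm_sketch`, its parameters, and the toy documents are my own choices for illustration, not the API of the gsdmm package or any other library, and the code is not tuned for speed.

```python
import math
import random
from collections import Counter


def gsdmm_sketch(docs, n_clusters=8, alpha=0.1, beta=0.1, n_iters=30, seed=0):
    """Assign each tokenized document to one cluster (hypothetical helper).

    docs: list of token lists. Returns a list with one cluster label per document.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    doc_counts = [0] * n_clusters                         # documents per cluster
    word_totals = [0] * n_clusters                        # tokens per cluster
    word_counts = [Counter() for _ in range(n_clusters)]  # per-word counts per cluster

    def move(doc, z, sign):
        # Add (sign=+1) or remove (sign=-1) a document from cluster z.
        doc_counts[z] += sign
        word_totals[z] += sign * len(doc)
        for w in doc:
            word_counts[z][w] += sign

    # Random initial assignment.
    labels = []
    for doc in docs:
        z = rng.randrange(n_clusters)
        labels.append(z)
        move(doc, z, +1)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            move(doc, labels[d], -1)  # take the document out before resampling
            log_p = []
            for z in range(n_clusters):
                # Cluster popularity term (shared normalizer omitted).
                lp = math.log(doc_counts[z] + alpha)
                # Word-overlap term, handling repeated words in the document.
                for w, c in Counter(doc).items():
                    for j in range(c):
                        lp += math.log(word_counts[z][w] + beta + j)
                for i in range(len(doc)):
                    lp -= math.log(word_totals[z] + vocab_size * beta + i)
                log_p.append(lp)
            # Sample a new cluster in a numerically stable way.
            top = max(log_p)
            weights = [math.exp(lp - top) for lp in log_p]
            z_new = rng.choices(range(n_clusters), weights=weights)[0]
            labels[d] = z_new
            move(doc, z_new, +1)
    return labels


# Toy usage with made-up, pre-tokenized texts.
toy_docs = [
    ["cat", "dog"],
    ["dog", "cat", "pet"],
    ["stock", "market"],
    ["market", "price", "stock"],
]
toy_labels = gsdmm_sketch(toy_docs, n_clusters=3, n_iters=10, seed=1)
print(toy_labels)
```

`n_clusters` is only an upper bound: clusters that end up with no documents can simply be ignored. For real data you would still need to tokenize and clean the texts first, and a dedicated library will be much faster than this loop.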

Here is a fast, easy-to-use Python implementation of GSDMM that I wrote recently: https://github.com/centre-for-humanities-computing/tweetopic. I found the existing implementations lacking, especially in performance. This one usually runs about 60 times faster than gsdmm, is much better documented, and is fully compatible with sklearn.

When linking to a GitHub repo that you contribute to, it can look like self-promotion if you do not disclose that it is your project. It is also more helpful if you explain what the library is and show how it helps answer the OP's question. Please see this answer for more information: https://meta.stackexchange.com/questions/229085/how-to-offer-personal-open-source-libraries/229091#229091 – Del Sep 08 '22 at 01:08