5

I want to do topic modeling on short texts. I did some research on LDA and found that it doesn't work well on short texts. Which methods would work better, and do they have Python implementations?

vishnufka
Sri Test

4 Answers

3

You can try Short Text Topic Modelling (STTM); see https://www.groundai.com/project/sttm-a-tool-for-short-text-topic-modeling/1 for a description (code available at https://github.com/qiang2100/STTM). It combines state-of-the-art short text algorithms with traditional topic models for long text, all of which can conveniently be applied to short text.

For a more specialised library, try lda2vec-tf, which combines word vectors with LDA topic vectors. It is a fork of the original lda2vec that improves on it and gives better results than the original library.

3

Besides GSDMM, there is also a Python implementation of the biterm topic model for short text topic modeling.
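To give a feel for what "biterm" means here: the biterm topic model works on unordered word pairs that co-occur in the same short text, pooled over the whole corpus, rather than on per-document word counts. Below is a minimal sketch of just the biterm extraction step, not of the model fitting itself and not of any particular library's API. The function names `extract_biterms` and `count_biterms` are made up for illustration, and the input is assumed to be already tokenized.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one short document.

    Each pair is sorted so that ("b", "a") and ("a", "b") count as the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(docs):
    """Pool biterm counts over the whole corpus (hypothetical helper)."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


if __name__ == "__main__":
    corpus = [["cheap", "flight", "deal"], ["flight", "delay", "cheap"]]
    for biterm, n in count_biterms(corpus).most_common(3):
        print(biterm, n)
```

A real biterm implementation would then fit topics over these pooled pairs, which is what helps with the sparsity of individual short documents.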

chefhose
1

The only Python implementation of short text topic modeling is GSDMM. Unfortunately, most of the others are written in Java.
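For reference, GSDMM (Gibbs sampling for the Dirichlet Multinomial Mixture model) assigns each document to exactly one cluster and repeatedly resamples that assignment. The following is a rough, standard-library-only sketch of the idea, assuming pre-tokenized documents. It is not the code of the gsdmm package, and the function name `gsdmm` and its parameters are chosen here for illustration only. It multiplies raw probabilities, which is fine for short texts but could underflow on long ones.

```python
import random
from collections import Counter


def gsdmm(docs, n_clusters=8, alpha=0.1, beta=0.1, n_iters=30, seed=0):
    """Cluster tokenized short texts; returns one cluster label per document."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_counts = [0] * n_clusters                          # documents per cluster
    word_totals = [0] * n_clusters                         # total words per cluster
    word_counts = [Counter() for _ in range(n_clusters)]   # per-word counts per cluster

    def update(doc, z, sign):
        # Add (sign=+1) or remove (sign=-1) a document from cluster z.
        doc_counts[z] += sign
        word_totals[z] += sign * len(doc)
        for w in doc:
            word_counts[z][w] += sign

    # Random initial assignment.
    labels = []
    for doc in docs:
        z = rng.randrange(n_clusters)
        labels.append(z)
        update(doc, z, +1)

    for _ in range(n_iters):
        for i, doc in enumerate(docs):
            update(doc, labels[i], -1)  # take the document out of its cluster
            weights = []
            for z in range(n_clusters):
                # Unnormalized probability of assigning the document to cluster z.
                p = doc_counts[z] + alpha
                for w, c in Counter(doc).items():
                    for j in range(c):
                        p *= word_counts[z][w] + beta + j
                for k in range(len(doc)):
                    p /= word_totals[z] + vocab_size * beta + k
                weights.append(p)
            z = rng.choices(range(n_clusters), weights=weights)[0]
            labels[i] = z
            update(doc, z, +1)
    return labels


if __name__ == "__main__":
    texts = [["cat", "dog"], ["dog", "pet", "cat"], ["stock", "market"], ["market", "trade"]]
    print(gsdmm(texts, n_clusters=3, n_iters=20))
```

Note that `n_clusters` is only an upper bound: clusters that end up empty are simply unused, which is part of why this approach is convenient when the number of topics is not known in advance.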

Ilya Palachev
1

Here's a very fast and easy-to-use Python implementation of GSDMM that I wrote recently: https://github.com/centre-for-humanities-computing/tweetopic. I found the existing implementations quite lacking, especially performance-wise. This one usually runs about 60 times faster than gsdmm, is much better documented, and is fully compatible with sklearn.

Márton Kardos
  • When linking to a GitHub repo that you are a contributor on, it can look like self-promotion if you do not disclose that it is your project. Additionally, it is more helpful if you explain what this library is and show how it can help answer the OP's question. Please see this answer for more information: https://meta.stackexchange.com/questions/229085/how-to-offer-personal-open-source-libraries/229091#229091 – Del Sep 08 '22 at 01:08
  • Thanks for the heads up, I will make sure to address this – Márton Kardos Sep 09 '22 at 06:25