
I'm working on topic modeling of Twitter data. I extracted the data and stored it in a MySQL table with the columns Date, Place, UserID, Text, tweetID, likes, and weekID (assigned from the date to indicate which week a tweet belongs to). I split the data by week and built a separate LDA model for each week, and I'm currently using pyLDAvis with Gensim to visualize each week's topics. Is there a way to compare these per-week LDA models so I can see how a specific topic changes over the weeks? Any ideas are much appreciated.

I have built an LDA model for each week and saved them as HTML and LDA model files. I want to see how topics change between the weeks.

  • Not a proper answer, but you might want to check out [BERTopic](https://github.com/MaartenGr/BERTopic), it includes [dynamic topic modeling](https://github.com/MaartenGr/BERTopic#dynamic-topic-modeling) which looks like exactly what you describe. – fsimonjetz Mar 28 '22 at 18:39

1 Answer


Building a separate LDA model per week means each model's definition of its topics will differ from week to week. That raises the thorny question of whether a given week's topics overlap with, or evolved from, topics in earlier weeks.

Such a comparison might be possible, but seems more complicated than an alternative approach: train your LDA model on the entire corpus, without segmentation by weeks. Then you have one model for "the whole discourse".

You could then do the simpler analysis of comparing how each week's tweets reflect a different mix of topics. For example, does a single topic become more or less prevalent over time?
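As a rough sketch of that per-week comparison: once a single model has been trained on the whole corpus, each tweet gets a topic distribution, and averaging those distributions by week gives a prevalence curve per topic. The example below is plain Python and assumes you already have one probability list per tweet (with Gensim, something like `lda.get_document_topics(bow, minimum_probability=0)` could supply it). The function name `weekly_topic_prevalence` and the toy data are made up for illustration.

```python
from collections import defaultdict


def weekly_topic_prevalence(week_ids, doc_topics):
    """Average each topic's share over all documents in the same week.

    week_ids   -- one week label per document (e.g. the weekID column)
    doc_topics -- one topic-probability list per document, all the same length
    """
    sums = {}
    counts = defaultdict(int)
    for week, dist in zip(week_ids, doc_topics):
        if week not in sums:
            sums[week] = [0.0] * len(dist)
        for k, p in enumerate(dist):
            sums[week][k] += p
        counts[week] += 1
    return {week: [s / counts[week] for s in sums[week]] for week in sums}


# Toy data (hypothetical): 4 tweets, 2 topics, 2 weeks.
week_ids = [1, 1, 2, 2]
doc_topics = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.4, 0.6]]

prevalence = weekly_topic_prevalence(week_ids, doc_topics)
for week in sorted(prevalence):
    print(week, [round(p, 2) for p in prevalence[week]])
```

Plotting one topic's value across the sorted weeks would then show whether it is becoming more or less prevalent.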

gojomo
  • I agree. I was thinking along the same lines: train one LDA model on the entire data set, keep the LDA models trained on the weekly splits, then score the similarity of each week's topics against the topics of the full-data model. – Vijay Venkatesh Mar 31 '22 at 16:50
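One possible way to do the similarity scoring described in this comment, as a minimal sketch rather than a definitive method: represent every topic as a word-to-probability mapping and match each weekly topic to its closest full-corpus topic by cosine similarity. Using dicts keyed by word sidesteps the fact that separately trained models may have different vocabularies. The helper names `cosine` and `best_matches` and the toy topics are hypothetical; with Gensim, `dict(lda.show_topic(topic_id, topn=50))` could produce such mappings.

```python
import math


def cosine(topic_a, topic_b):
    """Cosine similarity between two word -> probability mappings."""
    shared = set(topic_a) & set(topic_b)
    dot = sum(topic_a[w] * topic_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in topic_a.values()))
    norm_b = math.sqrt(sum(v * v for v in topic_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_matches(week_topics, global_topics):
    """For each weekly topic, return (index of closest global topic, score)."""
    matches = []
    for week_topic in week_topics:
        scores = [cosine(week_topic, g) for g in global_topics]
        best = max(range(len(scores)), key=scores.__getitem__)
        matches.append((best, scores[best]))
    return matches


# Toy topics (hypothetical words and weights).
global_topics = [
    {"vaccine": 0.5, "covid": 0.5},
    {"election": 0.6, "vote": 0.4},
]
week_topics = [
    {"vote": 0.7, "ballot": 0.3},
    {"covid": 0.8, "mask": 0.2},
]

matches = best_matches(week_topics, global_topics)
for i, (j, score) in enumerate(matches):
    print(f"week topic {i} -> global topic {j} (cosine {score:.2f})")
```

A low best score for a weekly topic would suggest it has no clear counterpart in the full-corpus model, which is itself a useful signal.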