Similarity between LDA results over two different number of topics?

Question

if we choose 20 topics in LDA and then if we choose 30 topics. So my question is will both these results intersect those 20 topics and produce similar results

Everst · Accepted Answer · 2014-07-02T01:25:36.843

Short answer - no. The way LDA works is it uses Gibbs sampler to get Dirichlet distribution over document vectors. Allocations are then made on this sample and hence will always be different both because of sampling randomness and allocation uncertainties unless you define explicit random seed and run same number of topics k. Take a look at original paper Blei et al. 2003 to see how k is defined.

UPDATE (with regard to comment): Hierarchical LDA (hLDA) is trying to solve the problem of retaining topics and subtopics by constructing levels of topics following the Chinese restaurant model. But it's still not perfect.

The way flat LDA works, however, is it looks at documents rather than topics to produce further results. Say, you get topic 0 (first table in restaurant) and all documents try to sit there, but it's not really enough space and you create another topic 1 where some docs feel more comfortable, etc., etc. now you are right from the point of view of how these tables are created. But there is one big thing that's critical - topic 0 CHANGES when you create a new table/Topic 1 because some documents have left the first table and took the words (or probabilities of cooccurence thereof) with them to the new table and all words in topic 0 got reshuffled given new situation. Same happens when you create more tables/topics that all the previous are also re-estimated. Hence, you will never get same 20 topics when rerunning with 30.

well thanks for answering :). But the question in my mind is shouldn't they be same from the perspective of topic modelling. How can we view this.....that topic furthe divided into sub-topics?? — hitesh_noty, Jul 01 '14 at 00:35

Similarity between LDA results over two different number of topics?

1 Answers1