
My understanding of the workflow is: run LDA -> extract keywords (e.g. the top few words for each topic), thereby reducing the dimension -> some subsequent analysis.

My question is: if my overall purpose is to assign topics to articles in an unsupervised way, or to cluster similar documents together, then a single run of LDA takes you directly to the goal. Why reduce the dimension and then pass the result to a subsequent analysis? And if you do, what sort of subsequent analysis can follow LDA?
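For concreteness, here is a minimal sketch (using scikit-learn; the tiny corpus and parameter values are placeholders) of both readings of the question: LDA's document-topic distributions can be used directly as unsupervised topic labels, or treated as a reduced feature space for a subsequent analysis such as k-means clustering.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

docs = ["the cat sat on the mat", "dogs and cats are pets",
        "stock markets fell today", "investors sold shares"]

# Bag-of-words counts: one column per vocabulary term.
counts = CountVectorizer(stop_words="english").fit_transform(docs)

# Fit LDA; each document becomes a vector of topic proportions.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)   # shape: (n_docs, n_topics)

# Option A: assign each document its dominant topic (unsupervised labeling).
labels_from_lda = doc_topics.argmax(axis=1)

# Option B: treat the topic vectors as reduced features for a subsequent
# analysis, e.g. k-means clustering of similar documents.
labels_from_kmeans = KMeans(n_clusters=2, n_init=10,
                            random_state=0).fit_predict(doc_topics)
```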

Also, a somewhat unrelated question: is it better to ask this kind of question here or on Cross Validated?

nobody

2 Answers


I think Cross Validated is a better place for this kind of question. Anyhow, there are simple reasons why we need dimension reduction:

  1. Without dimension reduction, vector operations become impractical: imagine computing a dot product between two vectors whose dimension equals the size of your dictionary.
  2. After reducing the dimension, each number carries a denser amount of information, which usually means less noise. Intuitively, you have kept only the useful information. (See the sketch after this list for a sense of the size difference.)
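A hypothetical illustration of that size difference (all numbers below are made up for the example): a similarity computation in the raw bag-of-words space runs over a vocabulary-sized vector, while the same comparison in topic space touches only a handful of dense values.

```python
import numpy as np

vocab_size, n_topics = 50_000, 20
rng = np.random.default_rng(0)

# Two documents in the raw space: vocabulary-sized count vectors.
a_raw = rng.integers(0, 2, vocab_size)
b_raw = rng.integers(0, 2, vocab_size)

# The same documents as topic proportions: short, dense vectors.
a_topics = rng.dirichlet(np.ones(n_topics))
b_topics = rng.dirichlet(np.ones(n_topics))

print(a_raw @ b_raw)        # dot product over 50,000 dimensions
print(a_topics @ b_topics)  # dot product over 20 dimensions
```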
Mehdi

You should rethink your approach, since you are mixing a probabilistic method (LDA) with linear-algebra techniques (dimensionality reduction). When you feel more comfortable with linear algebra, consider Non-negative Matrix Factorisation.

Also note that your topics already constitute the reduced dimensions; there is no need to go back to the extracted top words of each topic.
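A minimal sketch of the NMF alternative, assuming scikit-learn and a TF-IDF matrix (the corpus and parameter values are placeholders): NMF factorises the document-term matrix X into W (documents x topics) and H (topics x terms), so W is the reduced representation, analogous to LDA's document-topic matrix.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

docs = ["the cat sat on the mat", "dogs and cats are pets",
        "stock markets fell today", "investors sold shares"]

X = TfidfVectorizer(stop_words="english").fit_transform(docs)

nmf = NMF(n_components=2, random_state=0)
W = nmf.fit_transform(X)   # documents in topic space (the reduced dimensions)
H = nmf.components_        # topics as weights over vocabulary terms

labels = W.argmax(axis=1)  # topic per document, no keyword-extraction step needed
```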

Sir Cornflakes