I am working on a project where I need to apply topic modelling to a set of documents, and I need to create the following matrix:
DT, a D × T matrix, where D is the number of documents and T is the number of topics. DT(i, j) contains the number of times a word in document Di has been assigned to topic Tj.
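To make the definition concrete, here is a minimal sketch in plain Python of how such a matrix would be filled in, assuming the per-word topic assignments were already available. The `assignments` data and the `build_dt` helper are made up for illustration; they are not gensim output.

```python
def build_dt(assignments, num_topics):
    """Count, per document, how many of its words were assigned to each topic."""
    dt = [[0] * num_topics for _ in assignments]
    for i, doc_topics in enumerate(assignments):
        for topic in doc_topics:
            dt[i][topic] += 1
    return dt


# Hypothetical assignments: one topic id per word, one list per document.
assignments = [
    [0, 0, 1, 0],      # document 0: three words on topic 0, one on topic 1
    [1, 1, 2],         # document 1
    [2, 0, 2, 2, 1],   # document 2
]

DT = build_dt(assignments, num_topics=3)
print(DT)  # [[3, 1, 0], [0, 2, 1], [1, 1, 3]]
```

Each row sums to the number of words in that document, so row i of DT is the topic histogram of document Di.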
So far I have followed this tutorial: https://rstudio-pubs-static.s3.amazonaws.com/79360_850b2a69980c4488b1db95987a24867a.html
I am new to gensim. So far I have:
1. created a document list
2. preprocessed and tokenized the documents
3. used corpora.Dictionary() to create an id -> term dictionary (id2word)
4. converted the tokenized documents into a document-term matrix
5. generated an LDA model
So now I get the topics.
How can I now get the matrix that I mentioned above? I will be using this matrix to calculate the similarity between two documents a and b on topic t as:
sim(a, b) = 1 - |DT(a, t) - DT(b, t)|
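As a minimal sketch of how I intend to use the matrix, here is the formula applied in plain Python to a made-up DT. The `topic_similarity` helper and the values in `DT` are hypothetical, and documents and topics are indexed from 0.

```python
def topic_similarity(dt, a, b, t):
    """sim(a, b) = 1 - |DT(a, t) - DT(b, t)| for documents a and b on topic t."""
    return 1 - abs(dt[a][t] - dt[b][t])


# Hypothetical DT matrix: 3 documents (rows) by 3 topics (columns), raw counts.
DT = [
    [3, 1, 0],
    [0, 2, 1],
    [1, 1, 3],
]

same = topic_similarity(DT, 0, 2, 1)    # both documents have 1 word on topic 1
close = topic_similarity(DT, 0, 1, 1)   # counts differ by 1
far = topic_similarity(DT, 0, 1, 0)     # counts differ by 3
print(same, close, far)  # 1 0 -2
```

With raw counts the result is not confined to the range 0 to 1, so the entries may need to be normalized (for example, each row divided by its total) before the formula is applied; that part is a guess on my side.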