
I already have a tf-idf matrix, with rows for terms and columns for documents. Now I want to train an LDA model on this term-document matrix. The first step seems to be using gensim.matutils.Dense2Corpus to convert the matrix into the corpus format. But how do I construct the id2word parameter? I have the list of terms (#terms == #rows), but I don't know the format of the dictionary, so I cannot construct it with functions like gensim.corpora.Dictionary.load_from_text. Any suggestions? Thank you.

Ziyuan

1 Answer


id2word must map each id (an integer) to its term (a string).

In other words, it must support id2word[123] == 'koala'.

A plain Python dict is the easiest option.
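For example, here is a minimal sketch of building such a dict from a term list. The `terms` list is made up for illustration, and it assumes the terms are in the same order as the rows of the matrix, so that row i corresponds to `terms[i]`:

```python
# Hypothetical term list; assumed to be in the same order as the matrix rows.
terms = ["koala", "eucalyptus", "wombat"]

# Map each integer row index to its term string.
id2word = dict(enumerate(terms))

print(id2word[0])
```

The resulting dict can then be passed as the `id2word` argument of `gensim.models.LdaModel`, alongside the corpus produced by `gensim.matutils.Dense2Corpus`. With terms as rows, Dense2Corpus should be called with `documents_columns=True`.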

Radim