
I am using gensim to create a bag-of-words model and I want to normalize it. I found the documentation (https://radimrehurek.com/gensim/models/normmodel.html), but I am confused about how to apply it to the code I have. The variable conversations is a list of tokenized documents, i.e. a list of lists where each inner list is one document.

from gensim import corpora, models
from gensim.models import NormModel

id2word = corpora.Dictionary(conversations)
id2word.filter_extremes(keep_n=5000, keep_tokens=None)
corpus = [id2word.doc2bow(text) for text in conversations]
norm_corpus = NormModel(corpus)

I believe corpus is a sparse representation: for each document, it holds the IDs of the terms with non-zero frequency and their counts: [[(0, 2), (1, 5), (2, 4)...(92, 2), (93, 3)],...].

The norm_corpus produced by the last line does not work when I pass it to models.LsiModel(norm_corpus, id2word=id2word, num_topics=12): I get the error TypeError: 'int' object is not iterable. However, the documentation says to pass in a corpus, so I'm confused. I would appreciate any help. Thanks!

Jane Sully

1 Answer


I don't have a way to check this at the moment, but try:

norm_corpus = NormModel()
norm_corpus.normalize(text)

or, more likely, since normalize expects a bag-of-words vector rather than a list of tokens:

norm_corpus.normalize(id2word.doc2bow(text))

In your original code you have

`NormModel(iterable)`

but the documentation says you need to pass:

`NormModel(iterable of iterable of (int, number))`

I hope this makes sense.

muninn