I'm trying to build a Gensim Word2Vec model using an external vocabulary. I know Gensim has its own internal vocabulary builder; however, it doesn't give me the same control as scikit-learn's CountVectorizer. My code is simply:
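import gensim
from sklearn.feature_extraction.text import CountVectorizer
# corpusCleaner is my own preprocessing function (not shown); raw_corpus is the raw text
corpus = corpusCleaner(raw_corpus)
vocabularyGenerator = CountVectorizer(strip_accents="ascii", stop_words="english")
vocabularyGenerator.fit(corpus)
# vocabulary_ is a dict mapping each term to its column index in the document-term matrix
vocabulary = vocabularyGenerator.vocabulary_
model = gensim.models.Word2Vec()
model.build_vocab_from_freq(vocabulary)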
I'm getting this warning:
C:\Anaconda3\envs\workflow\lib\site-packages\gensim\models\word2vec.py:1235: RuntimeWarning: overflow encountered in int_scalars
  retain_pct = retain_total * 100 / max(original_total, 1)
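For comparison, Gensim documents the `word_freq` argument of `build_vocab_from_freq` as a mapping from each word to its frequency count, whereas a fitted CountVectorizer's `vocabulary_` maps each term to a column index. The following is a minimal sketch, assuming scikit-learn and NumPy are installed, of deriving a term-to-count dict from the same CountVectorizer by summing the columns of the document-term matrix. `term_frequencies` is a hypothetical helper name used only for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer


def term_frequencies(corpus, **vectorizer_kwargs):
    """Hypothetical helper: return {term: total count} for `corpus`.

    `corpus` is an iterable of document strings. CountVectorizer's
    vocabulary_ maps term -> column index, so the actual counts are
    recovered by summing each column of the document-term matrix.
    """
    vectorizer = CountVectorizer(**vectorizer_kwargs)
    doc_term = vectorizer.fit_transform(corpus)  # sparse, shape (n_docs, n_terms)
    # Column sums give the total number of occurrences of each term.
    totals = np.asarray(doc_term.sum(axis=0)).ravel()
    # Convert NumPy integers to plain Python ints for the term -> count dict.
    return {term: int(totals[idx]) for term, idx in vectorizer.vocabulary_.items()}


if __name__ == "__main__":
    docs = ["the cat sat on the mat", "the dog sat"]
    freqs = term_frequencies(docs, strip_accents="ascii", stop_words="english")
    print(freqs)
```

The resulting dict (rather than `vocabulary_` itself) is what would then be passed to `model.build_vocab_from_freq(...)` under this assumption.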