Questions tagged [gensim]

Gensim is a free Python framework designed to automatically extract semantic topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible.

Gensim is built to process raw, unstructured digital text (“plain text”). Its algorithms, such as Latent Semantic Analysis, Latent Dirichlet Allocation and Random Projections, discover the semantic structure of documents by examining statistical word co-occurrence patterns within a corpus of training documents. These algorithms are unsupervised, which means no human input is necessary: you only need a corpus of plain text documents.

Once these statistical patterns are found, any plain text documents can be succinctly expressed in the new, semantic representation, and queried for topical similarity against other documents.
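The "express documents as vectors, then query for similarity" idea can be sketched in plain Python. This is only an illustrative stand-in, not gensim's API or implementation: it uses raw bag-of-words counts and cosine similarity, whereas gensim's LSA or LDA models would first project documents into a learned topic space. The toy corpus and the helper names (`tokenize`, `to_bow`, `cosine`, `rank_by_similarity`) are assumptions made up for this example.

```python
import math
from collections import Counter

# Toy corpus (hypothetical); a real corpus would be far larger.
corpus = [
    "human machine interface for computer applications",
    "a survey of user opinion of computer system response time",
    "graph minors and the structure of trees",
    "the intersection graph of paths in trees",
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def to_bow(tokens):
    """Bag-of-words: map each token to its count in the document."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, documents):
    """Return (document index, score) pairs, most similar first."""
    query_vec = to_bow(tokenize(query))
    scores = [
        (index, cosine(query_vec, to_bow(tokenize(doc))))
        for index, doc in enumerate(documents)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = rank_by_similarity("graph trees", corpus)
for index, score in ranking:
    print(f"{score:.3f}  {corpus[index]}")
```

With raw counts, only documents that share literal words with the query score above zero. A topic model is meant to relax that limitation by also matching documents that use related vocabulary.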

2433 questions
16
votes
1 answer

Get weight matrices from gensim word2Vec

I am using the gensim word2vec package in Python. I would like to retrieve the W and W' weight matrices that were learned during skip-gram training. It seems to me that model.syn0 gives me the first one, but I am not sure how I can get the other…
Arcyno
16
votes
1 answer

How to monitor convergence of Gensim LDA model?

I can't seem to find it, or perhaps my knowledge of statistics and its terminology is the problem, but I want to achieve something similar to the graph found at the bottom of the page for the LDA lib on PyPI and observe the uniformity/convergence of the…
ZeferiniX
16
votes
2 answers

How to load sentences into Python gensim?

I am trying to use the word2vec module from gensim natural language processing library in Python. The docs say to initialize the model: from gensim.models import word2vec model = Word2Vec(sentences, size=100, window=5, min_count=5, workers=4) What…
john mangual
15
votes
2 answers

How to get a complete topic distribution for a document using gensim LDA?

When I train my lda model as such dictionary = corpora.Dictionary(data) corpus = [dictionary.doc2bow(doc) for doc in data] num_cores = multiprocessing.cpu_count() num_topics = 50 lda = LdaMulticore(corpus, num_topics=num_topics, id2word=dictionary,…
PyRsquared
15
votes
1 answer

Gensim saved dictionary has no id2token

I have saved a Gensim dictionary to disk. When I load it, the id2token attribute dict is not populated. A simple piece of the code that saves the dictionary: dictionary = corpora.Dictionary(tag_docs) dictionary.save("tag_dictionary_lda.pkl") Now…
cjrieds
15
votes
2 answers

How does gensim calculate doc2vec paragraph vectors

I am going through this paper http://cs.stanford.edu/~quocle/paragraph_vector.pdf and it states that "The paragraph vector and word vectors are averaged or concatenated to predict the next word in a context. In the experiments, we use …
jxn
15
votes
2 answers

How can I tell if Gensim Word2Vec is using the C compiler?

I am trying to use Gensim's Word2Vec implementation. Gensim warns that if you don't have a C compiler, the training will be 70% slower. Is there a way to verify that Gensim is correctly using the C compiler I have installed? I am using Anaconda…
David
15
votes
4 answers

How to load a pre-trained Word2vec MODEL File and reuse it?

I want to use a pre-trained word2vec model, but I don't know how to load it in Python. The file is a MODEL file (703 MB). It can be downloaded here: http://devmount.github.io/GermanWordEmbeddings/
Vahid SJ
15
votes
3 answers

How to get vocabulary word count from gensim word2vec?

I am using the gensim word2vec package in Python. I know how to get the vocabulary from the trained model, but how do I get the word count for each word in the vocabulary?
Michelle Owen
15
votes
3 answers

Gensim: TypeError: doc2bow expects an array of unicode tokens on input, not a single string

I am getting started with a Python task and am facing a problem while using gensim. I am trying to load files from my disk and process them (split them and lowercase() them). The code I have is below: dictionary_arr=[] for file_path in…
Sam
15
votes
1 answer

Gensim Word2vec : Semantic Similarity

I wanted to know the difference between gensim word2vec's two similarity measures: most_similar() and most_similar_cosmul(). I know that the first one works using the cosine similarity of word vectors, while the other one uses the multiplicative…
bee2502
14
votes
7 answers

cannot import name 'open' from 'smart_open'

I was doing this and got this error: from gensim.models import Word2Vec ImportError: cannot import name 'open' from 'smart_open' (C:\ProgramData\Anaconda3\lib\site-packages\smart_open\__init__.py) Then I did this: import…
Abhishek Prajapat
14
votes
1 answer

How to properly use get_keras_embedding() in Gensim’s Word2Vec?

I am trying to build a translation network using embedding and RNN. I have trained a Gensim Word2Vec model and it is learning word associations pretty well. However, I couldn’t get my head around how to properly add the layer to a Keras model. (And…
Moobie
14
votes
1 answer

Understanding parameters in Gensim LDA Model

I am using gensim.models.ldamodel.LdaModel to perform LDA, but I do not understand some of the parameters and cannot find explanations in the documentation. If someone has experience working with this, I would love further details of what these…
Jane Sully
14
votes
5 answers

Python node2vec (Gensim Word2Vec) "Process finished with exit code 134 (interrupted by signal 6: SIGABRT)"

I am working on node2vec in Python, which uses Gensim's Word2Vec internally. When I am using a small dataset, the code works well. But as soon as I try to run the same code on a large dataset, the code crashes: Error: Process finished with exit…
Zohaib Brohi