Questions tagged [gensim]

Gensim is a free Python framework designed to automatically extract semantic topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible.

Gensim aims at processing raw, unstructured digital texts (“plain text”). The algorithms in gensim, such as Latent Semantic Analysis, Latent Dirichlet Allocation or Random Projections, discover the semantic structure of documents by examining statistical word co-occurrence patterns within a corpus of training documents. These algorithms are unsupervised, which means no human input is necessary: you only need a corpus of plain text documents.

Once these statistical patterns are found, any plain text document can be succinctly expressed in the new, semantic representation and queried for topical similarity against other documents.
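The idea of expressing documents as vectors and querying them for similarity can be sketched in plain Python. This is only an illustration under simplifying assumptions: it uses raw bag-of-words counts and cosine similarity from the standard library, not gensim's own classes or its topic models, and the helper names `bow` and `cosine` as well as the toy corpus are made up for this example.

```python
import math
from collections import Counter


def bow(tokens):
    """Hypothetical helper: map a list of tokens to a bag-of-words count vector."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus of plain text documents (illustrative only).
corpus = [
    "human machine interface for computer applications",
    "a survey of user opinion of computer system response time",
    "graph minors and trees",
]

# Express every document in the vector representation.
vectors = [bow(doc.lower().split()) for doc in corpus]

# Express the query the same way and rank documents by similarity to it.
query = bow("computer system survey".split())
ranked = sorted(
    ((cosine(query, vec), idx) for idx, vec in enumerate(vectors)),
    reverse=True,
)
best_score, best_index = ranked[0]
print(best_index, corpus[best_index])
```

In gensim itself, the vectors would typically come from a trained model such as LSA or LDA rather than from raw counts, so that documents can match on topics even when they share no exact words.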

2433 questions

1 answer

Get weight matrices from gensim word2Vec

I am using the gensim word2vec package in Python. I would like to retrieve the W and W' weight matrices that were learned during skip-gram training. It seems to me that model.syn0 gives me the first one, but I am not sure how I can get the other…

asked Dec 15 '16 at 11:19

Arcyno

1 answer

How to monitor convergence of Gensim LDA model?

I can't seem to find it, or perhaps my knowledge of statistics and its terminology is the problem here, but I want to achieve something similar to the graph found at the bottom of the page for the LDA lib from PyPI and observe the uniformity/convergence of the…

python lda gensim convergence

asked Jun 01 '16 at 13:50

ZeferiniX

2 answers

How to load sentences into Python gensim?

I am trying to use the word2vec module from the gensim natural language processing library in Python. The docs say to initialize the model: from gensim.models import word2vec model = Word2Vec(sentences, size=100, window=5, min_count=5, workers=4) What…

python nlp gensim

asked Dec 03 '13 at 22:25

john mangual

2 answers

How to get a complete topic distribution for a document using gensim LDA?

When I train my lda model as such dictionary = corpora.Dictionary(data) corpus = [dictionary.doc2bow(doc) for doc in data] num_cores = multiprocessing.cpu_count() num_topics = 50 lda = LdaMulticore(corpus, num_topics=num_topics, id2word=dictionary,…

python gensim lda

asked Jul 25 '17 at 18:21

PyRsquared

1 answer

Gensim saved dictionary has no id2token

I have saved a Gensim dictionary to disk. When I load it, the id2token attribute dict is not populated. A simple piece of the code that saves the dictionary: dictionary = corpora.Dictionary(tag_docs) dictionary.save("tag_dictionary_lda.pkl") Now…

python nlp gensim

asked May 09 '17 at 19:26

cjrieds

2 answers

How does gensim calculate doc2vec paragraph vectors

I am going through this paper http://cs.stanford.edu/~quocle/paragraph_vector.pdf and it states that "The paragraph vector and word vectors are averaged or concatenated to predict the next word in a context. In the experiments, we use …

nlp vectorization gensim word2vec doc2vec

asked Nov 04 '16 at 01:18

jxn

2 answers

How can I tell if Gensim Word2Vec is using the C compiler?

I am trying to use Gensim's Word2Vec implementation. Gensim warns that if you don't have a C compiler, the training will be 70% slower. Is there a way to verify that Gensim is correctly using the C compiler I have installed? I am using Anaconda…

python compilation installation gensim word2vec

asked Sep 30 '16 at 00:09

David

4 answers

How to load a pre-trained Word2vec MODEL File and reuse it?

I want to use a pre-trained word2vec model, but I don't know how to load it in python. This file is a MODEL file (703 MB). It can be downloaded here: http://devmount.github.io/GermanWordEmbeddings/

python file model word2vec gensim

asked Sep 17 '16 at 16:40

Vahid SJ

3 answers

How to get vocabulary word count from gensim word2vec?

I am using the gensim word2vec package in Python. I know how to get the vocabulary from the trained model. But how do I get the word count for each word in the vocabulary?

gensim word2vec

asked May 12 '16 at 15:12

Michelle Owen

3 answers

Gensim: TypeError: doc2bow expects an array of unicode tokens on input, not a single string

I am starting on a Python task and am facing a problem while using gensim. I am trying to load files from my disk and process them (split them and lowercase() them). The code I have is below: dictionary_arr=[] for file_path in…

python gensim

asked Oct 20 '15 at 06:20

Sam

1 answer

Gensim Word2vec : Semantic Similarity

I wanted to know the difference between gensim word2vec's two similarity measures: most_similar() and most_similar_cosmul(). I know that the first one works using cosine similarity of word vectors while the other one uses the multiplicative…

python semantics similarity gensim word2vec

asked Jul 20 '15 at 19:35

bee2502

7 answers

cannot import name 'open' from 'smart_open'

I was doing this and got this error : from gensim.models import Word2Vec ImportError: cannot import name 'open' from 'smart_open' (C:\ProgramData\Anaconda3\lib\site-packages\smart_open\__init__.py) Then I did this : import…

deep-learning nlp importerror gensim

asked Jun 03 '20 at 19:29

Abhishek Prajapat

1 answer

How to properly use get_keras_embedding() in Gensim’s Word2Vec?

I am trying to build a translation network using embedding and RNN. I have trained a Gensim Word2Vec model and it is learning word associations pretty well. However, I couldn’t get my head around how to properly add the layer to a Keras model. (And…

python keras gensim word2vec word-embedding

asked Jul 24 '18 at 07:23

Moobie

1 answer

Understanding parameters in Gensim LDA Model

I am using gensim.models.ldamodel.LdaModel to perform LDA, but I do not understand some of the parameters and cannot find explanations in the documentation. If someone has experience working with this, I would love further details of what these…

python parameters gensim lda

asked Jun 11 '18 at 20:30

Jane Sully

5 answers

Python node2vec (Gensim Word2Vec) "Process finished with exit code 134 (interrupted by signal 6: SIGABRT)"

I am working on node2vec in Python, which uses Gensim's Word2Vec internally. When I am using a small dataset, the code works well. But as soon as I try to run the same code on a large dataset, the code crashes: Error: Process finished with exit…

python pycharm word2vec gensim

asked Jan 16 '18 at 21:51

Zohaib Brohi
