Questions tagged [gensim]

Gensim is a free Python framework designed to automatically extract semantic topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible.

Gensim is designed to process raw, unstructured digital text (“plain text”). Its algorithms, such as Latent Semantic Analysis, Latent Dirichlet Allocation, and Random Projections, discover the semantic structure of documents by examining statistical word co-occurrence patterns within a corpus of training documents. These algorithms are unsupervised: no human annotation is necessary, only a corpus of plain-text documents.

Once these statistical patterns are found, any plain-text document can be succinctly expressed in the new semantic representation and queried for topical similarity against other documents.
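The idea of expressing documents as vectors and querying them for similarity can be sketched in a few lines. This is a toy illustration only, using plain bag-of-words counts and cosine similarity from the Python standard library; it is not gensim's implementation, and it does not perform LSA, LDA, or Random Projections. The helper names (`bow_vector`, `cosine`, `rank_by_similarity`) and the three-document corpus are made up for the example.

```python
import math
from collections import Counter


def bow_vector(text):
    """Lowercase the text, split on whitespace, and count each token."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, documents):
    """Return (document index, score) pairs, most similar first."""
    query_vec = bow_vector(query)
    scores = [(i, cosine(query_vec, bow_vector(doc)))
              for i, doc in enumerate(documents)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical mini-corpus, chosen only for illustration.
corpus = [
    "human machine interface for computer applications",
    "a survey of user opinion of computer system response time",
    "graph minors and trees",
]
ranking = rank_by_similarity("computer interface", corpus)
for index, score in ranking:
    print(index, round(score, 3))
```

A topic model would replace the raw count vectors with vectors in a lower-dimensional topic space, so that documents sharing no literal words can still score as similar; the query step keeps the same shape.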

2433 questions

4 answers

TypeError: 'Word2Vec' object is not subscriptable

I am trying to build a Word2Vec model, but when I try to reshape the vector for tokens, I get this error. Any ideas? wordvec_arrays = np.zeros((len(tokenized_tweet), 100)) for i in range(len(tokenized_tweet)): wordvec_arrays[i,:] =…

asked May 25 '21 at 12:30

Nishant Kashyap

2 answers

pyLDAvis visualization from gensim not displaying the result in Google Colab

import pyLDAvis.gensim # Visualize the topics pyLDAvis.enable_notebook() vis = pyLDAvis.gensim.prepare(lda_model, corpus, id2word) vis The above code displayed the visualization of the LDA model in Google Colab, but then after reopening the notebook it…

visualization gensim lda pyldavis

asked Feb 08 '21 at 05:02

Ravi Prajapati

4 answers

Is it possible to do sentiment analysis of unlabelled text using word2vec model?

I have some text data for which I need to do sentiment classification. I don't have positive or negative labels on this data (unlabelled). I want to use the Gensim word2vec model for sentiment classification. Is it possible to do this? Because till…

python-3.7 gensim word2vec sentiment-analysis

asked Apr 13 '20 at 09:52

Piyush Ghasiya

3 answers

How do you save a model, dictionary and corpus to disk in Gensim, and then load them again?

In Gensim's documentation, it says: You can save trained models to disk and later load them back, either to continue training on new training documents or to transform new documents. I would like to do this with a dictionary, corpus and tf.idf…

python nlp gensim

asked Nov 20 '19 at 19:30

Data

2 answers

Word2vec Gensim Accuracy Analysis

I'm working on an NLP application, where I have a corpus of text files. I would like to create word vectors using the Gensim word2vec algorithm. I did a 90% training and 10% testing split. I trained the model on the appropriate set, but I would like…

python nlp gensim word2vec

asked Oct 10 '18 at 06:43

Sam

1 answer

What does epochs mean in Doc2Vec and train when I have to manually run the iteration?

I am trying to understand the epochs parameter in the Doc2Vec function and epochs parameter in the train function. In the following code snippet, I manually set up a loop of 4000 iterations. Is it required or passing 4000 as epochs parameter in the…

python gensim doc2vec

asked Jul 09 '18 at 12:32

Suhail Gupta

3 answers

pyLDAvis with Mallet LDA implementation: LdaMallet object has no attribute 'inference'

Is it possible to plot a pyLDAvis visualization with a Mallet implementation of LDA? I have no trouble with LDA_Model, but when I use Mallet I get: 'LdaMallet' object has no attribute 'inference' My code: pyLDAvis.enable_notebook() vis =…

gensim topic-modeling mallet

asked May 15 '18 at 00:12

Saguaro

1 answer

Pipeline and GridSearch for Doc2Vec

I currently have the following script that helps to find the best model for a doc2vec model. It works like this: first, train a few models based on given parameters, and then test them against a classifier. Finally, it outputs the best model and classifier (I…

scikit-learn pipeline gensim grid-search

asked May 10 '18 at 17:52

Christopher

2 answers

Applying word2vec to find all words above a similarity threshold

The command model.most_similar(positive=['france'], topn=100) gives the top 100 most similar words to "france". However, I would like to know if there is a method which will output the most similar words above a similarity threshold to a given word.…

word2vec gensim

asked Mar 20 '18 at 18:22

sss90

6 answers

Does gensim.corpora.Dictionary have term frequency saved?

Does gensim.corpora.Dictionary have term frequency saved? From gensim.corpora.Dictionary, it's possible to get the document frequency of the words (i.e. how many documents a particular word occurs in): from nltk.corpus import brown from…

python dictionary frequency gensim tf-idf

asked Oct 11 '17 at 09:37

alvas

3 answers

TypeError: Object of type 'complex' is not JSON serializable while using pyLDAvis.display() function

I have a document-term matrix with nine documents. I am running the code below: import pyLDAvis.gensim topicData = pyLDAvis.gensim.prepare(ldamodel, docTermMatrix, dictionary) pyLDAvis.display(topicData) I am getting the below error when…

json gensim serializable

asked Sep 23 '17 at 12:49

Gaurav Pandey

1 answer

C extension not loaded for Word2Vec

I reinstalled the gensim package and Cython, but it continuously shows this warning. Does anybody know about this? I am using Python 3.6, PyCharm, Linux Mint. UserWarning: C extension not loaded for Word2Vec, training will be slow. Install a C compiler and…

python python-3.x gensim word2vec

asked Aug 04 '17 at 06:22

user8349292

1 answer

What is the difference between doc2vec models when dbow_words is set to 1 or 0?

I read this page, but I do not understand the difference between the models built by the following code. I know that when dbow_words is 0, training of doc-vectors is faster. First model model = doc2vec.Doc2Vec(documents1, size = 100,…

gensim doc2vec

asked May 16 '17 at 21:15

user3092781

2 answers

Python Gensim how to make WMD similarity run faster with multiprocessing

I am trying to run gensim WMD similarity faster. Typically, this is what is in the docs: Example corpus: my_corpus = ["Human machine interface for lab abc computer applications", "A survey of user opinion of computer system…

python multithreading multiprocessing gensim

asked May 16 '17 at 12:06

jxn

1 answer

How can I access the output embedding (output vector) in gensim word2vec?

I want to use the output embedding of word2vec, as in this paper (Improving document ranking with dual word embeddings). I know input vectors are in syn0, and output vectors are in syn1, or in syn1neg if negative sampling is used. But when I calculated…

python numpy gensim word2vec

asked Mar 02 '17 at 11:31

Suin SEO
