Questions tagged [gensim]

Gensim is a free Python framework designed to automatically extract semantic topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible.

Gensim aims at processing raw, unstructured digital texts (“plain text”). The algorithms in gensim, such as Latent Semantic Analysis, Latent Dirichlet Allocation or Random Projections, discover the semantic structure of documents by examining statistical word co-occurrence patterns within a corpus of training documents. These algorithms are unsupervised, which means no human input is necessary: you only need a corpus of plain-text documents.

Once these statistical patterns are found, any plain-text document can be succinctly expressed in the new semantic representation and queried for topical similarity against other documents.

2433 questions

0 answers

Gensim FastText compute Training Loss

I am training a fastText model using gensim.models.fasttext. However, I can't seem to find a method to compute the loss of the iteration for logging purposes. If I look at gensim.models.word2vec, it has the get_latest_training_loss method which…

asked Jun 01 '18 at 12:13

Hardian Lawi

3 answers

How to load embeddings (in tsv file) generated from StarSpace

Does anyone know how to load a TSV file with embeddings generated by StarSpace into Gensim? The Gensim documentation seems to focus on Word2Vec, and I couldn't find a relevant answer.

gensim word-embedding

asked Mar 03 '18 at 20:05

Just Data
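One route is to rewrite the StarSpace output in the plain-text word2vec format, which gensim can read with `KeyedVectors.load_word2vec_format(path, binary=False)`. A minimal sketch, assuming each line of the StarSpace `.tsv` file is a token followed by its tab-separated vector components (check your own file, since that layout is an assumption here). The function name `starspace_tsv_to_word2vec` is hypothetical.

```python
import io


def starspace_tsv_to_word2vec(tsv_lines):
    """Convert StarSpace-style TSV lines into word2vec text-format lines.

    Assumption: each input line is `token<TAB>v1<TAB>v2...`.
    Returns a list of output lines, header ("count dim") first.
    """
    rows = []
    dim = None
    for line in tsv_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        token, *values = line.split("\t")
        [float(v) for v in values]  # fail early if a component is not numeric
        if dim is None:
            dim = len(values)
        elif len(values) != dim:
            raise ValueError(f"inconsistent dimension for token {token!r}")
        # the word2vec text format is space-separated, so tokens must not contain spaces
        if " " in token:
            raise ValueError(f"token contains a space: {token!r}")
        rows.append(token + " " + " ".join(values))
    header = f"{len(rows)} {dim or 0}"
    return [header] + rows


sample = io.StringIO("hello\t0.1\t0.2\nworld\t0.3\t0.4\n")
for out in starspace_tsv_to_word2vec(sample):
    print(out)
# 2 2
# hello 0.1 0.2
# world 0.3 0.4
```

Writing the returned lines to a file, one per line, gives a path you can pass to gensim's text-format loader.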

1 answer

Doc2vec and word2vec with negative sampling

My current doc2vec code is as follows. # Train doc2vec model model = doc2vec.Doc2Vec(docs, size = 100, window = 300, min_count = 1, workers = 4, iter = 20) I also have a word2vec code as below. # Train word2vec model model =…

python nlp word2vec gensim doc2vec

asked Oct 21 '17 at 04:58

user8566323
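For context on negative sampling: gensim's Word2Vec and Doc2Vec take a `negative` parameter (how many noise words are drawn per positive example; `hs=0` keeps hierarchical softmax off), and recent versions expose `ns_exponent`, which defaults to 0.75. Noise words are drawn from the unigram distribution raised to that power. The pure-Python sketch below only illustrates that noise distribution; it is not gensim's implementation, and the helper names are hypothetical.

```python
import random
from collections import Counter


def noise_distribution(tokens, exponent=0.75):
    """Return {word: probability} proportional to count ** exponent."""
    counts = Counter(tokens)
    weights = {w: c ** exponent for w, c in counts.items()}
    total = sum(weights.values())
    return {w: wt / total for w, wt in weights.items()}


def draw_negatives(dist, k, exclude, rng):
    """Draw k noise words from dist; this sketch re-draws if it hits `exclude`."""
    words = list(dist)
    if set(words) <= {exclude}:
        raise ValueError("no noise words available besides the target")
    probs = [dist[w] for w in words]
    negatives = []
    while len(negatives) < k:
        w = rng.choices(words, weights=probs, k=1)[0]
        if w != exclude:
            negatives.append(w)
    return negatives


corpus = "the cat sat on the mat the end".split()
dist = noise_distribution(corpus)
print(round(dist["the"], 3), round(dist["cat"], 3))
print(draw_negatives(dist, k=3, exclude="cat", rng=random.Random(0)))
```

Raising counts to 0.75 flattens the distribution: "the" gets about 0.31 here instead of its raw share of 3/8, so rarer words are sampled as negatives somewhat more often.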

1 answer

Is there any way to match Gensim LDA output with topics in pyLDAvis graph?

I need to process the topics in the LDA output (lda.show_topics(num_topics=-1, num_words=100...)) and then compare the results with the pyLDAvis graph, but the topics are numbered differently. Is there a way I can match them?

python-3.x gensim lda topic-modeling

asked Apr 06 '17 at 15:52

m.khalil
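By default pyLDAvis reorders topics by overall prevalence (its `sort_topics=True` option) and labels them from 1, while gensim numbers topics from 0 in training order. Passing `sort_topics=False` to the pyLDAvis prepare function is the simplest fix if you only need the numbering to line up (then the pyLDAvis label is the gensim id plus 1). If you want to keep the sorted view, the sketch below reconstructs the mapping, assuming you have each document's dense topic distribution (for example from `lda.get_document_topics(bow, minimum_probability=0)`) and each document's token count. The weighting follows my reading of how pyLDAvis computes topic frequency, so verify it against your pyLDAvis version; the function name is hypothetical.

```python
def pyldavis_topic_order(doc_topic_dists, doc_lengths):
    """Map gensim topic ids (0-based) to pyLDAvis labels (1-based).

    Assumes pyLDAvis ranks topics by sum over docs of
    doc_length * P(topic | doc), largest first.
    """
    num_topics = len(doc_topic_dists[0])
    freq = [0.0] * num_topics
    for dist, length in zip(doc_topic_dists, doc_lengths):
        for topic_id, p in enumerate(dist):
            freq[topic_id] += length * p
    # sort topic ids by descending frequency; ties keep gensim order in this sketch
    ranked = sorted(range(num_topics), key=lambda t: -freq[t])
    return {topic_id: rank + 1 for rank, topic_id in enumerate(ranked)}


dists = [[0.9, 0.1, 0.0], [0.2, 0.3, 0.5], [0.0, 0.1, 0.9]]
lengths = [10, 20, 30]
print(pyldavis_topic_order(dists, lengths))
# {2: 1, 0: 2, 1: 3}
```

Here gensim topic 2 carries the most tokens, so it would appear as topic 1 in the pyLDAvis chart.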

1 answer

Pyspark - Load trained model word2vec

I want to use word2vec with PySpark to process some data. I was previously using Google's pretrained model GoogleNews-vectors-negative300.bin with gensim in Python. Is there a way I can load this .bin file with mllib.word2vec? Or does it make sense to…

python load pyspark gensim word2vec

asked Apr 06 '17 at 08:27

Pierre

1 answer

How to use pretrained Word2Vec model in TensorFlow

I have a Word2Vec model trained in Gensim. How can I use it for word embeddings in TensorFlow? I don't want to train the embeddings from scratch in TensorFlow. Can someone show me how to do this with some example code?

python tensorflow gensim word2vec word-embedding

asked Mar 28 '17 at 13:16

neel

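A common approach is to copy the trained vectors into a matrix that initializes an embedding layer, and to convert tokens to row indices with the same vocabulary. The pure-Python sketch below builds that matrix and the token-to-index lookup, assuming `vectors` is a plain dict of word to list of floats (you could fill it from a gensim KeyedVectors object). Reserving row 0 for padding and unknown words is a design choice made here, and the function names are hypothetical. Handing the matrix to TensorFlow (for example as the initial weights of `tf.keras.layers.Embedding`) is left out because the details depend on your TensorFlow version.

```python
def build_embedding_matrix(vectors, pad_token="<pad>"):
    """Return (word_to_index, matrix); row 0 is an all-zero padding/unknown row."""
    if not vectors:
        raise ValueError("no vectors given")
    if pad_token in vectors:
        raise ValueError(f"{pad_token!r} clashes with a real word")
    dim = len(next(iter(vectors.values())))
    word_to_index = {pad_token: 0}
    matrix = [[0.0] * dim]
    for word in sorted(vectors):  # sorted for a deterministic layout
        vec = vectors[word]
        if len(vec) != dim:
            raise ValueError(f"vector for {word!r} has the wrong length")
        word_to_index[word] = len(matrix)
        matrix.append([float(x) for x in vec])
    return word_to_index, matrix


def encode(tokens, word_to_index):
    """Map tokens to row ids, sending unknown words to the padding row 0."""
    return [word_to_index.get(t, 0) for t in tokens]


vectors = {"king": [0.5, 0.1], "queen": [0.4, 0.2]}
w2i, matrix = build_embedding_matrix(vectors)
print(w2i)                                # {'<pad>': 0, 'king': 1, 'queen': 2}
print(encode(["queen", "unicorn"], w2i))  # [2, 0]
```

Keeping the index mapping next to the matrix matters: the ids you feed the embedding layer must come from the same `word_to_index` that produced the rows.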

1 answer

gensim word2vec - array dimensions in updating with online word embedding

Using Word2Vec from gensim 0.13.4.1 to update the word vectors on the fly does not work: model.build_vocab(sentences, update=False) works fine, but model.build_vocab(sentences, update=True) does not. I am using this website to try and emulate…

python numpy gensim

asked Feb 21 '17 at 02:35

chase


2 answers

Python tf-idf: fast way to update the tf-idf matrix

I have a dataset of several thousand rows of text. My goal is to calculate the tf-idf scores and then the cosine similarity between documents. This is what I did using gensim in Python, following the tutorial: dictionary = corpora.Dictionary(dat) corpus =…

python nlp tf-idf gensim cosine-similarity

asked Feb 13 '17 at 19:54

snowneji

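When new rows arrive, the idf factor changes for every term, so one option is to keep running document-frequency counts and compute weights on demand instead of rebuilding the whole model. A minimal pure-Python sketch, assuming gensim's default TfidfModel weighting as I understand it (raw term count times log2(N / df), then L2 normalization). The class and method names are hypothetical, and a real system with thousands of rows would still want sparse matrices.

```python
import math
from collections import Counter


class IncrementalTfidf:
    """Keeps document frequencies up to date as documents are added."""

    def __init__(self):
        self.num_docs = 0
        self.doc_freq = Counter()

    def add_document(self, tokens):
        self.num_docs += 1
        self.doc_freq.update(set(tokens))  # each term counted once per document

    def vector(self, tokens):
        """Return an L2-normalized {term: weight} dict for tokens."""
        counts = Counter(tokens)
        weights = {}
        for term, tf in counts.items():
            df = self.doc_freq.get(term, 0)
            if df == 0:
                continue  # unseen terms get no weight in this sketch
            weights[term] = tf * math.log2(self.num_docs / df)
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm == 0:
            return {}
        return {t: w / norm for t, w in weights.items() if w != 0}


def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


docs = [["cat", "sat"], ["cat", "ran"], ["dog", "ran"]]
model = IncrementalTfidf()
for d in docs:
    model.add_document(d)
print(round(cosine(model.vector(docs[0]), model.vector(docs[1])), 3))  # 0.245
model.add_document(["dog", "sat"])  # a new row: counts update, no full rebuild
```

Because the vectors are normalized, the dot product in `cosine` is already the cosine similarity.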

1 answer

What is the difference between gensim LabeledSentence and TaggedDocument

Please help me understand the difference between how gensim's TaggedDocument and LabeledSentence work. My ultimate goal is text classification using a Doc2Vec model and any classifier. I am following this blog! class…

gensim text-classification word2vec doc2vec

asked Dec 16 '16 at 10:33

Rashmi Singh
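In gensim, TaggedDocument is a small named tuple with a `words` field (the list of tokens) and a `tags` field (a list of tags, such as a document id); LabeledSentence is the older, deprecated name that TaggedDocument replaced. The stdlib-only sketch below uses a stand-in namedtuple with the same two fields to show the corpus shape Doc2Vec expects; in real code you would import `TaggedDocument` from `gensim.models.doc2vec` instead. Adding the class label as a second tag is one optional choice, not a requirement, and `make_corpus` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in with the same fields as gensim's TaggedDocument (words, tags).
# In real code: from gensim.models.doc2vec import TaggedDocument
TaggedDocument = namedtuple("TaggedDocument", "words tags")


def make_corpus(texts, labels):
    """Tag each tokenized text with a unique id plus its class label."""
    corpus = []
    for i, (text, label) in enumerate(zip(texts, labels)):
        tokens = text.lower().split()
        corpus.append(TaggedDocument(words=tokens, tags=[f"doc_{i}", label]))
    return corpus


corpus = make_corpus(["Good movie", "Bad plot"], ["pos", "neg"])
print(corpus[0])
# TaggedDocument(words=['good', 'movie'], tags=['doc_0', 'pos'])
```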

3 answers

Gensim installation problems

I am trying to install gensim on a Google Cloud instance using: pip3 install gensim and this is the stack trace when I try to import gensim: Traceback (most recent call last): File "<stdin>", line 1, in <module> File…

python pip gensim

asked Nov 22 '16 at 01:20

VJune


2 answers

How to run tsne on word2vec created from gensim?

I want to visualize a word2vec model created with the gensim library. I tried sklearn, but it seems I need to install a developer version to get it. I tried installing the developer version, but that is not working on my machine. Is it possible to modify this…

scikit-learn gensim word2vec

asked Nov 14 '16 at 02:17

Shakti


1 answer

Select between skip-gram and CBOW model for training word2Vec in gensim

Is it possible to choose between the Skip-gram and the CBOW model in Gensim when training a Word2Vec model?

nlp gensim word2vec

asked Sep 17 '16 at 21:54

machineLearner
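Gensim's Word2Vec takes an `sg` parameter: `sg=1` selects skip-gram and `sg=0` (the default) selects CBOW. To show what that switch changes, the pure-Python sketch below generates the training examples each architecture uses within a fixed window: skip-gram predicts each context word from the center word, while CBOW predicts the center word from its whole context. Gensim also shrinks the window randomly per word, which this sketch omits; the function names are hypothetical.

```python
def skipgram_pairs(tokens, window=1):
    """(center, context) pairs: skip-gram predicts each context word from the center."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=1):
    """(context_words, center) examples: CBOW predicts the center from its context."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples


sentence = ["the", "cat", "sat"]
print(skipgram_pairs(sentence))
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
print(cbow_examples(sentence))
# [(['cat'], 'the'), (['the', 'sat'], 'cat'), (['cat'], 'sat')]
```

With gensim itself you would only pass the flag, for example `Word2Vec(sentences, sg=1)` for skip-gram.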

1 answer

Gensim get topic for a document (seen document)

I know that after training a gensim LDA model, we can get the topics for an unseen document with: lda = LdaModel(corpus, num_topics=10) doc_lda = lda[doc_bow]. But what about documents that were already used for training? I mean is there a way…

python lda gensim

asked Apr 12 '14 at 15:59

CentAu

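As far as I know, `lda[bow]` (or `lda.get_document_topics(bow)`) works the same way for training documents as for unseen ones: it infers a topic distribution for whatever bag-of-words you pass in, so you can simply loop over the training corpus. Both return a list of `(topic_id, probability)` pairs. The hypothetical helper below picks the dominant topic from such a list; it is stdlib-only and takes the pairs as input rather than calling gensim, and the per-document lists in the example are made up.

```python
def dominant_topic(doc_topics, min_prob=0.0):
    """Return the (topic_id, prob) pair with the highest probability, or None.

    doc_topics: list of (topic_id, probability) pairs, the shape that
    gensim's lda[bow] returns.
    """
    candidates = [(t, p) for t, p in doc_topics if p >= min_prob]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1])


# Made-up topic lists for three training documents
per_doc = [[(0, 0.7), (3, 0.3)], [(1, 0.55), (2, 0.45)], []]
print([dominant_topic(d) for d in per_doc])
# [(0, 0.7), (1, 0.55), None]
```

With a trained model, the equivalent loop would be something like `[dominant_topic(lda[bow]) for bow in corpus]`.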

2 answers

Working with google word2vec .bin files in gensim python

I’m trying to get started by loading the pretrained .bin files from the Google word2vec site (freebase-vectors-skipgram1000.bin.gz) into the gensim implementation of word2vec. The model loads fine, using: model =…

python gensim word2vec

asked Oct 11 '13 at 09:58

user2870492
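Gensim normally reads these files directly with `KeyedVectors.load_word2vec_format(path, binary=True)`. To illustrate what that binary layout contains, here is a minimal stdlib reader/writer pair, assuming the original word2vec.c format: an ASCII header line `vocab_size dim`, then for each word its UTF-8 bytes, a space, `dim` float32 values (little-endian here, which is what word2vec.c produces on typical x86 machines), and usually a trailing newline. The function names are hypothetical, and this is not a replacement for gensim's loader, which handles many more edge cases.

```python
import io
import struct


def write_word2vec_bin(f, vectors):
    """Write {word: [floats]} in the word2vec binary layout (used here for testing)."""
    dim = len(next(iter(vectors.values())))
    f.write(f"{len(vectors)} {dim}\n".encode("ascii"))
    for word, vec in vectors.items():
        f.write(word.encode("utf-8") + b" ")
        f.write(struct.pack(f"<{dim}f", *vec))
        f.write(b"\n")


def read_word2vec_bin(f):
    """Read a word2vec binary stream into {word: tuple_of_floats}."""
    vocab_size, dim = map(int, f.readline().split())
    record = struct.Struct(f"<{dim}f")
    vectors = {}
    for _ in range(vocab_size):
        word = bytearray()
        while True:
            ch = f.read(1)
            if ch == b"":
                raise EOFError("truncated file")
            if ch == b" ":
                break
            if ch != b"\n":  # skip the newline left after the previous vector
                word += ch
        vectors[word.decode("utf-8")] = record.unpack(f.read(record.size))
    return vectors


buf = io.BytesIO()
write_word2vec_bin(buf, {"king": [0.5, 1.0], "queen": [0.25, -2.0]})
buf.seek(0)
print(read_word2vec_bin(buf))
# {'king': (0.5, 1.0), 'queen': (0.25, -2.0)}
```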

3 answers

How to print out the full distribution of words in an LDA topic in gensim?

The lda.show_topics method in the following code only prints the distribution of the top 10 words for each topic. How do I print out the full distribution of all the words in the corpus? from gensim import corpora, models documents = ["Human…

python lda topic-modeling gensim

asked Jul 15 '13 at 20:06

alvas

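In gensim, `show_topics` takes a `num_words` argument and `show_topic(topicid, topn=...)` takes `topn`, so passing the vocabulary size (e.g. `len(dictionary)`) returns every word; `lda.get_topics()` returns the full topic-term matrix instead. The stdlib-only sketch below shows the post-processing for that second route: given one topic's row of probabilities and an id-to-word mapping (both assumed inputs here, with made-up values), it returns the complete distribution sorted by probability. The function name is hypothetical.

```python
def full_topic_distribution(topic_row, id2word):
    """Return [(word, prob), ...] for every term, highest probability first.

    topic_row: probabilities indexed by term id (one row of a topic-term matrix).
    id2word: mapping from term id to word.
    """
    pairs = [(id2word[term_id], prob) for term_id, prob in enumerate(topic_row)]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs


id2word = {0: "human", 1: "computer", 2: "graph", 3: "trees"}
row = [0.1, 0.4, 0.3, 0.2]  # made-up probabilities for one topic
print(full_topic_distribution(row, id2word))
# [('computer', 0.4), ('graph', 0.3), ('trees', 0.2), ('human', 0.1)]
```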
