Questions tagged [gensim]

Gensim is a free Python framework designed to automatically extract semantic topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible.

Gensim aims at processing raw, unstructured digital texts (“plain text”). The algorithms in Gensim, such as Latent Semantic Analysis, Latent Dirichlet Allocation and Random Projections, discover the semantic structure of documents by examining statistical word co-occurrence patterns within a corpus of training documents. These algorithms are unsupervised, which means no human input is necessary: you only need a corpus of plain-text documents.
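Gensim's algorithms consume a corpus in bag-of-words form, where each document is a list of (token_id, count) pairs, the format produced by gensim's corpora.Dictionary.doc2bow. The plain-Python sketch below only illustrates that representation; the whitespace tokenizer and the ID assignment order are simplifying assumptions and may differ from what gensim produces.

```python
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace only.
    return text.lower().split()


def build_bow_corpus(texts):
    """Return (token2id, corpus); each document is a sorted list of (token_id, count)."""
    token2id = {}
    corpus = []
    for text in texts:
        counts = Counter(tokenize(text))
        # Assign IDs to unseen tokens in alphabetical order (an assumption for determinism).
        for token in sorted(counts):
            if token not in token2id:
                token2id[token] = len(token2id)
        corpus.append(sorted((token2id[t], c) for t, c in counts.items()))
    return token2id, corpus


texts = ["human computer interaction", "computer system survey", "system human system"]
token2id, corpus = build_bow_corpus(texts)
print(token2id)
print(corpus)
```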

Once these statistical patterns are found, any plain-text document can be succinctly expressed in the new semantic representation and queried for topical similarity against other documents.
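Similarity queries typically compare documents as vectors, most often with cosine similarity. The sketch below computes cosine similarity between two sparse vectors in gensim's list-of-(id, weight) form and ranks documents against a query. In gensim itself this job is handled by the classes in gensim.similarities (for example MatrixSimilarity), so treat this as an illustration of the arithmetic only.

```python
import math


def cosine(vec_a, vec_b):
    """Cosine similarity of two sparse vectors given as lists of (id, weight) pairs."""
    a, b = dict(vec_a), dict(vec_b)
    dot = sum(w * b[i] for i, w in a.items() if i in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # define similarity with an empty vector as 0
    return dot / (norm_a * norm_b)


query = [(0, 1.0), (1, 1.0)]
docs = [[(0, 1.0), (1, 1.0), (2, 1.0)], [(3, 1.0), (4, 2.0)]]
# Rank documents by similarity to the query, most similar first.
ranked = sorted(enumerate(cosine(query, d) for d in docs), key=lambda p: -p[1])
print(ranked)
```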

2433 questions
8 votes, 0 answers

Gensim FastText compute Training Loss

I am training a fastText model using gensim.models.fasttext. However, I can't seem to find a method to compute the loss of the iteration for logging purposes. If I look at gensim.models.word2vec, it has the get_latest_training_loss method which…
Hardian Lawi
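For Word2Vec, gensim tracks training loss when the model is created with compute_loss=True, and get_latest_training_loss() generally reports a running total that accumulates over the epochs of one train() call, so per-epoch loss is usually logged as the difference between successive readings in an epoch-end callback. FastText has been reported not to populate this value in some gensim versions (it stays at 0), so verify it on your version before relying on it. The EpochLossLogger class below is hypothetical; it falls back to a plain base class when gensim is not installed so the delta logic can be exercised with a stand-in model.

```python
try:
    from gensim.models.callbacks import CallbackAny2Vec
except ImportError:  # lets the sketch run without gensim installed
    CallbackAny2Vec = object


class EpochLossLogger(CallbackAny2Vec):
    """Hypothetical callback: turns a cumulative loss counter into per-epoch losses."""

    def __init__(self):
        self.epoch = 0
        self.previous_total = 0.0
        self.losses = []

    def on_epoch_end(self, model):
        # Assumption: get_latest_training_loss() is cumulative within one train() call.
        total = model.get_latest_training_loss()
        self.losses.append(total - self.previous_total)
        self.previous_total = total
        self.epoch += 1
        print(f"epoch {self.epoch}: loss {self.losses[-1]:.2f}")


# Typical use with gensim (not executed here):
#   model = Word2Vec(sentences, compute_loss=True, epochs=5, callbacks=[EpochLossLogger()])
```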
8 votes, 3 answers

How to load embeddings (in tsv file) generated from StarSpace

Does anyone know how to load a TSV file with embeddings generated by StarSpace into Gensim? The Gensim documentation seems to use Word2Vec a lot, and I couldn't find a pertinent answer. Thanks, Amulya
Just Data
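One route is to rewrite the TSV into the plain-text word2vec format (a header line with the vocabulary size and dimension, then one "token v1 v2 ..." line per token), which gensim reads with KeyedVectors.load_word2vec_format(path, binary=False). The sketch assumes each TSV line holds a token followed by tab-separated floats, as in StarSpace's .tsv output; the function name starspace_tsv_to_w2v is hypothetical, and tokens containing spaces would need extra handling.

```python
def starspace_tsv_to_w2v(tsv_text):
    """Convert 'token<TAB>v1<TAB>v2...' lines into word2vec text format."""
    rows = []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        token, *fields = line.split("\t")
        values = [v for v in fields if v]  # ignore empty fields, e.g. from a trailing tab
        rows.append((token, [float(v) for v in values]))  # float() rejects non-numeric fields
    if not rows:
        raise ValueError("no embeddings found")
    dim = len(rows[0][1])
    if any(len(vec) != dim for _, vec in rows):
        raise ValueError("inconsistent vector dimensions")
    lines = [f"{len(rows)} {dim}"]
    lines += [token + " " + " ".join(str(x) for x in vec) for token, vec in rows]
    return "\n".join(lines) + "\n"


sample = "hello\t0.1\t0.2\n__label__pos\t-0.5\t1.0\n"
print(starspace_tsv_to_w2v(sample))
# Then write the result to a file and load it with
#   KeyedVectors.load_word2vec_format("vectors.txt", binary=False)
```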
8 votes, 1 answer

Doc2vec and word2vec with negative sampling

My current doc2vec code is as follows. # Train doc2vec model model = doc2vec.Doc2Vec(docs, size = 100, window = 300, min_count = 1, workers = 4, iter = 20) I also have word2vec code as below. # Train word2vec model model =…
user8566323
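In gensim, negative sampling is controlled by the negative parameter (the number of noise words drawn per positive example, with 0 disabling it), while hs=1 selects hierarchical softmax instead; Word2Vec and Doc2Vec accept the same settings. Noise words are drawn from the unigram distribution raised to a power (ns_exponent, 0.75 by default in recent versions). Note also that gensim 4 renamed size to vector_size and iter to epochs. The sketch below reproduces the sampling idea in plain Python; it is not gensim's implementation, which draws from a precomputed cumulative table, and in this sketch draws that hit the positive word are simply discarded.

```python
import random
from collections import Counter


def noise_distribution(tokens, exponent=0.75):
    """Unigram counts raised to `exponent`, normalized to probabilities."""
    counts = Counter(tokens)
    weights = {w: c ** exponent for w, c in counts.items()}
    total = sum(weights.values())
    return {w: x / total for w, x in weights.items()}


def draw_negatives(dist, positive, k, rng):
    """Draw k noise words; draws equal to the positive word are discarded."""
    words = list(dist)
    probs = [dist[w] for w in words]
    draws = rng.choices(words, weights=probs, k=k)
    return [w for w in draws if w != positive]


corpus = "the cat sat on the mat the end".split()
dist = noise_distribution(corpus)
print(draw_negatives(dist, positive="cat", k=5, rng=random.Random(0)))
```

The exponent below 1 flattens the distribution, so frequent words like "the" are sampled less often than their raw frequency would suggest.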
8 votes, 1 answer

Is there any way to match Gensim LDA output with topics in pyLDAvis graph?

I need to process the topics in the LDA output (lda.show_topics(num_topics=-1, num_words=100...)) and then compare the results with the pyLDAvis graph, but the topics are numbered differently. Is there a way I can match them?
m.khalil
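pyLDAvis numbers topics starting from 1 and, by default, re-sorts them by overall prevalence, so its topic 1 is the most prevalent topic rather than gensim's topic 0. Passing sort_topics=False to pyLDAvis.gensim_models.prepare (pyLDAvis.gensim in older releases) keeps gensim's order, shifted by one. For an already-sorted view, the mapping can be recovered as sketched below. The sketch assumes prevalence is each topic's expected share of all tokens (document-topic weights, e.g. from lda.get_document_topics(bow, minimum_probability=0.0), multiplied by document lengths), which is assumed to match pyLDAvis's own topic-frequency calculation; the helper name is hypothetical.

```python
def pyldavis_topic_numbers(doc_topic, doc_lengths):
    """Map gensim topic id (0-based) -> pyLDAvis topic number (1-based, by prevalence).

    doc_topic[d][k] is the weight of topic k in document d; doc_lengths[d] is its token count.
    """
    num_topics = len(doc_topic[0])
    # Expected number of tokens assigned to each topic across the corpus.
    freq = [
        sum(row[k] * n for row, n in zip(doc_topic, doc_lengths))
        for k in range(num_topics)
    ]
    # Most prevalent first; ties keep gensim's order here (sorted() is stable).
    order = sorted(range(num_topics), key=lambda k: -freq[k])
    return {gensim_id: rank + 1 for rank, gensim_id in enumerate(order)}


doc_topic = [[0.1, 0.9, 0.0], [0.2, 0.3, 0.5], [0.0, 0.2, 0.8]]
doc_lengths = [10, 20, 30]
print(pyldavis_topic_numbers(doc_topic, doc_lengths))
```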
8 votes, 1 answer

Pyspark - Load trained model word2vec

I want to use word2vec with PySpark to process some data. I was previously using Google's pretrained model GoogleNews-vectors-negative300.bin with gensim in Python. Is there a way I can load this bin file with mllib.word2vec? Or does it make sense to…
Pierre
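If the goal is only to get the vectors into Spark, one option is to read the file yourself and build a DataFrame from the result. The sketch below parses the original word2vec C binary layout, assumed here to be an ASCII header line "vocab_size dim" followed, per word, by the word bytes, a space, dim little-endian float32 values and an optional newline. The function name is hypothetical; for the full Google News file (millions of words) use the limit argument or stream rows instead of holding everything in a dict.

```python
import io
import struct


def read_word2vec_bin(stream, limit=None):
    """Parse the word2vec C binary format into {word: tuple_of_floats}."""
    header = stream.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])
    if limit is not None:
        vocab_size = min(vocab_size, limit)
    vectors = {}
    for _ in range(vocab_size):
        chars = bytearray()
        while True:
            ch = stream.read(1)
            if ch == b" ":
                break
            if ch == b"":
                raise EOFError("unexpected end of file inside a word")
            if ch != b"\n":  # skip the newline that may end the previous vector
                chars.extend(ch)
        word = chars.decode("utf-8", errors="replace")
        raw = stream.read(4 * dim)  # dim float32 values
        vectors[word] = struct.unpack(f"<{dim}f", raw)
    return vectors


data = (b"2 3\n" + b"king " + struct.pack("<3f", 1.0, 2.0, 3.0) + b"\n"
        + b"queen " + struct.pack("<3f", 0.5, -1.0, 0.25) + b"\n")
print(read_word2vec_bin(io.BytesIO(data)))
```

With PySpark, the resulting pairs could then be passed to spark.createDataFrame([(w, list(v)) for w, v in vectors.items()], ["word", "vector"]).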
8 votes, 1 answer

How to use pretrained Word2Vec model in TensorFlow

I have a Word2Vec model that was trained in Gensim. How can I use it in TensorFlow for word embeddings? I don't want to train embeddings from scratch in TensorFlow. Can someone tell me how to do it with some example code?
neel
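A common pattern is to copy the trained vectors into a matrix whose row i holds the vector for word index i, then use that matrix to initialize an embedding layer with training switched off. With gensim 4 the inputs are model.wv.index_to_key and model.wv.vectors; in Keras the layer would be something like tf.keras.layers.Embedding(num_rows, dim, embeddings_initializer=tf.keras.initializers.Constant(matrix), trainable=False). The plain-Python sketch below shows only the matrix-building step and reserves row 0 for padding, which is an assumption of this sketch rather than a gensim or TensorFlow requirement.

```python
def build_embedding_matrix(index_to_key, vectors, pad_token="<pad>"):
    """Return (word_to_row, matrix) with row 0 reserved for padding.

    Assumes pad_token is not a real vocabulary word.
    """
    dim = len(vectors[0])
    word_to_row = {pad_token: 0}
    matrix = [[0.0] * dim]  # padding row of zeros
    for word, vec in zip(index_to_key, vectors):
        word_to_row[word] = len(matrix)
        matrix.append([float(x) for x in vec])
    return word_to_row, matrix


def encode(tokens, word_to_row):
    """Map tokens to row ids; unknown tokens fall back to the padding row in this sketch."""
    return [word_to_row.get(tok, 0) for tok in tokens]


word_to_row, matrix = build_embedding_matrix(["the", "cat"], [[0.1, 0.2], [0.3, 0.4]])
print(encode(["cat", "dog", "the"], word_to_row))
```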
8 votes, 1 answer

gensim word2vec - array dimensions in updating with online word embedding

Using Word2Vec from gensim 0.13.4.1 to update the word vectors on the fly does not work. model.build_vocab(sentences, update=False) works fine; however, model.build_vocab(sentences, update=True) does not. I am using this website to try and emulate…
chase
8 votes, 2 answers

Python tf-idf: fast way to update the tf-idf matrix

I have a dataset of several thousand rows of text. My goal is to calculate the tf-idf scores and then the cosine similarity between documents. This is what I did using gensim in Python, following the tutorial: dictionary = corpora.Dictionary(dat) corpus =…
snowneji
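Adding documents changes both the document count N and the document frequencies, so every idf value shifts and previously computed tf-idf vectors go stale; what can be updated cheaply is the underlying counts. gensim's TfidfModel by default weights each term as raw term frequency times log2(N / df) and L2-normalizes the result. The sketch below keeps running df counts so new documents can be added without re-reading the old ones, and recomputes weights on demand. The IncrementalTfidf class is hypothetical, not a gensim API.

```python
import math
from collections import Counter


class IncrementalTfidf:
    """Hypothetical helper: keeps document frequencies so new documents can be added cheaply."""

    def __init__(self):
        self.num_docs = 0
        self.dfs = Counter()

    def add_documents(self, bows):
        # bows: lists of (term_id, count) with each term_id at most once, like doc2bow output.
        for bow in bows:
            self.num_docs += 1
            self.dfs.update(term_id for term_id, _ in bow)

    def transform(self, bow):
        # Raw tf * log2(N / df); unseen terms (df == 0) are skipped.
        weighted = [
            (t, tf * math.log2(self.num_docs / self.dfs[t]))
            for t, tf in bow
            if self.dfs[t] > 0
        ]
        # Terms present in every document get weight 0 and are dropped.
        weighted = [(t, w) for t, w in weighted if abs(w) > 1e-12]
        norm = math.sqrt(sum(w * w for _, w in weighted))
        return [(t, w / norm) for t, w in weighted] if norm else []


tfidf = IncrementalTfidf()
tfidf.add_documents([[(0, 1), (1, 2)], [(0, 1), (2, 1)]])
print(tfidf.transform([(0, 1), (1, 2)]))
tfidf.add_documents([[(2, 3)]])  # new data: only the counts are updated
print(tfidf.transform([(0, 1), (1, 2)]))
```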
8 votes, 1 answer

What is the difference between gensim LabeledSentence and TaggedDocument

Please help me understand the difference between how gensim's TaggedDocument and LabeledSentence work. My ultimate goal is text classification using a Doc2Vec model and any classifier. I am following this blog! class…
Rashmi Singh
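In gensim, TaggedDocument is a simple named tuple with two fields: words (a list of tokens) and tags (a list of tags such as a document ID or a label). LabeledSentence is the older name for the same idea; it was deprecated in gensim 3.x and is no longer present in gensim 4. The sketch below mirrors TaggedDocument's shape with a locally defined namedtuple so it runs without gensim; in real code import TaggedDocument from gensim.models.doc2vec. Adding the class label as a second tag is one option for classification, not a requirement.

```python
from collections import namedtuple

# Local stand-in with the same fields as gensim.models.doc2vec.TaggedDocument.
TaggedDocument = namedtuple("TaggedDocument", "words tags")


def to_tagged_documents(texts, labels):
    """Pair each tokenized text with a unique ID tag plus its class label as a second tag."""
    return [
        TaggedDocument(words=text.lower().split(), tags=[f"doc_{i}", label])
        for i, (text, label) in enumerate(zip(texts, labels))
    ]


docs = to_tagged_documents(["Good movie", "Bad plot"], ["pos", "neg"])
print(docs[0])
```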
8 votes, 3 answers

Gensim installation problems

I am trying to install gensim on a Google Cloud instance using pip3 install gensim, and this is the stack trace when I try to import gensim: Traceback (most recent call last): File "", line 1, in File…
VJune
8 votes, 2 answers

How to run tsne on word2vec created from gensim?

I want to visualize a word2vec model created with the gensim library. I tried sklearn, but it seems I need to install a developer version to get it. I tried installing the developer version, but that is not working on my machine. Is it possible to modify this…
Shakti
8 votes, 1 answer

Select between skip-gram and CBOW model for training word2Vec in gensim

Is it possible to choose between the Skip-gram and the CBOW model in Gensim when training a Word2Vec model?
machineLearner
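Yes: Word2Vec's sg parameter selects the architecture, with sg=1 for skip-gram and sg=0 (the default) for CBOW. The two differ in how each context window becomes training examples, which the plain-Python sketch below illustrates: skip-gram predicts each context word from the center word, while CBOW predicts the center word from its (averaged) context. The sketch ignores gensim details such as the randomly shrunk window and downsampling of frequent words.

```python
def skipgram_pairs(tokens, window):
    """(input, target) pairs: the center word predicts each context word."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window):
    """(context_words, target) examples: the context predicts the center word."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:
            examples.append((context, center))
    return examples


sentence = ["the", "quick", "brown", "fox"]
print(skipgram_pairs(sentence, window=1))
print(cbow_examples(sentence, window=1))
# In gensim: Word2Vec(sentences, sg=1) for skip-gram, Word2Vec(sentences, sg=0) for CBOW.
```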
8 votes, 1 answer

Gensim get topic for a document (seen document)

I know that after training a gensim LDA model, we can get the topics for an unseen document with: lda = LdaModel(corpus, num_topics=10) doc_lda = lda[doc_bow] But what about the documents that were already used for training? I mean, is there a way…
CentAu
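The same lookup works for any bag-of-words vector, including documents the model was trained on, so looping over the training corpus with lda.get_document_topics(bow) (or lda[bow]) gives each training document's topics. Those topics are re-inferred rather than read back from training, so they can differ slightly from training-time assignments. The helper below is hypothetical; it collects the dominant topic per document and relies only on get_document_topics, so it can be checked with a stand-in model.

```python
def dominant_topics(lda, corpus):
    """Return (topic_id, probability) of the strongest topic for each document in `corpus`."""
    result = []
    for bow in corpus:
        topics = lda.get_document_topics(bow, minimum_probability=0.0)
        result.append(max(topics, key=lambda pair: pair[1]) if topics else (None, 0.0))
    return result


# Typical use after training on `corpus` (not executed here):
#   lda = LdaModel(corpus, num_topics=10, id2word=dictionary)
#   for doc_id, (topic, prob) in enumerate(dominant_topics(lda, corpus)):
#       print(doc_id, topic, round(prob, 3))
```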
8 votes, 2 answers

Working with google word2vec .bin files in gensim python

I’m trying to get started by loading the pretrained .bin files from the Google word2vec site (freebase-vectors-skipgram1000.bin.gz) into the gensim implementation of word2vec. The model loads fine, using .. model =…
user2870492
8 votes, 3 answers

How to print out the full distribution of words in an LDA topic in gensim?

The lda.show_topics method in the following code only prints the distribution of the top 10 words for each topic. How do I print out the full distribution of all the words in the corpus? from gensim import corpora, models documents = ["Human…
alvas
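show_topics and show_topic take a word-count argument (num_words and topn respectively), so passing the vocabulary size returns every word, for example lda.show_topic(topic_id, topn=len(dictionary)). Alternatively, lda.get_topics() returns the whole topics-by-vocabulary probability matrix. The sketch below formats one row of such a matrix into a full, sorted word distribution; the helper name is hypothetical, and the row and id2word inputs are assumed to come from those calls and from the gensim Dictionary.

```python
def full_topic_distribution(topic_row, id2word):
    """All (word, probability) pairs for one topic, most probable first."""
    pairs = [(id2word[term_id], float(p)) for term_id, p in enumerate(topic_row)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


row = [0.05, 0.6, 0.35]  # e.g. one row of lda.get_topics()
id2word = {0: "human", 1: "graph", 2: "tree"}
for word, prob in full_topic_distribution(row, id2word):
    print(f"{word}: {prob:.3f}")
```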