Questions tagged [gensim]

Gensim is a free Python framework designed to automatically extract semantic topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible.

Gensim processes raw, unstructured digital text (“plain text”). Its algorithms, such as Latent Semantic Analysis, Latent Dirichlet Allocation, and Random Projections, discover the semantic structure of documents by examining statistical word co-occurrence patterns within a corpus of training documents. These algorithms are unsupervised: no human annotation is necessary, only a corpus of plain-text documents.

Once these statistical patterns are found, any plain-text document can be succinctly expressed in the new semantic representation and queried for topical similarity against other documents.
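The similarity-query idea can be illustrated with a minimal sketch in plain Python. This is not gensim's API: the helper names `bag_of_words`, `cosine_similarity`, and `rank`, and the three-document toy corpus, are assumptions made for illustration. Gensim's models replace the raw word counts used here with learned representations (for example LSA or LDA topics), but the query step, comparing a document vector against every vector in the corpus, has the same shape.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count each token."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus; real use would involve many more documents.
corpus = [
    "human machine interface for computer applications",
    "graph of trees and minors",
    "survey of user opinion of computer system response time",
]
vectors = [bag_of_words(doc) for doc in corpus]


def rank(query):
    """Return (document id, similarity) pairs, most similar first."""
    q = bag_of_words(query)
    scores = [(i, cosine_similarity(q, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = rank("human computer interface")
for doc_id, score in ranking:
    print(doc_id, round(score, 3), corpus[doc_id])
```

The result is a list of `(id, similarity)` pairs; the id is the position of the document in `corpus`, so the original text can be looked up by indexing into that list.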

2433 questions

2 answers

Word2vec training using gensim starts swapping after 100K sentences

I'm trying to train a word2vec model using a file with about 170K lines, with one sentence per line. I think I may represent a special use case because the "sentences" have arbitrary strings rather than dictionary words. Each sentence (line) has…

asked Jun 25 '15 at 23:06

Felipe

2 answers

Retrieve string version of document by ID in Gensim

I am using Gensim for some topic modelling and I have gotten to the point where I am doing similarity queries using the LSI and tf-idf models. I get back a set of IDs and similarities, e.g. (299501, 0.64505910873413086). How do I get the text…

python gensim

asked Feb 12 '15 at 22:04

jisaw

3 answers

Not able to import from `gensim.summarization` module in Django

I have included these 2 import statements in my views.py: `from gensim.summarization.summarizer import summarizer` and `from gensim.summarization import keywords`. However, even after I installed gensim using pip, I am getting the error: ModuleNotFoundError:…

python django nlp gensim

asked Jun 17 '21 at 11:49

Alpha

1 answer

How to get document_topics distribution of all of the document in gensim LDA?

I'm new to Python and I need to build an LDA project. After some preprocessing steps, here is my code: dictionary = Dictionary(docs) corpus = [dictionary.doc2bow(doc) for doc in docs] from gensim.models import LdaModel num_topics =…

python-3.x gensim lda topic-modeling probability-distribution

asked Nov 15 '18 at 06:23

wayne64001

2 answers

Why Doc2vec gives 2 different vectors for the same texts

I am using Doc2vec to get vectors from words. Please see my below code: from gensim.models.doc2vec import TaggedDocument f = open('test.txt','r') trainings = [TaggedDocument(words = data.strip().split(","),tags = [i]) for i,data in…

python nlp word2vec gensim doc2vec

asked May 16 '18 at 04:32

Thanh Bui

1 answer

Difference between most_similar and similar_by_vector in gensim word2vec?

I was confused with the results of most_similar and similar_by_vector from gensim's Word2vecKeyedVectors. They are supposed to calculate cosine similarities in the same way - however: Running them with one word gives identical results, for…

nlp word2vec gensim

asked May 10 '18 at 14:44

peidaqi

1 answer

Gensim: how to load precomputed word vectors from text file

I have a text file with my precomputed word vectors in the following format (example): `word -0.0762464299711 0.0128308048976 ... 0.0712385589283\n` on each line for every word (with 297 extra floats in place of the ...). I am trying to load these…

python python-3.x nlp gensim

asked Apr 10 '18 at 09:30

iloveseals

1 answer

UnpicklingError: invalid load key, '3'

I am creating a chatbot, so I need a word2vec file in binary format. When I load the bin file, I get this error: import gensim model = gensim.models.Word2Vec.load('GoogleNews-vectors-negative300.bin') Traceback (most recent…

python-3.x word2vec gensim

asked Apr 05 '18 at 15:21

surya

1 answer

Improving Gensim Doc2vec results

I tried to apply doc2vec to 600000 rows of sentences. The code is below: from gensim import models model = models.Doc2Vec(alpha=.025, min_alpha=.025, min_count=1, workers = 5) model.build_vocab(res) token_count = sum([len(sentence) for sentence in…

python nlp gensim doc2vec

asked Dec 19 '17 at 15:20

Hackerds

2 answers

Python: What is the "size" parameter in Gensim Word2vec model class

I have been struggling to understand the use of the size parameter in gensim.models.Word2Vec. From the Gensim documentation, size is the dimensionality of the vector. Now, as far as my knowledge goes, word2vec creates a vector of the probability of…

python gensim word2vec

asked Aug 01 '17 at 18:12

Krishnang K Dalal

3 answers

gensim.interfaces.TransformedCorpus - How use?

I'm relatively new to the world of Latent Dirichlet Allocation. I am able to generate an LDA model following the Wikipedia tutorial, and I'm able to generate an LDA model with my own documents. My next step is to try to understand how I can use a previous…

gensim lda

asked Jul 26 '17 at 03:54

Marco Oliveira

1 answer

Doc2Vec Worse Than Mean or Sum of Word2Vec Vectors

I'm training a Word2Vec model like: model = Word2Vec(documents, size=200, window=5, min_count=0, workers=4, iter=5, sg=1) and Doc2Vec model like: doc2vec_model = Doc2Vec(size=200, window=5, min_count=0, iter=5, workers=4,…

python machine-learning gensim word2vec doc2vec

asked Jul 21 '17 at 09:40

ScientiaEtVeritas

1 answer

Docker unable to install numpy, scipy, or gensim

I am trying to build a Docker application that uses Python's gensim library, version 2.1.0, which is being installed via pip from a requirements.txt file. However, Docker seems to have trouble installing numpy, scipy, and gensim. I googled the error…

python numpy docker scipy gensim

asked Jun 24 '17 at 02:56

Shuklaswag

2 answers

Reduce Google's Word2Vec model with Gensim

Loading the complete pre-trained word2vec model by Google is time intensive and tedious, therefore I was wondering if there is a chance to remove words below a certain frequency to bring the vocab count down to e.g. 200k words. I found Word2Vec…

nlp gensim word2vec

asked Feb 25 '17 at 17:38

neurix

1 answer

doc2vec: How is PV-DBOW implemented

I know that there exists already an implementation of PV-DBOW (paragraph vector) in python (gensim). But I'm interested in knowing how to implement it myself. The explanation from the official paper for PV-DBOW is as follows: Another way is to…

machine-learning nlp neural-network gensim word2vec

asked Mar 15 '16 at 01:42

саша
