Questions tagged [gensim]

Gensim is a free Python framework designed to automatically extract semantic topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible.

Gensim aims at processing raw, unstructured digital texts (“plain text”). The algorithms in gensim, such as Latent Semantic Analysis, Latent Dirichlet Allocation, or Random Projections, discover the semantic structure of documents by examining statistical word co-occurrence patterns within a corpus of training documents. These algorithms are unsupervised, which means no human input is necessary: you only need a corpus of plain text documents.

Once these statistical patterns are found, any plain text documents can be succinctly expressed in the new, semantic representation, and queried for topical similarity against other documents.
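
As a rough illustration of what "queried for topical similarity" means, here is a minimal pure-Python sketch. It is not gensim's API: it compares raw word counts with cosine similarity, whereas gensim's models would first map each document into a learned topic space. The corpus, the query, and the helper names `bag_of_words` and `cosine_similarity` are assumptions made up for this example.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count each token."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus; real use would involve many more documents.
corpus = [
    "human machine interface for computer applications",
    "graph of trees and paths",
    "user opinion of computer system response time",
]

query = bag_of_words("computer interface")
scores = [cosine_similarity(query, bag_of_words(doc)) for doc in corpus]

# Index of the document most similar to the query.
best = max(range(len(corpus)), key=scores.__getitem__)
print(corpus[best])
```

Raw counts only match documents that share exact words with the query. The topic models named above are meant to go further by grouping words that tend to co-occur.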

2433 questions

7 answers

Getting an error to install pyemd even though I just installed it

Here is the code: from pyemd import emd print("sentence 1:") print(input_document_lower[0]) print("sentence 2:") print(input_document_lower[1]) print("similarity:") model_w2v.wmdistance(input_document_lower[0], input_document_lower[1]) Here's the…

asked Nov 03 '17 at 23:47

madsthaks

1 answer

Is it possible to use gensim word2vec model in deeplearning4j.word2vec?

I'm new to deeplearning4j. I want to build a sentence classifier that uses word vectors as input. I was using Python before, where the vector model was generated with gensim, and I want to use that model for this new classifier. Is it…

java gensim word2vec deeplearning4j

asked Apr 26 '17 at 11:37

zunzelf

1 answer

Does Doc2Vec learn representations for the tags?

I'm using the Doc2Vec tags as a unique identifier for my documents: each document has a different tag, and the tags carry no semantic meaning. I'm using the tags to find specific documents so I can calculate the similarity between them. Do the tags influence the…

gensim doc2vec

asked Apr 21 '17 at 13:16

Stanko

1 answer

gensim KeyedVectors dimensions

In gensim's latest version, loading trained vectors from a file is done using KeyedVectors and doesn't require instantiating a new Word2Vec object. But now my code is broken because I can't use the model.vector_size property. What is the alternative…

python-3.x gensim

asked Apr 04 '17 at 08:56

proton

1 answer

Incremental Word2Vec Model Training in gensim

I have tried to incrementally train a word2vec model produced by gensim. But I found that the vocabulary size doesn't increase; only the word2vec model weights are updated. I need to update both the vocabulary and the model size. #Load data…

python deep-learning gensim word2vec

asked Mar 12 '17 at 09:58

Rabindra Nath Nandi

2 answers

AttributeError: type object 'Word2Vec' has no attribute 'load_word2vec_format'

I am trying to implement a word2vec model and am getting this error: AttributeError: type object 'Word2Vec' has no attribute 'load_word2vec_format'. Below is the code: wv = Word2Vec.load_word2vec_format("GoogleNews-vectors-negative300.bin.gz",…

python nlp gensim word2vec

asked Feb 21 '17 at 09:49

Rishabh Rusia

1 answer

Doc2Vec: Differentiate Sentence and Document

I am just playing around with Doc2Vec from gensim, analysing a Stack Exchange dump to measure the semantic similarity of questions and identify duplicates. The tutorial on Doc2Vec-Tutorial seems to describe the input as tagged sentences. But the original…

python gensim doc2vec

asked Feb 15 '17 at 06:55

Vikash Balasubramanian

4 answers

AttributeError: 'list' object has no attribute 'lower' gensim

I have a list of 10k words in a text file like so: G15 KDN C30A Action Standard Air Brush Air Dilution I am trying to convert them into lower cased tokens using this code for subsequent processing with GenSim: data = [line.strip() for line in…

python string split gensim

asked Jan 24 '17 at 13:21

tom
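
The error message in the title is what Python raises when `.lower()` is called on a list instead of on a string. The question's own code is truncated, so the following is only a hedged sketch of one way to lowercase and tokenize such lines. The sample lines and the helper name `to_lower_tokens` are assumptions, not the asker's code.

```python
def to_lower_tokens(lines):
    """Return one list of lowercase tokens per non-empty line.

    .lower() is applied to each line while it is still a string,
    before .split() turns it into a list of tokens.
    """
    return [line.strip().lower().split() for line in lines if line.strip()]


# Hypothetical sample resembling the word list in the excerpt.
sample_lines = ["G15\n", "Action Standard\n", "Air Brush\n", "\n"]
tokens = to_lower_tokens(sample_lines)
print(tokens)
```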

1 answer

gensim: custom similarity measure

Using gensim, I want to calculate the similarity within a list of documents. This library is excellent at handling the amounts of data that I have got. The documents are all reduced to timestamps and I have got a function time_similarity to compare…

python time similarity gensim

asked Jun 27 '16 at 12:45

Simon

1 answer

Semantic Similarity between Phrases Using GenSim

Background I am trying to judge whether a phrase is semantically related to other words found in a corpus using Gensim. For example, here is the corpus document pre-tokenized: **Corpus** Car Insurance Car Insurance Coverage Auto Insurance Best…

python-3.x nltk gensim

asked Aug 05 '15 at 01:06

user3682157

1 answer

Doc2vec MemoryError

I am using the doc2vec model from the gensim framework to represent a corpus of 15 500 000 short documents (up to 300 words): gensim.models.Doc2Vec(sentences, size=400, window=10, min_count=1, workers=8 ) After creating the vectors there are …

python memory gensim word2vec

asked May 27 '15 at 16:53

Silvia Necsulescu

1 answer

Understanding LDA Transformed Corpus in Gensim

I tried to examine the contents of the BOW corpus vs. the LDA[BOW Corpus] (transformed by an LDA model trained on that corpus with, say, 35 topics). I found the following output: DOC 1 : [(1522, 1), (2028, 1), (2082, 1), (6202, 1)] LDA 1 : [(29,…

python nlp lda gensim

asked May 07 '14 at 05:48

Ravi Karan

3 answers

Are there any efficient python libraries for Dynamic Topic Models, preferably extending Gensim?

I'm trying to model Twitter stream data with topic models. Gensim, being an easy-to-use solution, is impressive in its simplicity. It has a truly online implementation for LSI, but not for LDA. For a changing content stream like Twitter, Dynamic…

python lda text-analysis topic-modeling gensim

asked Mar 18 '14 at 02:52

Ravi Karan

1 answer

Gensim Dictionary Implementation

I was just curious about the gensim dictionary implementation. I have the following code: def build_dictionary(documents): dictionary = corpora.Dictionary(documents) dictionary.save('/tmp/deerwester.dict') # store the dictionary …

python nlp topic-modeling gensim

asked Aug 12 '13 at 09:38

dmil

1 answer

How to visualize Gensim Word2vec Embeddings in Tensorboard Projector

Following gensim word2vec embedding tutorial, I have trained a simple word2vec model: from gensim.test.utils import common_texts from gensim.models import Word2Vec model = Word2Vec(sentences=common_texts, size=100, window=5, min_count=1,…

python tensorflow gensim word2vec tensorboard

asked Sep 18 '21 at 13:11

G. Macia
