Questions tagged [gensim]

Gensim is a free Python framework designed to automatically extract semantic topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible.

Gensim aims at processing raw, unstructured digital texts (“plain text”). The algorithms in gensim, such as Latent Semantic Analysis, Latent Dirichlet Allocation or Random Projections, discover the semantic structure of documents by examining statistical word co-occurrence patterns within a corpus of training documents. These algorithms are unsupervised, which means no human input is necessary – you only need a corpus of plain text documents.

Once these statistical patterns are found, any plain text documents can be succinctly expressed in the new, semantic representation, and queried for topical similarity against other documents.
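As a rough illustration of the "express documents as vectors, then query for similarity" idea described above, here is a minimal plain-Python sketch. It is not gensim's implementation or API: it swaps in a simple TF-IDF weighting with cosine similarity instead of LSA or LDA, and the helper names (`tfidf_vectors`, `cosine`) and the tiny corpus are made up for the example.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_docs):
    """Return (vectors, idf): one sparse TF-IDF dict per document, plus the IDF table.

    Hypothetical helper for illustration only; not part of gensim.
    """
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))
    # Smoothed inverse document frequency (the +1 keeps common terms non-zero).
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    vectors = [
        {term: count * idf[term] for term, count in Counter(doc).items()}
        for doc in tokenized_docs
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, assumed for the example.
docs = [
    "human machine interface for lab computer applications",
    "a survey of user opinion of computer system response time",
    "the generation of random binary unordered trees",
]
tokenized = [doc.split() for doc in docs]
doc_vectors, idf = tfidf_vectors(tokenized)

# Express the query in the same vector space; unseen words are dropped.
query = "human computer interaction".split()
query_vector = {t: c * idf[t] for t, c in Counter(query).items() if t in idf}

similarities = [cosine(query_vector, vec) for vec in doc_vectors]
best = max(range(len(docs)), key=similarities.__getitem__)
print("most similar document:", docs[best])
```

In gensim itself, the corresponding steps are normally built from its dictionary, model, and similarity-index classes rather than hand-written helpers like these.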

2433 questions

1 answer

Gensim: What is difference between word2vec and doc2vec?

I'm kind of a newbie and not a native English speaker, so I have some trouble understanding Gensim's word2vec and doc2vec. I think both give me the words most similar to the query word I request, via most_similar() (after training). How can I tell which case I have to…

nlp gensim

asked Mar 16 '17 at 06:54

user3595632

5 answers

Interpreting the sum of TF-IDF scores of words across documents

First let's extract the TF-IDF scores per term per document: from gensim import corpora, models, similarities documents = ["Human machine interface for lab abc computer applications", "A survey of user opinion of computer system…

python statistics nlp tf-idf gensim

asked Feb 16 '17 at 09:06

alvas

1 answer

Export pyLDAvis graphs as standalone webpage

I am analysing text with topic modelling, using Gensim and pyLDAvis. I would like to share the results with remote colleagues without them needing to install Python and all the required libraries. Is there a way to export interactive…

python gensim lda topic-modeling

asked Jan 30 '17 at 13:10

Darius

2 answers

Get most similar words, given the vector of the word (not the word itself)

Using the gensim.models.Word2Vec library, you have the possibility to provide a model and a "word" for which you want to find the list of most similar words: model = gensim.models.Word2Vec.load_word2vec_format(model_file,…

python gensim word2vec

asked Jun 14 '16 at 17:22

amin

2 answers

Word2Vec: Effect of window size used

I am trying to train a word2vec model on very short phrases (5-grams). Since each sentence or example is very short, I believe the window size I can use can be at most 2. I am trying to understand what the implications of such a small window size are…

gensim word2vec

asked Mar 08 '14 at 17:07

vkmv

3 answers

Gensim 3.8.0 to Gensim 4.0.0

I have trained a Word2Vec model using Gensim 3.8.0. Later I tried to use the pretrained model with Gensim 4.0.0 on GCP. I used the following code: model = KeyedVectors.load_word2vec_format(wv_path, binary=False) words =…

python nlp gensim word2vec word-embedding

asked Mar 30 '21 at 09:28

Md. Ahsanul Kabir Arif

1 answer

Does Gensim library support GPU acceleration?

The Word2vec and Doc2vec methods provided by Gensim have a distributed version which uses BLAS, ATLAS, etc. for speedup (details here). However, does it support GPU mode? Is it possible to get a GPU working with Gensim?

optimization gpu gensim deeplearning4j

asked Sep 18 '16 at 14:20

Irene Li

2 answers

Visualise word2vec generated from gensim using t-sne

I have trained a doc2vec and corresponding word2vec on my own corpus using gensim. I want to visualise the word2vec using t-sne with the words. That is, each dot in the figure should also be labelled with its "word". I looked at a similar question here: t-sne on…

scikit-learn data-visualization gensim word2vec

asked May 04 '17 at 07:31

Dreams

4 answers

How to use gensim BM25 ranking in python

I found that gensim has a BM25 ranking function. However, I cannot find a tutorial on how to use it. In my case, I had one query and a few documents which were retrieved from the search engine. How to use gensim BM25 ranking to compare the query and…

python ranking gensim

asked Dec 05 '16 at 01:54

dd90p

2 answers

what does the vector of a word in word2vec represents?

word2vec is an open source tool by Google. For each word it provides a vector of float values; what exactly do they represent? There is also a paper on paragraph vectors. Can anyone explain how they are using word2vec in order to obtain fixed length…

machine-learning nlp neural-network gensim

asked Nov 20 '14 at 05:40

user168983

4 answers

word2vec - what is best? add, concatenate or average word vectors?

I am working on a recurrent language model. To learn word embeddings that can be used to initialize my language model, I am using gensim's word2vec model. After training, the word2vec model holds two vectors for each word in the vocabulary: the…

python word2vec gensim word-embedding language-model

asked Oct 23 '17 at 12:44

Lemon

3 answers

Interpreting negative Word2Vec similarity from gensim

E.g. we train a word2vec model using gensim: from gensim import corpora, models, similarities from gensim.models.word2vec import Word2Vec documents = ["Human machine interface for lab abc computer applications", "A survey of user…

python nlp similarity gensim word2vec

asked Feb 22 '17 at 03:00

alvas

4 answers

Matching words and vectors in gensim Word2Vec model

I have had the gensim Word2Vec implementation compute some word embeddings for me. Everything went quite fantastically as far as I can tell; now I am clustering the word vectors created, hoping to get some semantic groupings. As a next step, I would…

python vector machine-learning gensim word2vec

asked Jul 29 '16 at 18:40

patrick

2 answers

Gensim word2vec in python3 missing vocab

I'm using the gensim implementation of Word2Vec. I have the following code snippet: print('training model') model = Word2Vec(Sentences(start, end)) print('trained model:', model) print('vocab:', model.vocab.keys()) When I run this in python2, it runs…

python gensim word2vec

asked Feb 28 '17 at 19:43

Sam Lee

6 answers

Using scikit-learn vectorizers and vocabularies with gensim

I am trying to recycle scikit-learn vectorizer objects with gensim topic models. The reasons are simple: first of all, I already have a great deal of vectorized data; second, I prefer the interface and flexibility of scikit-learn vectorizers; third,…

python scikit-learn topic-modeling gensim

asked Feb 04 '14 at 12:25

emiguevara
