Questions tagged [gensim]

Gensim is a free Python framework designed to automatically extract semantic topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible.

Gensim aims at processing raw, unstructured digital texts (“plain text”). The algorithms in gensim, such as Latent Semantic Analysis, Latent Dirichlet Allocation or Random Projections, discover the semantic structure of documents by examining statistical word co-occurrence patterns within a corpus of training documents. These algorithms are unsupervised, which means no human input is necessary; you only need a corpus of plain text documents.

Once these statistical patterns are found, any plain text document can be succinctly expressed in the new semantic representation and queried for topical similarity against other documents.
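To make the idea of "expressing documents in a new representation and querying them for similarity" concrete, here is a minimal sketch using only the Python standard library. It is not Gensim's implementation or API: the three-document toy corpus, the TF-IDF weighting, and the helper names (`tokenize`, `build_idf`, `vectorize`, `cosine`, `rank`) are all assumptions chosen for illustration, standing in for the richer models (LSA, LDA, Random Projections) named above.

```python
import math
from collections import Counter

# Toy corpus (hypothetical, for illustration only).
corpus = [
    "human machine interface for computer applications",
    "graph of trees and minors",
    "user interface response time of the computer system",
]

def tokenize(text):
    """Lowercase and split on whitespace; real pipelines do more cleaning."""
    return text.lower().split()

def build_idf(tokenized_docs):
    """Compute a smoothed inverse document frequency for every corpus term."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}

def vectorize(tokens, idf):
    """Map tokens to a sparse TF-IDF vector, ignoring terms unseen in training."""
    counts = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in counts.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query, documents):
    """Return (document index, similarity) pairs, most similar first."""
    tokenized = [tokenize(doc) for doc in documents]
    idf = build_idf(tokenized)
    doc_vectors = [vectorize(tokens, idf) for tokens in tokenized]
    query_vector = vectorize(tokenize(query), idf)
    scores = [(i, cosine(query_vector, vec)) for i, vec in enumerate(doc_vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

ranked = rank("computer interface", corpus)
for index, score in ranked:
    print(index, round(score, 3), corpus[index])
```

The same train, transform, query flow applies when a topic model replaces the TF-IDF step; only the vector space changes.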

2433 questions

votes

1 answer

Doc2Vec and PySpark: Gensim Doc2vec over DeepDist

I am looking at the DeepDist (link) module and thinking of combining it with Gensim's Doc2Vec API to train paragraph vectors on PySpark. The link actually provides the following clean example of how to do it for Gensim's Word2Vec model: from…

asked Feb 25 '16 at 00:40

Patrick the Cat

votes

3 answers

Is it possible to re-train a word2vec model (e.g. GoogleNews-vectors-negative300.bin) from a corpus of sentences in python?

I am using the pre-trained Google News dataset to get word vectors with the Gensim library in Python: model = Word2Vec.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True) After loading the model I am converting training reviews…

python nlp gensim word2vec

asked Jan 31 '16 at 18:17

Nomiluks

votes

2 answers

What is the best way to obtain the optimal number of topics for an LDA model using Gensim?

I am trying to obtain the optimal number of topics for an LDA-model within Gensim. One method I found is to calculate the log likelihood for each model and compare each against each other, e.g. at The input parameters for using latent Dirichlet…

python text-mining lda gensim topic-modeling

asked Aug 31 '15 at 13:58

Akantor

votes

2 answers

How to obtain antonyms through word2vec?

I am currently working on a word2vec model using gensim in Python, and want to write a function that can help me find the antonyms and synonyms of a given word. For example: antonym("sad")="happy" synonym("upset")="enraged" Is there a way to do that…

python gensim word2vec

asked Aug 04 '15 at 16:42

Salamander

votes

3 answers

How to predict the topic of a new query using a trained LDA model using gensim?

I have trained a corpus for LDA topic modelling using gensim. Going through the tutorial on the gensim website (this is not the whole code): question = 'Changelog generation from Github issues?'; temp = question.lower() for i in…

python nlp lda topic-modeling gensim

asked Apr 28 '13 at 10:39

Animesh Pandey

votes

1 answer

LDA Topic Model Performance - Topic Coherence Implementation for scikit-learn

I have a question around measuring/calculating topic coherence for LDA models built in scikit-learn. Topic Coherence is a useful metric for measuring the human interpretability of a given LDA topic model. Gensim's CoherenceModel allows Topic…

scikit-learn nlp gensim lda topic-modeling

asked Aug 30 '18 at 18:01

learning-new-things-guy

votes

3 answers

Continue training a FastText model

I have downloaded a .bin FastText model, and I use it with gensim as follows: model = FastText.load_fasttext_format("cc.fr.300.bin") I would like to continue the training of the model to adapt it to my domain. After checking FastText's Github and…

python gensim fasttext

asked Aug 29 '18 at 14:47

ted

votes

2 answers

gensim: pickle or not?

I have a question related to gensim. I would like to know whether it is recommended or necessary to use pickle when saving or loading a model (or multiple models), as I find scripts on GitHub that do either. mymodel = Doc2Vec(documents, size=100,…

memory model pickle gensim

asked Jun 02 '18 at 09:26

Christopher

votes

2 answers

How to do Text classification using word2vec

I want to perform text classification using word2vec. I got vectors of words. ls = [] sentences = lines.split(".") for i in sentences: ls.append(i.split()) model = Word2Vec(ls, min_count=1, size = 4) words =…

python-3.x word2vec gensim text-classification

asked Apr 04 '18 at 06:10

Shubham Agrawal

votes

5 answers

How to access topic words only in gensim

I built an LDA model using Gensim and I want to get the topic words only. How can I get only the words of the topics, with no probabilities and no IDs? I tried the print_topics() and show_topics() functions in gensim, but I can't get clean words! This…

python nlp gensim lda topic-modeling

asked Oct 03 '17 at 01:58

Muhammed Eltabakh

votes

1 answer

How to use the infer_vector in gensim.doc2vec?

def cosine(vector1,vector2): cosV12 = np.dot(vector1, vector2) / (linalg.norm(vector1) * linalg.norm(vector2)) return cosV12 model=gensim.models.doc2vec.Doc2Vec.load('Model_D2V_Game') string='民生为了父亲我要坚强地 ...' list=string.split('…

python gensim doc2vec

asked Jul 09 '17 at 05:19

Jeffery

votes

4 answers

Python NLP British English vs American English

I'm currently working on NLP in Python. However, my corpus contains both British and American English (realize/realise). I'm thinking of converting British to American, but I did not find a good tool/package to do that. Any suggestions?

python nlp nltk gensim linguistics

asked Feb 19 '17 at 16:28

Mr.cysl

votes

1 answer

Getting TF-IDF Scores Of Words Using Gensim

I am trying to find the most important words in a corpus based on their TF-IDF scores. I have been following the example at https://radimrehurek.com/gensim/tut2.html. Based on >>> for doc in corpus_tfidf: ... print(doc) the TF-IDF score is…

python tf-idf gensim

asked Apr 15 '16 at 17:56

user799188

votes

1 answer

Why does Gensim doc2vec give AttributeError: 'list' object has no attribute 'words'?

I am trying to experiment with gensim doc2vec using the following code. As far as I understand from tutorials, it should work. However, it gives AttributeError: 'list' object has no attribute 'words'. from gensim.models.doc2vec import LabeledSentence,…

python-3.x gensim word2vec

asked Apr 08 '16 at 21:55

W.S.

votes

3 answers

Error while loading Word2Vec model in gensim

I'm getting an AttributeError while loading the gensim model available at word2vec repository: from gensim import models w = models.Word2Vec() w.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True) print…

python gensim word2vec

asked Aug 19 '15 at 17:17

Tarantula
