Questions tagged [gensim]

Gensim is a free Python framework designed to automatically extract semantic topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible.

Gensim aims at processing raw, unstructured digital texts (“plain text”). The algorithms in gensim, such as Latent Semantic Analysis, Latent Dirichlet Allocation or Random Projections, discover the semantic structure of documents by examining statistical word co-occurrence patterns within a corpus of training documents. These algorithms are unsupervised, which means no human input is necessary – you only need a corpus of plain text documents.

Once these statistical patterns are found, any plain text documents can be succinctly expressed in the new, semantic representation, and queried for topical similarity against other documents.
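To make the "query for similarity" idea concrete, here is a plain-Python sketch (no gensim required) of the sparse bag-of-words format gensim works with: a list of `(token_id, count)` pairs, the same shape `corpora.Dictionary.doc2bow` returns. It then ranks corpus documents against a query by cosine similarity. This is only an illustration under simplifying assumptions: gensim would normally transform the vectors with a model such as LSI or LDA first and use an index class such as `similarities.MatrixSimilarity`; the helper names below are hypothetical.

```python
import math
from collections import Counter


def build_vocab(docs):
    """Map each distinct token to an integer id, in order of first appearance."""
    vocab = {}
    for doc in docs:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab


def to_bow(doc, vocab):
    """Sparse bag-of-words: sorted (token_id, count) pairs; unknown tokens are dropped."""
    counts = Counter(vocab[t] for t in doc if t in vocab)
    return sorted(counts.items())


def cosine(a, b):
    """Cosine similarity between two sparse vectors given as (id, weight) lists."""
    da, db = dict(a), dict(b)
    dot = sum(w * db.get(i, 0.0) for i, w in da.items())
    na = math.sqrt(sum(w * w for w in da.values()))
    nb = math.sqrt(sum(w * w for w in db.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query_bow, corpus_bow):
    """Return (doc_index, similarity) pairs, most similar first."""
    sims = [(i, cosine(query_bow, d)) for i, d in enumerate(corpus_bow)]
    return sorted(sims, key=lambda pair: pair[1], reverse=True)
```

For example, with `docs = [["human", "computer", "interface"], ["graph", "trees"], ["graph", "minors", "trees"]]`, the query `["graph", "trees"]` ranks document 1 first, then document 2, and document 0 (no shared words) last.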

2433 questions

1 answer

gensim LdaMulticore not multiprocessing?

When I run gensim's LdaMulticore model on a machine with 12 cores, using: lda = LdaMulticore(corpus, num_topics=64, workers=10) I get a logging message that says "using serial LDA version on this node". A few lines later, I see another logging…

asked Nov 26 '15 at 02:31

Edward Newell

4 answers

How to filter out words with low tf-idf in a corpus with gensim?

I am using gensim for some NLP task. I've created a corpus from dictionary.doc2bow where dictionary is an object of corpora.Dictionary. Now I want to filter out the terms with low tf-idf values before running an LDA model. I looked into the…

python nlp gensim

asked Jul 10 '14 at 23:53

Ziyuan
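One way to approach the question above, sketched in plain Python: compute a tf-idf weight for every `(term_id, count)` pair, then keep the raw counts (which LDA expects) only for terms whose weight clears a threshold. The weighting here is raw count × log2(N / df), L2-normalised per document; to my understanding gensim's `TfidfModel` defaults are similar, but treat that as an assumption and check your version. The threshold value and function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(corpus):
    """corpus: list of bag-of-words documents [(term_id, count), ...].

    Returns one dict per document mapping term_id -> L2-normalised tf-idf
    weight, using count * log2(N / document_frequency).
    """
    n_docs = len(corpus)
    df = Counter(term for doc in corpus for term, _ in doc)
    weighted = []
    for doc in corpus:
        w = {t: c * math.log2(n_docs / df[t]) for t, c in doc}
        norm = math.sqrt(sum(v * v for v in w.values()))
        weighted.append({t: v / norm for t, v in w.items()} if norm else {t: 0.0 for t in w})
    return weighted


def filter_low_tfidf(corpus, threshold):
    """Drop (term_id, count) pairs whose tf-idf weight in that document is below
    `threshold`; surviving terms keep their original raw counts."""
    weights = tfidf_weights(corpus)
    return [[(t, c) for t, c in doc if w[t] >= threshold] for doc, w in zip(corpus, weights)]
```

Note that a term occurring in every document gets an idf of log2(1) = 0, so it is always removed by any positive threshold.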

4 answers

Gensim: How to save LDA model's produced topics to a readable format (csv,txt,etc)?

last parts of the code: lda = LdaModel(corpus=corpus,id2word=dictionary, num_topics=2) print lda bash output: INFO : adding document #0 to Dictionary(0 unique tokens) INFO : built Dictionary(18 unique tokens) from 5 documents (total 20 corpus…

python lda gensim

asked Jun 27 '13 at 22:39

jeremy.ting
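For the question above, a minimal sketch that writes topics to CSV with the standard `csv` module. It assumes the topics come in the shape returned by `lda.show_topics(num_topics=..., num_words=..., formatted=False)`, i.e. a list of `(topic_id, [(word, probability), ...])`; the function names are hypothetical.

```python
import csv


def topics_to_rows(topics):
    """Flatten (topic_id, [(word, probability), ...]) pairs into CSV rows."""
    rows = [["topic_id", "rank", "word", "probability"]]
    for topic_id, terms in topics:
        for rank, (word, prob) in enumerate(terms, start=1):
            # float() also converts numpy scalars (e.g. float32) to plain floats
            rows.append([topic_id, rank, word, float(prob)])
    return rows


def write_topics_csv(topics, path):
    """Write the flattened topics to `path` as UTF-8 CSV."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(topics_to_rows(topics))
```

Usage would look like `write_topics_csv(lda.show_topics(num_topics=2, num_words=10, formatted=False), "topics.csv")`. One row per (topic, word) keeps the file easy to load back into a spreadsheet or pandas.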

1 answer

Can we use a self made corpus for training for LDA using gensim?

I have to apply LDA (Latent Dirichlet Allocation) to get the possible topics from a database of 20,000 documents that I collected. How can I use these documents rather than other available corpora such as the Brown Corpus or English Wikipedia as…

python lda gensim

asked Apr 27 '13 at 16:05

Animesh Pandey
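Any re-iterable of bag-of-words lists can in principle be passed as `corpus=` to gensim's `LdaModel`, so your own documents work fine. Below is a plain-Python sketch of a streaming corpus over your own texts. `load_texts` is a hypothetical zero-argument callable (e.g. one that queries your database) returning a fresh iterator of raw strings; it is called once per pass, so the documents never all sit in memory at once, and LDA can make several passes. The regex tokenizer is an assumption; gensim's `corpora.Dictionary` would normally fill the vocabulary role.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text):
    """Very simple lower-case word tokenizer (placeholder; swap in your own)."""
    return TOKEN_RE.findall(text.lower())


class MyCorpus:
    """Re-iterable, streaming bag-of-words corpus over your own documents."""

    def __init__(self, load_texts):
        self.load_texts = load_texts
        self.token2id = {}
        # First pass: assign an integer id to every token seen.
        for text in load_texts():
            for tok in tokenize(text):
                self.token2id.setdefault(tok, len(self.token2id))

    def __iter__(self):
        # Every later pass: yield one sorted (token_id, count) list per document.
        for text in self.load_texts():
            counts = Counter(self.token2id[t] for t in tokenize(text) if t in self.token2id)
            yield sorted(counts.items())
```

The reverse mapping `{i: t for t, i in corpus.token2id.items()}` could then serve as `id2word`, which gensim documents as accepting a plain dict of `int -> str`.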

2 answers

How to import gensim summarize

I got gensim to work in Google Colab by following this process: !pip install gensim from gensim.summarization import summarize Then I was able to call summarize(some_text). Now I'm trying to run the same thing in VS Code: I've installed…

python visual-studio-code nlp gensim

asked Sep 05 '21 at 15:54

Katie Melosto
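The `gensim.summarization` module was removed in gensim 4.0, so a fresh `pip install gensim` that pulls a 4.x release cannot import it, while an environment that still has a 3.x release can. A version mismatch between the two environments is therefore the first thing to check. A small sketch for that check, reading the installed version with `importlib.metadata` (Python 3.8+); the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def gensim_version():
    """Installed gensim version string, or None if gensim is not installed."""
    try:
        return version("gensim")
    except PackageNotFoundError:
        return None


def has_summarization(ver):
    """gensim.summarization shipped in the 3.x series and was removed in 4.0."""
    return ver is not None and int(ver.split(".")[0]) < 4
```

If `has_summarization(gensim_version())` is False, the options are pinning a 3.x release (e.g. `pip install "gensim<4"`, which may lack wheels for recent Python versions) or switching to another summarizer.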

0 answers

ModuleNotFoundError: No module named 'numpy.testing.decorators'

I really need some help, as I have gone through all the posts and nothing has worked. I get this error when importing gensim, not numpy (numpy is imported first and works fine). All I want to do is import gensim and numpy to then run my analysis. Here is…

python numpy installation python-3.6 gensim

asked Apr 23 '21 at 12:15

astampib

2 answers

Gensim LDA Coherence Score NaN

I created a Gensim LDA Model as shown in this tutorial: https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/ lda_model = gensim.models.LdaMulticore(data_df['bow_corpus'], num_topics=10, id2word=dictionary, random_state=100,…

python machine-learning gensim lda topic-modeling

asked Feb 16 '20 at 08:03

Ramsha Siddiqui

1 answer

Not using a multi-core CPU efficiently when training Doc2Vec with gensim

I am using a 24-core virtual CPU and 100 GB of memory to train Doc2Vec with Gensim, but CPU usage always stays around 200% no matter how I change the number of cores. (Screenshots from top and htop.) The above two pictures show the percentage of CPU usage; this pointed…

gensim

asked Aug 16 '19 at 23:07

Ivan Lee

2 answers

Cosine similarity between 0 and 1

I am interested in calculating similarity between vectors, however this similarity has to be a number between 0 and 1. There are many questions concerning tf-idf and cosine similarity, all indicating that the value lies between 0 and 1. From…

python scikit-learn gensim similarity cosine-similarity

asked May 26 '19 at 19:53

Bram Vanroy
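Some background for the question above: cosine similarity lies in [-1, 1] in general. It only stays within [0, 1] when all vector components are non-negative (raw counts, tf-idf), whereas dense embeddings such as word2vec, doc2vec or LSI vectors can produce negative values. A sketch of two common ways to map the full range onto [0, 1]: a linear rescale, and angular similarity (1 − angle/π). Which one suits your use is a judgment call; the function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two dense vectors (assumes neither is all-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def rescaled(cos):
    """Linear map from [-1, 1] onto [0, 1]."""
    return (cos + 1) / 2


def angular_similarity(cos):
    """1 - angle/pi, also in [0, 1]; clamping guards against rounding slightly outside [-1, 1]."""
    return 1 - math.acos(max(-1.0, min(1.0, cos))) / math.pi
```

Both maps send opposite vectors to 0, orthogonal vectors to 0.5 and parallel vectors to 1. Angular similarity is additionally based on a proper metric (the angle), at the cost of an `acos` call.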

2 answers

How to avoid decoding to str: need a bytes-like object error in pandas?

Here is my code : data = pd.read_csv('asscsv2.csv', encoding = "ISO-8859-1", error_bad_lines=False); data_text = data[['content']] data_text['index'] = data_text.index documents = data_text It looks like print(documents[:2]) …

python python-3.x pandas gensim topic-modeling

asked Dec 16 '18 at 09:10

wayne64001

1 answer

Gensim (word2vec) retrieve n most frequent words

How is it possible to retrieve the n most frequent words from a Gensim word2vec model? As I understand, the frequency and count are not the same, and I therefore can't use the object.count() method. I need to produce a list of the n most frequent…

gensim

asked Dec 04 '18 at 21:31

Phils19
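For the question above: once you have a word → count mapping, picking the n most frequent words is a small selection problem. In gensim 4.x such a mapping can be built as `{w: model.wv.get_vecattr(w, "count") for w in model.wv.index_to_key}` (in 3.x, `model.wv.vocab[w].count`); with the default sorted vocabulary, `index_to_key` is usually already in descending-frequency order. Those access paths are stated to the best of my knowledge, so verify them against your version. The selection itself, with alphabetical tie-breaking, is sketched below; `most_frequent` is a hypothetical helper.

```python
import heapq


def most_frequent(counts, n):
    """Return the n (word, count) pairs with the highest counts.

    Ties are broken alphabetically so the result is deterministic.
    heapq.nsmallest avoids sorting the whole vocabulary when n is small.
    """
    return heapq.nsmallest(n, counts.items(), key=lambda kv: (-kv[1], kv[0]))
```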

1 answer

Python/Gensim - What is the meaning of syn0 and syn0norm?

I know that in gensim's KeyedVectors model, one can access the embedding matrix through the attribute model.syn0. There is also a syn0norm, which doesn't seem to work for the GloVe model I recently loaded. I think I have also seen syn1 somewhere…

python deep-learning nlp gensim word-embedding

asked Nov 14 '18 at 13:56

MBT
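Background for the question above, as I understand gensim 3.x: `syn0` is the word-vector matrix (one row per vocabulary word), and `syn0norm` is the same matrix with every row scaled to unit L2 length. It is computed lazily (e.g. by `init_sims()` or the first similarity query), which is why it can be empty on a freshly loaded model. `syn1`/`syn1neg` are the output-layer weights used by hierarchical softmax and negative sampling during training. Gensim 4 renamed these (e.g. `wv.vectors`, `wv.get_normed_vectors()`). The numpy sketch below shows what the normalisation does and why it is useful: with unit rows, cosine similarity becomes a plain dot product. Function names are illustrative.

```python
import numpy as np


def l2_normalize_rows(vectors):
    """Copy of `vectors` with each row scaled to unit L2 length (all-zero rows stay zero)."""
    vectors = np.asarray(vectors, dtype=np.float64)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return np.divide(vectors, norms, out=np.zeros_like(vectors), where=norms > 0)


def most_similar_index(normed, i):
    """Index of the row most cosine-similar to row i, excluding i itself.

    Because the rows are unit length, the dot product *is* the cosine similarity.
    """
    sims = normed @ normed[i]
    sims[i] = -np.inf
    return int(np.argmax(sims))
```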

2 answers

Loss does not decrease during training (Word2Vec, Gensim)

What can cause the loss from model.get_latest_training_loss() to increase on each epoch? Code used for training: class EpochSaver(CallbackAny2Vec): '''Callback to save model after each epoch and show training parameters ''' def __init__(self,…

python gensim word2vec loss

asked Aug 27 '18 at 11:48

Dasha
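A commonly reported explanation for the question above: in several gensim releases `get_latest_training_loss()` returns a running total accumulated over the current `train()` call, not a per-epoch figure, so a value that grows every epoch is expected. Loss tracking also requires `compute_loss=True` when creating the model. Check this against your gensim version. If it holds, recording the reading in an `on_epoch_end` callback and differencing consecutive readings gives per-epoch losses, as in this plain-Python sketch (the helper name is hypothetical).

```python
def per_epoch_losses(cumulative):
    """Convert cumulative loss readings (one taken at the end of each epoch,
    e.g. from model.get_latest_training_loss() in a callback) into per-epoch losses."""
    losses, previous = [], 0.0
    for total in cumulative:
        losses.append(total - previous)  # loss added during this epoch only
        previous = total
    return losses
```

A decreasing sequence of per-epoch values, rather than of the raw cumulative readings, is what indicates training is converging.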

2 answers

How to build a gensim dictionary that includes bigrams?

I'm trying to build a Tf-Idf model that can score bigrams as well as unigrams using gensim. To do this, I build a gensim dictionary and then use that dictionary to create bag-of-word representations of the corpus that I use to build the model. The…

python nlp gensim

asked Jul 19 '18 at 15:07

fraxture
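One straightforward approach to the question above is to append bigram tokens (e.g. `new_york`) to each document's token list before building the dictionary. Gensim then treats them as ordinary tokens in `Dictionary`, `doc2bow` and `TfidfModel`, e.g. `Dictionary(with_bigrams(doc) for doc in texts)` and later `dictionary.doc2bow(with_bigrams(doc))`. The sketch below adds every adjacent pair; gensim's `models.phrases.Phrases` is an alternative that keeps only statistically frequent collocations. `with_bigrams` and its `sep` parameter are hypothetical.

```python
def with_bigrams(tokens, sep="_"):
    """Return the unigrams followed by every adjacent bigram joined with `sep`."""
    return tokens + [f"{a}{sep}{b}" for a, b in zip(tokens, tokens[1:])]
```

Adding all bigrams roughly doubles the vocabulary, so pruning rare tokens afterwards (for example with `Dictionary.filter_extremes`) is usually worthwhile.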

3 answers

Gensim Word2Vec select minor set of word vectors from pretrained model

I have a large pretrained Word2Vec model in gensim from which I want to use the pretrained word vectors for an embedding layer in my Keras model. The problem is that the embedding size is enormous and I don't need most of the word vectors (because…

python keras word2vec gensim word-embedding

asked Jun 18 '18 at 17:32

getaway22
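For the question above, a common pattern is to build a small embedding matrix containing only the words your Keras model needs. `lookup` is any mapping that supports `word in lookup` and `lookup[word]`, which gensim's `KeyedVectors` does. Row 0 is reserved for padding, and words missing from the pretrained model stay all-zero (an assumption; random initialisation is another option). How the matrix is then passed to Keras' `Embedding` layer depends on your Keras version. The function name is hypothetical.

```python
import numpy as np


def build_embedding_matrix(vocab, lookup, dim):
    """vocab: list of the words you actually need; lookup: word -> vector mapping.

    Returns (matrix, word_index) where matrix[word_index[w]] is w's vector.
    Row 0 is kept for padding; out-of-vocabulary words remain zero rows.
    """
    matrix = np.zeros((len(vocab) + 1, dim), dtype=np.float32)
    word_index = {}
    for i, word in enumerate(vocab, start=1):
        word_index[word] = i
        if word in lookup:
            matrix[i] = lookup[word]
    return matrix, word_index
```

The resulting matrix has `len(vocab) + 1` rows instead of one row per word in the pretrained model, which addresses the size problem directly.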
