Questions tagged [gensim]

Gensim is a free Python framework designed to automatically extract semantic topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible.

Gensim is aimed at processing raw, unstructured digital texts (“plain text”). The algorithms in gensim, such as Latent Semantic Analysis, Latent Dirichlet Allocation, or Random Projections, discover the semantic structure of documents by examining statistical word co-occurrence patterns within a corpus of training documents. These algorithms are unsupervised, which means no human input is necessary; you only need a corpus of plain text documents.

Once these statistical patterns are found, any plain text documents can be succinctly expressed in the new, semantic representation, and queried for topical similarity against other documents.
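As a rough illustration of the "express documents as vectors, then query for similarity" idea, here is a minimal standard-library sketch. It uses plain bag-of-words counts and cosine similarity as a much simpler stand-in for the LSA/LDA topic space; it is not gensim's API. The toy corpus and the helper names `bow`, `cosine`, and `rank` are assumptions made up for this example.

```python
import math
from collections import Counter

# Toy corpus (illustrative only; an assumption for this sketch).
documents = [
    "human machine interface for computer applications",
    "user opinion of computer system response time",
    "graph of trees and graph minors",
]


def bow(text):
    """Bag-of-words: map each lowercase token to its count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs):
    """Return (document index, similarity) pairs, most similar first."""
    q = bow(query)
    scores = [(i, cosine(q, bow(d))) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = rank("computer interface", documents)
print(ranking)
```

In gensim itself, the raw counts would typically be projected into a lower-dimensional topic space (for example with `gensim.models.LsiModel`) before similarities are computed, so documents can match on related words and not only on identical ones.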

2433 questions

4 answers

"init() got an unexpected keyword argument 'document'" error in Python when using Word2Vec with gensim

I'm working on a project using Word2vec and gensim: model = gensim.models.Word2Vec( documents = 'userDataFile.txt', size=150, window=10, min_count=2, workers=10) model =…

python gensim word2vec

asked Nov 07 '18 at 18:49

dubooduboo

3 answers

In spacy, how to use your own word2vec model created in gensim?

I have trained my own word2vec model in gensim and I am trying to load that model in spacy. First, I need to save it to disk and then load it as an init-model in spacy, but I am unable to figure out exactly…

model word2vec gensim spacy

asked May 22 '18 at 11:32

Subigya Upadhyay

1 answer

Why are multiple model files created in gensim word2vec?

When I create a word2vec model (skip-gram with negative sampling), I get three output files: word2vec (File), word2vec.syn1neg.npy (NPY file), and word2vec.wv.syn0.npy (NPY file). I am just worried why this happens as for my previous…

python word2vec gensim word-embedding

asked Nov 08 '17 at 07:07

user8871463

4 answers

LDA model generates different topics every time I train on the same corpus

I am using Python gensim to train a Latent Dirichlet Allocation (LDA) model from a small corpus of 231 sentences. However, each time I repeat the process, it generates different topics. Why do the same LDA parameters and corpus generate…

python nlp lda topic-modeling gensim

asked Feb 25 '13 at 13:08

alvas

5 answers

How to remove a word completely from a Word2Vec model in gensim?

Given a model, e.g. from gensim.models.word2vec import Word2Vec documents = ["Human machine interface for lab abc computer applications", "A survey of user opinion of computer system response time", "The EPS user interface management…

python dictionary word2vec gensim del

asked Feb 23 '18 at 05:26

alvas

4 answers

gensim word2vec accessing in/out vectors

In the word2vec model, there are two linear transforms that take a word in vocab space to a hidden layer (the "in" vector), and then back to the vocab space (the "out" vector). Usually this out vector is discarded after training. I'm wondering if…

python gensim

asked Nov 07 '16 at 06:03

Alex R.

2 answers

Is there pre-trained doc2vec model?

Is there a pre-trained doc2vec model with a large data set, like Wikipedia or similar?

gensim doc2vec

asked Jul 02 '18 at 09:25

Idriss Brahimi

3 answers

Get bigrams and trigrams in word2vec Gensim

I am currently using uni-grams in my word2vec model as follows. def review_to_sentences( review, tokenizer, remove_stopwords=False ): #Returns a list of sentences, where each sentence is a list of words # #NLTK tokenizer to split the…

python tokenize word2vec gensim n-gram

asked Sep 09 '17 at 09:49

user8566323

3 answers

How to use TaggedDocument in gensim?

I have two directories whose text files I want to read and label, but I don't know how to do this via TaggedDocument. I thought it would work as TaggedDocument([Strings],[Labels]), but apparently this doesn't work. This is my code:…

python nltk gensim word2vec doc2vec

asked Jul 16 '17 at 06:35

Farhood

2 answers

get_document_topics and get_term_topics in gensim

The ldamodel in gensim has the two methods: get_document_topics and get_term_topics. Despite their use in this gensim tutorial notebook, I do not fully understand how to interpret the output of get_term_topics and created the self-contained code…

python gensim topic-modeling

asked Apr 11 '17 at 22:29

tkja

2 answers

Gensim train word2vec on wikipedia - preprocessing and parameters

I am trying to train the word2vec model from gensim using the Italian Wikipedia dump "http://dumps.wikimedia.org/itwiki/latest/itwiki-latest-pages-articles.xml.bz2". However, I am not sure what the best preprocessing for this corpus is. gensim model…

nlp gensim word2vec

asked May 19 '14 at 10:37

Luca Fiaschi

2 answers

Document topical distribution in Gensim LDA

I've derived an LDA topic model using a toy corpus as follows: documents = ['Human machine interface for lab abc computer applications', 'A survey of user opinion of computer system response time', 'The EPS user interface…

python lda gensim

asked Jun 26 '13 at 03:13

Moses Xu

2 answers

Using word2vec to classify words in categories

BACKGROUND I have vectors with some sample data and each vector has a category name (Places,Colors,Names). ['john','jay','dan','nathan','bob'] -> 'Names' ['yellow', 'red','green'] -> 'Colors' ['tokyo','bejing','washington','mumbai'] -> 'Places' My…

python machine-learning nlp word2vec gensim

asked Dec 06 '17 at 04:16

Dinero

2 answers

Using a Word2Vec model pre-trained on wikipedia

I need to use gensim to get vector representations of words, and I figure the best thing to use would be a word2vec model that's pre-trained on the English Wikipedia corpus. Does anyone know where to download it, how to install it, and how to use…

wikipedia gensim word2vec

asked Jul 25 '17 at 17:51

Boris

2 answers

Chunkize warning while installing gensim

I have installed gensim (through pip) in Python. After the installation is over I get the following warning: C:\Python27\lib\site-packages\gensim\utils.py:855: UserWarning: detected Windows; aliasing chunkize to chunkize_serial …

python gensim

asked Jan 15 '17 at 06:43

user7420652
