Questions tagged [gensim]

Gensim is a free Python framework designed to automatically extract semantic topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible.

Gensim aims at processing raw, unstructured digital texts (“plain text”). The algorithms in gensim, such as Latent Semantic Analysis, Latent Dirichlet Allocation or Random Projections, discover the semantic structure of documents by examining statistical co-occurrence patterns of words within a corpus of training documents. These algorithms are unsupervised, which means no human input is necessary – you only need a corpus of plain text documents.

Once these statistical patterns are found, any plain text document can be succinctly expressed in the new, semantic representation and queried for topical similarity against other documents.

2433 questions

1 answer

How to get negative word samples in Gensim Word2Vec Model?

I am using the gensim Word2Vec model to train word embeddings. My code is: w2v_model = Word2Vec(min_count=20, window=2, vector_size=50, sample=6e-5, alpha=0.03, …

asked Apr 15 '22 at 14:02

zeynemrecan

2 answers

word2vec/gensim — RuntimeError: you must first build vocabulary before training the model

I am having trouble training my own word2vec model on the .txt files. The code: import gensim import json import pandas as pd import glob import gensim.downloader as api import matplotlib.pyplot as plt from gensim.models import KeyedVectors #…

python gensim word2vec word-embedding

asked Apr 13 '22 at 21:11

ljushu

1 answer

Understanding output of gensim LDA topic modeling API

I was trying to understand the gensim mallet wrapper for topic modeling as explained in this notebook. In point 11, it prepares a corpus which is in term-document frequency format: >>> print(corpus[:1]) # for 1st document >>> [[(0, 1), (1, 1), (2, 1), (3,…

machine-learning artificial-intelligence gensim lda topic-modeling

asked Apr 12 '22 at 17:09

Mahesha999
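
The `(id, count)` pairs quoted in the excerpt above are a bag-of-words representation: each document becomes a list of (token id, occurrence count) tuples. The following is a minimal sketch of that idea in plain Python. It does not call gensim, and the function name `build_bow_corpus` and the sample documents are hypothetical; gensim's own id assignment may order tokens differently.

```python
from collections import Counter


def build_bow_corpus(tokenized_docs):
    """Map each token to an integer id and count occurrences per document."""
    token2id = {}
    corpus = []
    for tokens in tokenized_docs:
        for token in tokens:
            # Ids are assigned in order of first appearance (an assumption
            # of this sketch, not necessarily gensim's ordering).
            token2id.setdefault(token, len(token2id))
        counts = Counter(token2id[token] for token in tokens)
        # One document = sorted list of (token_id, count) pairs.
        corpus.append(sorted(counts.items()))
    return token2id, corpus


# Hypothetical toy documents, already tokenized.
docs = [["human", "computer", "interface"], ["computer", "survey", "computer"]]
token2id, corpus = build_bow_corpus(docs)
print(corpus[:1])  # the first document as (token_id, count) pairs
```

Read this way, `(0, 1)` means that the token with id 0 occurs once in that document.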

1 answer

How to convert a matrix to BoW format?

I am trying to convert a matrix to a type that can be accepted by gensim's AuthorTopic model, which means I should convert the matrix to a sparse vector. I have already tried several functions in gensim like gensim.matutils.full2sparse and…

gensim corpus sparse-vector

asked Apr 11 '22 at 05:20

Yu Fu
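
As a rough illustration of the conversion asked about above, the sketch below turns a dense matrix into a list of sparse `(term_id, value)` vectors using plain Python only. It assumes that rows are documents and columns are terms; the name `dense_to_bow`, the `eps` threshold, and the sample matrix are hypothetical and not part of gensim.

```python
def dense_to_bow(matrix, eps=1e-9):
    """Convert a documents-by-terms matrix into sparse (term_id, value) vectors.

    Assumes each row is one document; entries with magnitude <= eps are dropped.
    """
    return [
        [(term_id, float(value)) for term_id, value in enumerate(row) if abs(value) > eps]
        for row in matrix
    ]


# Hypothetical 2-document, 3-term count matrix.
doc_term = [[2, 0, 1], [0, 0, 3]]
bow_corpus = dense_to_bow(doc_term)
print(bow_corpus)
```

If the matrix is stored with terms as rows instead, it would need to be transposed first.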

1 answer

How do I use gensim to vectorize these words in my dataframe so I can perform clustering on them?

I am trying to do a clustering analysis (preferably k-means) of poetry words on a pandas dataframe. I am first trying to vectorize the words by using the word-to-vector feature in the gensim package. However, the vectors just come out with 0s, so…

python nlp cluster-analysis gensim

asked Apr 10 '22 at 10:49

Shehzadi Aziz

1 answer

Topic weight in gensim lda model

Is there a function in the gensim package to calculate the weights of topics? I need these weights to compare them with 0.05 (p_value) to understand which ones are meaningful.

text-mining gensim lda topic-modeling p-value

asked Apr 09 '22 at 20:53

yunes_khosravi

1 answer

Extract meaningful words from spaceless texts

I have not done much NLP, but have a need. For example, for the string 'australiafreedomrally', I need to automatically extract meaningful words, i.e., 'australia', 'freedom' and 'rally'. Is there any Python package that can do it? Thanks

nlp nltk stanford-nlp spacy gensim

asked Apr 08 '22 at 18:24

Cy-Phy
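
One way to approach the splitting described above is dictionary-based segmentation with dynamic programming. The sketch below is an assumption-laden illustration, not a recommendation of any particular package: the function `segment` and the three-word vocabulary are hypothetical, and a real vocabulary would have to come from a word list or corpus.

```python
def segment(text, vocabulary):
    """Split `text` into words from `vocabulary`; return None if no split exists.

    Assumes `vocabulary` is a non-empty set of lowercase strings.
    """
    max_len = max(map(len, vocabulary))
    # best[i] holds one segmentation of text[:i], or None if none was found.
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        # Try the longest candidate word ending at `end` first.
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if best[start] is not None and word in vocabulary:
                best[end] = best[start] + [word]
                break
    return best[len(text)]


# Hypothetical vocabulary for the example string from the question.
vocab = {"australia", "freedom", "rally"}
words = segment("australiafreedomrally", vocab)
print(words)
```

This version returns the first segmentation it finds; choosing among several valid splits would need word frequencies or some other score.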

2 answers

Tensorflow 1.14.0 and numpy 1.16.0 compatibility with gensim

I've already downloaded tensorflow 1.14.0 and numpy 1.16.0. When using pip install gensim, the gensim version is 4.1.2 and it needs numpy >=1.17.0. My question is which gensim version I should install to match tensorflow 1.14.0 and numpy…

gensim

asked Apr 05 '22 at 06:54

phoebe

1 answer

Python : 'Doc2Vec' object has no attribute 'dv'

Thanks for the advice; I have edited the question. I get this error: AttributeError: 'Doc2Vec' object has no attribute 'dv'. The error comes from: Traceback (most recent call last): File "F:\Projects\项目代码\爬虫程序\headline\main.py", line 13, in …

python nlp gensim doc2vec

asked Apr 04 '22 at 00:40

foling

1 answer

Fitting a Gensim Fasttext pretrained model to my text

I have a pretrained fastText model. I have loaded it into my notebook and want to fit it to my free-form text to train an ML classifier. import pandas as pd from sklearn.model_selection import train_test_split from gensim.models import…

nlp gensim fasttext

asked Apr 01 '22 at 19:09

Arica Christensen

0 answers

PYLDAVIS- LdaModel() for topic modelling visualisation using Gensim package

I tried creating a topic model using the pyLDAvis gensim library, and the clusters have now been generated. I have one question about them: the clusters are cut off at the edges. Is this a problem, or is it fine? A similar sample graph would be:…

python cluster-analysis gensim topic-modeling

asked Apr 01 '22 at 17:42

Shashwat Shankar

1 answer

Compare two LDA models

I'm working on topic modeling of Twitter data. I extracted the data and stored it in a MySQL table. The columns are Date, Place, UserID, Text, tweetID, likes, and weekID (based on the date, I have assigned the week it belongs to). I have also taken data of each…

python gensim lda

asked Mar 28 '22 at 18:35

Vijay Venkatesh

1 answer

What is gensim's simple_preprocess alternative in Scikit learn

Using Gensim to perform LDA, I was able to do initial text preprocessing and cleanup using: gensim.utils.simple_preprocess(str(sentence),deacc=True) It was very efficient and does almost all required forms of text cleanup in one command. Now, I am…

scikit-learn nlp gensim

asked Mar 27 '22 at 08:09

Mena
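
For readers wondering what such a one-call cleanup involves, here is a rough standard-library approximation: lowercase the text, strip accents, keep runs of letters, and drop very short or very long tokens. It is a sketch only and is not guaranteed to match `simple_preprocess` token for token; the function name `rough_preprocess` and its length bounds are assumptions made for illustration.

```python
import re
import unicodedata


def rough_preprocess(text, min_len=2, max_len=15):
    """Approximate a lowercase, accent-stripping, letters-only tokenizer."""
    # Strip accents: decompose characters, then drop combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    no_accents = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Keep runs of letters only (no digits, underscores, or punctuation).
    tokens = re.findall(r"[^\W\d_]+", no_accents.lower())
    # Discard tokens outside the assumed length bounds.
    return [token for token in tokens if min_len <= len(token) <= max_len]


tokens = rough_preprocess("Héllo, World! A gensim test 123.")
print(tokens)
```

A callable like this could be supplied wherever a vectorizer accepts a custom tokenizer, for example the `tokenizer` parameter of scikit-learn's `CountVectorizer`.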

0 answers

Running gensim failing after install in anaconda

I have downloaded anaconda and have also successfully downloaded gensim. I am trying to run gensim but not making much progress. I am running jupyter. When I try the following command in jupyter: from gensim.models.doc2vec import Doc2Vec I get the…

gensim

asked Mar 26 '22 at 07:31

ant

1 answer

'utf-8' codec can't decode byte 0x93 in position 0: invalid start byte

I want to use Word2Vec, and I have downloaded a Word2Vec corpus in the Indonesian language, but when I load it, it gives me an error. This is what I tried: Model = gensim.models.KeyedVectors.load_word2vec_format('/content/drive/MyDrive/Feature…

utf-8 gensim word2vec

asked Mar 23 '22 at 01:50

Ronald Ferdinand
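
The error in the title above is what Python raises when bytes that are not valid UTF-8 are decoded as UTF-8. The sketch below reproduces the message with hypothetical bytes; it only shows how the message arises and does not diagnose the file in the question. If a vectors file is actually in binary format, passing `binary=True` to `load_word2vec_format` may be what is needed instead.

```python
# Hypothetical bytes: 0x93 and 0x94 are curly quotes in Windows-1252,
# but 0x93 cannot start a character in UTF-8.
raw = b"\x93kata\x94"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    message = str(err)
    print(message)

# Decoding with the encoding the bytes were written in succeeds.
text = raw.decode("cp1252")
print(text)
```

Which encoding or file format applies to a given download has to be checked against its source.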
