Questions tagged [gensim]

Gensim is a free Python framework designed to automatically extract semantic topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible.

Gensim aims at processing raw, unstructured digital texts (“plain text”). The algorithms in gensim, such as Latent Semantic Analysis, Latent Dirichlet Allocation or Random Projections, discover the semantic structure of documents by examining statistical word co-occurrence patterns within a corpus of training documents. These algorithms are unsupervised, which means no human input is necessary: you only need a corpus of plain text documents.

Once these statistical patterns are found, any plain text documents can be succinctly expressed in the new, semantic representation, and queried for topical similarity against other documents.
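
As a rough illustration of the "represent, then query for similarity" idea, here is a minimal standard-library sketch. It is not gensim's implementation and it skips the topic-model step entirely: documents are reduced to sparse lists of (token_id, count) pairs and compared by cosine similarity. The helper names (build_vocab, to_bow, cosine) and the sample sentences are hypothetical, chosen only for this example.

```python
import math
from collections import Counter


def build_vocab(docs):
    """Map each distinct token to an integer id, in order of first appearance."""
    vocab = {}
    for doc in docs:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab


def to_bow(doc, vocab):
    """Return a sorted list of (token_id, count) pairs; unknown tokens are skipped."""
    counts = Counter(vocab[token] for token in doc if token in vocab)
    return sorted(counts.items())


def cosine(bow_a, bow_b):
    """Cosine similarity between two sparse (token_id, count) vectors."""
    a, b = dict(bow_a), dict(bow_b)
    dot = sum(weight * b.get(token_id, 0) for token_id, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus, already tokenized by whitespace.
docs = [
    "human machine interface for lab computer applications".split(),
    "graph of trees and minors".split(),
    "user interface for computer system".split(),
]
vocab = build_vocab(docs)
corpus = [to_bow(doc, vocab) for doc in docs]

# Express a query in the same representation and rank documents against it.
query = to_bow("computer interface".split(), vocab)
ranked = sorted(((cosine(query, bow), i) for i, bow in enumerate(corpus)), reverse=True)
for score, index in ranked:
    print(index, round(score, 3))
```

In gensim itself, a trained model such as LSA or LDA would first map these sparse count vectors into a lower-dimensional topic space, and the similarity query would run there instead of on raw counts.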

2433 questions
0
votes
1 answer

How to get negative word samples in Gensim Word2Vec Model?

I am using the gensim Word2Vec model to train word embeddings. My code is: w2v_model = Word2Vec(min_count=20, window=2, vector_size=50, sample=6e-5, alpha=0.03, …
0
votes
2 answers

word2vec/gensim — RuntimeError: you must first build vocabulary before training the model

I am having trouble training my own word2vec model on the .txt files. The code: import gensim import json import pandas as pd import glob import gensim.downloader as api import matplotlib.pyplot as plt from gensim.models import KeyedVectors #…
ljushu
0
votes
1 answer

Understanding output of gensim LDA topic modeling API

I was trying to understand the gensim mallet wrapper for topic modeling as explained in this notebook. In point 11, it prepares a corpus in term-document frequency format: >>> print(corpus[:1]) # for 1st document >>> [[(0, 1), (1, 1), (2, 1), (3,…
Mahesha999
0
votes
1 answer

How to convert a matrix to BoW format?

I am trying to convert a matrix to a type that can be accepted by gensim's AuthorTopicModel, which means I should convert the matrix to a sparse vector. I have already tried several functions in gensim like gensim.matutils.full2sparse and…
Yu Fu
0
votes
1 answer

How do I use gensim to vectorize these words in my dataframe so I can perform clustering on them?

I am trying to do a clustering analysis (preferably k-means) of poetry words in a pandas dataframe. I am first trying to vectorize the words using the word-to-vector feature in the gensim package. However, the vectors just come out as 0s, so…
0
votes
1 answer

Topic weight in gensim lda model

Is there any function in the gensim package to calculate the weights of topics? I need these weights to compare them with 0.05 (p-value) to understand which ones are meaningful.
0
votes
1 answer

Extract meaningful words from spaceless texts

I have not done much NLP, but have a need. For example, for the string 'australiafreedomrally', I need to automatically extract the meaningful words, i.e., 'australia', 'freedom' and 'rally'. Is there any Python package that can do it? Thanks
Cy-Phy
0
votes
2 answers

Tensorflow 1.14.0 and numpy 1.16.0 compatibility with gensim

I've already downloaded tensorflow 1.14.0 and numpy 1.16.0. When using pip install gensim, the gensim version is 4.1.2 and it needs numpy >=1.17.0. My question is which gensim version I should install to match tensorflow 1.14.0 and numpy…
phoebe
0
votes
1 answer

Python : 'Doc2Vec' object has no attribute 'dv'

Thanks for the advice; I have edited the question. I get this: AttributeError: 'Doc2Vec' object has no attribute 'dv'. The error comes from: Traceback (most recent call last): File "F:\Projects\项目代码\爬虫程序\headline\main.py", line 13, in
foling
0
votes
1 answer

Fitting a Gensim Fasttext pretrained model to my text

I have a pretrained fastText model. I have loaded it into my notebook and want to fit it to my free-form text to train an ML classifier. import pandas as pd from sklearn.model_selection import train_test_split from gensim.models import…
0
votes
0 answers

PYLDAVIS- LdaModel() for topic modelling visualisation using Gensim package

I tried creating a topic model using the pyLDAvis gensim library, and the clusters are now made. I have one question about them: the clusters are cut off at the edges. Is this a problem, or is it fine? A similar sample graph would be:…
0
votes
1 answer

Compare two LDA models

I'm working on topic modeling of Twitter data. I extracted the data and stored it in a MySQL table. The columns are Date, Place, UserID, Text, tweetID, likes, weekID (based on the date, I have assigned the week it belongs to). I have also taken data of each…
0
votes
1 answer

What is gensim's simple_preprocess alternative in Scikit learn

Using Gensim to perform LDA, I was able to do initial text preprocessing and cleanup using: gensim.utils.simple_preprocess(str(sentence),deacc=True) It was very efficient and does almost all required forms of text cleanup in one command. Now, I am…
Mena
0
votes
0 answers

Running gensim failing after install in anaconda

I have downloaded Anaconda and have also successfully downloaded gensim. I am trying to run gensim but not making much progress. I am running Jupyter. When I try the following command in Jupyter: from gensim.models.doc2vec import Doc2Vec I get the…
ant
0
votes
1 answer

'utf-8' codec can't decode byte 0x93 in position 0: invalid start byte

I want to use Word2Vec, and I have downloaded a Word2Vec corpus in the Indonesian language, but when I call it, it gives me an error. This is what I tried: Model = gensim.models.KeyedVectors.load_word2vec_format('/content/drive/MyDrive/Feature…