Questions tagged [doc2vec]

Doc2Vec is an unsupervised algorithm used to convert documents into vectors ("dense embeddings"). It is based on the "Paragraph Vector" paper and implemented in the Gensim Python library and elsewhere. The algorithm can work in either a "Distributed Bag Of Words" mode (PV-DBOW, which works somewhat analogously to skip-gram mode in Word2Vec) or a "Distributed Memory" mode (PV-DM, which is more analogous to CBOW mode in Word2Vec).
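As a rough illustration of how the two modes differ, the sketch below builds the training examples each mode would learn from, using only the Python standard library. It is not Gensim's implementation: the function names `dbow_examples` and `dm_examples`, the tag `"doc_0"`, and the symmetric context window are assumptions made for this example. In Gensim itself the mode is chosen with the `dm` parameter of `Doc2Vec` (`dm=1` for PV-DM, `dm=0` for PV-DBOW).

```python
from typing import List, Tuple


def dbow_examples(tag: str, words: List[str]) -> List[Tuple[str, str]]:
    """PV-DBOW sketch: the document tag alone is the input used to predict each word."""
    return [(tag, target) for target in words]


def dm_examples(
    tag: str, words: List[str], window: int = 2
) -> List[Tuple[str, Tuple[str, ...], str]]:
    """PV-DM sketch: the document tag plus nearby context words predict the target word."""
    examples = []
    for i, target in enumerate(words):
        left = words[max(0, i - window):i]     # up to `window` words before the target
        right = words[i + 1:i + 1 + window]    # up to `window` words after the target
        examples.append((tag, tuple(left + right), target))
    return examples


# Hypothetical toy document, for illustration only.
doc = ["the", "cat", "sat", "down"]
dbow = dbow_examples("doc_0", doc)
dm = dm_examples("doc_0", doc, window=1)

print(dbow)
print(dm)
```

In the PV-DBOW examples the input never includes neighbouring words, which is why it resembles skip-gram; in the PV-DM examples the tag is combined with a context window, which is why it resembles CBOW.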

556 questions

1 answer

Should I split sentences in a document for Doc2Vec?

I am building a Doc2Vec model with 1000 documents using Gensim. Each document consists of several sentences, each containing multiple words. Example: Doc1: [[word1, word2, word3], [word4, word5, word6, word7],[word8, word9, word10]] Doc2: [[word7,…

gensim word2vec doc2vec

asked Mar 17 '21 at 02:04

porororo

1 answer

Checking model overfit of doc2vec with infer_vector()

My aim is to create document embeddings from the column df["text"] as a first step and then, as a second step, plug them along with other variables into an XGBoost Regressor model in order to make predictions. This works very well for the train_df. I…

python testing nlp gensim doc2vec

asked Oct 26 '20 at 12:28

karabara

1 answer

Why does a Gensim Doc2vec object return empty doctags?

How should I interpret my situation? I trained a Doc2Vec model following this tutorial: https://blog.griddynamics.com/customer2vec-representation-learning-and-automl-for-customer-analytics-and-personalization/. For some reason,…

gensim doc2vec

asked May 25 '20 at 16:54

Jeong Kim

1 answer

Cannot load Doc2vec object using gensim

I am trying to load a pre-trained Doc2vec model using gensim and use it to map a paragraph to a vector. I am referring to https://github.com/jhlau/doc2vec and the pre-trained model I downloaded is the English Wikipedia DBOW, which is also in the…

python gensim word2vec doc2vec

asked May 20 '20 at 19:43

user13584534

1 answer

How to extract sentences that have a similar meaning/intent to an example list of sentences

I have chat interactions [utterances] between a customer and an advisor and want to know whether the advisor interactions contain particular sentences, or similar sentences, from the list below: Example sentences I am looking for in the advisor interactions…

python-3.x nlp gensim doc2vec sentence-similarity

asked Apr 26 '20 at 22:25

baskarmac

1 answer

Gensim's Doc2Vec - How to use pre-trained word2vec (word similarities)

I don't have a large corpus of data to train word similarities, e.g. 'hot' is more similar to 'warm' than to 'cold'. However, I would like to train doc2vec on a relatively small corpus (~100 docs) so that it can classify my domain-specific documents. To…

python nlp gensim doc2vec

asked Feb 18 '20 at 17:47

KGhatak

1 answer

Gensim Doc2Vec infer_vector on unseen words differs based on characters in these words

Gensim Doc2Vec infer_vector on paragraphs with unseen words generates vectors that differ based on the characters in the unseen words. for i in range(0, 2): print(model.infer_vector(["zz"])[0:2]) print(model.infer_vector(["zzz"])[0:2]) …

gensim word2vec doc2vec

asked Dec 25 '19 at 22:01

Stanley Kirdey

1 answer

Default values of doc2vec for alpha and min_alpha

Can anybody tell me which default values Doc2Vec() uses for alpha and min_alpha?

python scikit-learn gensim doc2vec hyperparameters

asked Oct 16 '19 at 11:58

Katharina Baur

1 answer

How to use doc2vec model in production?

I wonder how to deploy a doc2vec model in production to create word vectors as input features to a classifier. To be specific, let's say a doc2vec model is trained on a corpus as follows. dataset['tagged_descriptions'] = datasetf.apply(lambda x:…

python nlp gensim doc2vec

asked Sep 23 '19 at 20:24

user3000538

1 answer

How to use Sklearn linear regression with doc2vec input

I have 250k text documents (tweets and newspaper articles) represented as vectors obtained with a doc2vec model. Now, I want to use a regressor (multiple linear regression) to predict continuous value outputs - in my case the UK Consumer Confidence…

scikit-learn linear-regression gensim doc2vec

asked Aug 07 '19 at 09:59

Annka Kopka

1 answer

How to combine vectors generated by PV-DM and PV-DBOW methods of doc2vec?

I have around 20k documents of 60 to 150 words each. Out of these 20k documents, there are 400 for which the similar documents are known. These 400 documents serve as my test data. I am trying to find similar documents for these 400 datasets…

python nlp gensim doc2vec sentence-similarity

asked Aug 06 '19 at 10:05

Vikrant

2 answers

AttributeError: module 'gensim.utils' has no attribute 'smart_open'

I am building the vocabulary table using Doc2vec, but there is an error "AttributeError: module 'gensim.utils' has no attribute 'smart_open'". How do I solve this? This is for a notebook on Databricks platform, running in Python 3. In the past, I've…

python gensim databricks doc2vec

asked Jul 22 '19 at 14:38

idontreallyknowhehe

1 answer

Tensorboard embedding visualization: what is cosine distance?

I'm a PhD student in digital humanities and quite new to programming languages. I have a problem that has been freaking me out since last month. I'm trying to visualize a doc2vec model (Python, gensim library) on the embeddings projector in Tensorboard but…

python data-visualization tensorboard cosine-similarity doc2vec

asked Jun 28 '19 at 09:52

Leonardo Sanna

2 answers

Convert a column in a dask dataframe to a TaggedDocument for Doc2Vec

Intro: Currently I am trying to use dask in concert with gensim to do NLP document computation, and I'm running into an issue when converting my corpus into a "TaggedDocument". Because I've tried so many different ways to wrangle this problem I'll…

python dask gensim doc2vec

asked Jun 20 '19 at 07:38

ZdWhite

1 answer

Where is word2vec mapping coming from for DBOW doc2vec in gensim implementation?

I am trying to use gensim for doc2vec and word2vec. Since the PV-DM approach can generate word2vec and doc2vec vectors at the same time, I thought PV-DM was the right model to use. So, I created a model using gensim by specifying dm=1 for PV-DM. My questions are…

gensim word2vec doc2vec

asked Jun 06 '19 at 18:51

Brandon Lee
