Questions tagged [doc2vec]

Doc2Vec is an unsupervised algorithm used to convert documents into vectors ("dense embeddings"). It is based on the "Paragraph Vector" paper and implemented in the Gensim Python library and elsewhere. The algorithm can work in either a "Distributed Bag of Words" mode (PV-DBOW, which works somewhat analogously to skip-gram mode in Word2Vec) or a "Distributed Memory" mode (PV-DM, which is more analogous to CBOW mode in Word2Vec).

556 questions

votes · 1 answer

Group by and aggregate problems for numpy arrays over word vectors

My pandas data frame looks something like this: Movieid | review | movieRating | wordEmbeddingVector, with rows like 1 | "text" | 4 | [100-dimensional vector]. I am trying to run a doc2vec implementation and I want to be able to group by movie ids and…

asked Jun 02 '16 at 18:33

Roshini

-1 votes · 1 answer

What would be the best way to compare different parts of a document in just one doc2vec embedding?

Let's say I have many documents with a question and an answer. I want to build an embedding where I can find the most similar documents based on just a new question without an answer but still be able to find similar documents based on the whole…

python machine-learning nlp doc2vec

asked Mar 12 '23 at 13:34

Red Boraley

-1 votes · 1 answer

Reverse TF-IDF vector (vec2text)

Given a doc2vec vector generated from some document, is it possible to reverse the vector back to the original document? If so, does there exist any hash algorithm that would make the vector irreversible but still comparable to other vectors of the…

hash data-science tf-idf doc2vec lsh

asked Aug 28 '22 at 09:45

first_question_magnus

-1 votes · 1 answer

Tokenization of unbalanced dataset

I'm working with a dataset of emails' content which I want to transform with doc2vec. This is a labeled dataset (spam/not-spam) and it is unbalanced (90-10 ratio). My question is: when tokenizing the emails' content, should I first oversample (using…

machine-learning nlp doc2vec imbalanced-data smote

asked Jan 07 '21 at 10:31

Efrat Magidov

-1 votes · 1 answer

Why is doc2vec giving different and unreliable results?

I have a set of 20 small documents which talk about a particular kind of issue (training data). Now I want to identify those docs out of 10K documents which are talking about the same issue. For this purpose I am using the doc2vec…

machine-learning nlp gensim similarity doc2vec

asked Jul 08 '20 at 14:48

Shivam Agrawal

-1 votes · 1 answer

Is there any way to train a doc2vec model in multiple batches?

I don't know how to train a model in multiple batches with doc2vec, since I load all my data in RAM and it can't be loaded. #Import all the dependencies from gensim.models.doc2vec import Doc2Vec, TaggedDocument import…

gensim doc2vec

asked Jun 01 '20 at 03:49

Luong Hoang

-1 votes · 1 answer

How to do supervised learning with Gensim/Word2Vec/Doc2Vec having large corpus of text documents?

I have a set of text documents (2000+) with labels (Liked/Disliked). Each document consists of 200+ words. I am trying to do supervised learning with these documents. My approach would be: Vectorize each document in the corpus. Say we have 2347…

python nlp gensim word2vec doc2vec

asked Jan 24 '20 at 06:05

afghani

-1 votes · 1 answer

How to get the words of clusters

How can I get the words of each cluster? I divided them into groups. LabeledSentence1 = gensim.models.doc2vec.TaggedDocument all_content_train = [] j=0 for em in train['KARMA'].values: …

python k-means doc2vec

asked Sep 23 '19 at 12:04

N.K

-1 votes · 1 answer

How to approach a project about analyzing call records and getting meaningful results about the topic

I am analyzing call records and trying to use doc2vec, but I can't find the appropriate way to apply it. I tried to convert words to their roots; later I will try to get rid of stop words (which are rooted). I want to understand what each conversation is…

python nlp nltk word2vec doc2vec

asked Aug 16 '19 at 12:26

N.K

-1 votes · 1 answer

Get all similar documents with doc2vec

I am currently working with doc2vec from the gensim library and I want to get all similarities with probabilities, not only the top 10 similarities provided by model.docvecs.most_similar(). Once my model is trained In [1]: print(model) Out [1]:…

python gensim doc2vec

asked May 07 '19 at 11:24

Oussama Jabri

-1 votes · 1 answer

Computing a similarity score for a set of sentences

My team does a lot of chatbot training, and I'm trying to come up with some tools to improve the quality of our work. In chatbot training, it is really important to train intents with diverse utterances that phrase the same intent in very different…

machine-learning nlp word2vec doc2vec sentence-similarity

asked Jan 25 '19 at 23:14

SymphonyTomorrow

-1 votes · 2 answers

How doc2vec creates a vector for a sentence

I am working on Doc2vec for text classification. It creates a vector for a sentence with some given size (e.g. 100, the length of the vector). I am not able to understand how it creates a vector of that length. I am following this link. In here they are…

python machine-learning data-science word2vec doc2vec

asked Oct 31 '18 at 03:47

Naveen Meka

-2 votes · 2 answers

How do I input doc2vec vectors of multiple text columns?

I have a dataset which has 3 different columns of relevant text information which I want to convert into doc2vec vectors and subsequently classify using a neural net. My question is how do I convert these three columns into vectors and input into a…

python machine-learning nlp doc2vec

asked Mar 26 '19 at 17:42

anmol narang

-3 votes · 0 answers

Comparing Similarity Between Two Texts with Doc2Vec

I'm working on a Machine Learning project. I have some user data from an e-commerce website and I'm predicting future purchases. Actually my model is complete but I want to add a new feature to my dataframe. I haven't used search terms data of users…

python machine-learning nlp word2vec doc2vec

asked Aug 14 '23 at 17:55

XPrime

-3 votes · 1 answer

How to find similarity between two lists of strings using doc2vec?

I have lists of strings like the ones below. I would like to see the similarity between list1 and list2 using Doc2Vec. list1 = [['i','love','machine','learning','its','awesome'],['i', 'love', 'coding', 'in', 'python'],['i', 'love', 'building',…

python python-3.x nlp doc2vec

asked May 27 '19 at 13:04

Praveenkumar
