Questions tagged [doc2vec]

Doc2Vec is an unsupervised algorithm for converting documents into vectors ("dense embeddings"). It is based on the "Paragraph Vector" paper and is implemented in the Gensim Python library and elsewhere. The algorithm can work in either a "Distributed Bag of Words" mode (PV-DBOW, which works somewhat analogously to the skip-gram mode in Word2Vec) or a "Distributed Memory" mode (PV-DM, which is more analogous to the CBOW mode in Word2Vec).
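
As a rough illustration of the difference between the two modes, the sketch below builds the (input, target) training pairs each mode would use for one tokenized document. It is a simplification under stated assumptions: the function names, the `window` parameter, and the tag `DOC_0` are hypothetical, a symmetric CBOW-style window is assumed for PV-DM, and real implementations add details such as subsampling and negative sampling.

```python
def pv_dbow_pairs(tag, words):
    """PV-DBOW: the document tag alone is the input used to predict each word."""
    return [((tag,), target) for target in words]


def pv_dm_pairs(tag, words, window=2):
    """PV-DM: the document tag plus nearby context words predict the center word."""
    pairs = []
    for i, target in enumerate(words):
        left = words[max(0, i - window):i]
        right = words[i + 1:i + 1 + window]
        pairs.append(((tag, *left, *right), target))
    return pairs


# Hypothetical one-document corpus, already tokenized.
doc = ["the", "cat", "sat", "down"]
dbow = pv_dbow_pairs("DOC_0", doc)
dm = pv_dm_pairs("DOC_0", doc, window=1)
```

In both lists the second element of each pair is the word to predict; only the inputs differ, which is where the analogy to skip-gram and CBOW comes from.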

556 questions

1 answer

'Doc2Vec' object has no attribute 'syn0'

import gensim from gensim.models.doc2vec import TaggedDocument taggeddocs = [] tag2tweetmap = {} for index,i in enumerate(cleaned_tweets): if len(i) > 2: # Non empty tweets tag = u'SENT_{:d}'.format(index) sentence =…

doc2vec

asked Mar 04 '18 at 07:02

Dylan

1 answer

Can I build the vocabulary twice with gensim word2vec or doc2vec?

I have two different corpora and I want to train the model on both. I thought it could be done with something like this: model.build_vocab(sentencesCorpus1) model.build_vocab(sentencesCorpus2) Would that be right?

asked Feb 22 '18 at 17:53

Mikel Laburu
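
One way to approach the two-corpus question above, sketched under assumptions: rather than calling `build_vocab` twice, wrap both corpora in a single restartable iterable and pass that once. The `ChainedCorpus` class is a hypothetical helper, not part of Gensim, and the Gensim calls in the trailing comments are shown only as intended usage; they are not executed here.

```python
from itertools import chain


class ChainedCorpus:
    """Restartable iterable that yields every item of several corpora in order.

    A plain generator would be exhausted after one pass, but training
    typically iterates the corpus more than once, so __iter__ builds a
    fresh chain each time it is called.
    """

    def __init__(self, *corpora):
        self.corpora = corpora

    def __iter__(self):
        return chain.from_iterable(self.corpora)


# Hypothetical tokenized corpora standing in for the asker's data.
sentencesCorpus1 = [["first", "corpus", "sentence"]]
sentencesCorpus2 = [["second", "corpus", "sentence"]]
combined = ChainedCorpus(sentencesCorpus1, sentencesCorpus2)

# Intended usage (assumes an existing Gensim Word2Vec model named `model`):
# model.build_vocab(combined)
# model.train(combined, total_examples=model.corpus_count, epochs=model.epochs)
```

For Doc2Vec the same wrapper would apply, assuming each corpus yields `TaggedDocument` objects instead of token lists.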

1 answer

Add new vocabulary to an existing Doc2Vec model

I already have a Doc2Vec model. I have trained it with my train data. Now, after a while, I want to use Doc2Vec for my test data. I want to add my test data vocabulary to my existing model's vocabulary. How can I do this? I mean how can I update my…

word2vec gensim doc2vec

asked Feb 16 '18 at 07:21

Sina Akhavan

1 answer

Gensim Doc2Vec Most_Similar

I'm having trouble with the most_similar method in Gensim's Doc2Vec model. When I run most_similar I only get the similarity of the first 10 tagged documents (based on their tags, always 0-9). For this code I have topn=5, but I've used…

python nlp deep-learning gensim doc2vec

asked Feb 11 '18 at 16:49

J. Collins

1 answer

Do the Mikolov 2014 Paragraph2Vec models assume sentence ordering?

In the Mikolov 2014 paper on paragraph vectors, https://arxiv.org/pdf/1405.4053v2.pdf, do the authors assume, in both PV-DM and PV-DBOW, that the ordering of sentences needs to make sense? Imagine I am handling a stream of tweets, and each tweet is a…

word2vec doc2vec sentence-similarity

asked Feb 09 '18 at 19:40

Franklin Dong

1 answer

Concatenating two doc2vec models: Vector dimensions doubled

I have a question regarding concatenating two doc2vec models. I followed the official gensim IMDB example on doc2vec and implemented example data. When concatenating two models (PV-DM + PV-DBOW), as outlined in the original paper, I wondered that…

machine-learning concatenation word2vec gensim doc2vec

asked Feb 08 '18 at 10:18

Christopher

1 answer

Load a Doc2Vec model and get new sentences' vectors for testing

I have read lots of examples regarding doc2vec, but I couldn't find any answer. As a real example, I want to build a model with doc2vec and then train it with some ML models. After that, how can I get the vector of a raw string with the exact…

nlp word2vec gensim doc2vec

asked Feb 06 '18 at 04:09

Sina Akhavan

2 answers

ELKI Kmeans clustering Task failed error for high dimensional data

I have 60000 documents, which I processed in gensim to get a 60000*300 matrix. I exported this as a CSV file. When I import it into the ELKI environment and run k-means clustering, I get the error below. Task…

cluster-analysis k-means gensim doc2vec elki

asked Feb 05 '18 at 13:01

StatguyUser

1 answer

gensim: Retrieving word frequency in doc2vec vocabulary

I just came across this StackOverflow post on word counts in a doc2vec model vocabulary. I wonder if there is another method to retrieve the word frequency, other than for word, vocab_obj in model.wv.vocab.items(): print(str(word) +…

dictionary word2vec gensim doc2vec vocabulary

asked Jan 29 '18 at 18:05

Christopher

1 answer

Are the document vectors used in doc2vec one-hot?

I understand conceptually how word2vec and doc2vec work, but am struggling with the nuts and bolts of how the numbers in the vectors get processed algorithmically. If the vectors for three context words are: [1000], [0100], [0010] and the vector for…

python nlp word2vec doc2vec

asked Jan 17 '18 at 21:08

mudstick
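
On the one-hot question above, a small sketch of a commonly noted equivalence: multiplying a one-hot vector by a weight matrix returns a single row of that matrix, which is why an implementation can store dense vectors and look them up by index rather than materializing one-hot inputs. The matrix values and helper names are made up for illustration.

```python
def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1 at `index`."""
    return [1 if i == index else 0 for i in range(size)]


def vec_times_matrix(vec, matrix):
    """Multiply a row vector by a matrix given as a list of rows."""
    n_cols = len(matrix[0])
    return [sum(v * row[j] for v, row in zip(vec, matrix)) for j in range(n_cols)]


# Hypothetical 4-entry vocabulary with 2-dimensional dense vectors.
weights = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
    [0.7, 0.8],
]

# Projecting the one-hot vector for index 2 selects row 2 of the matrix.
projected = vec_times_matrix(one_hot(2, 4), weights)
looked_up = weights[2]
```

Under this view, a document tag is just another index into its own table of dense vectors.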

1 answer

doc2vec: any way to fetch closest matching terms for a given vector?

The use-case I have is to have a collection of "upvoted" documents and "downvoted" documents and using those to re-order a set of results in a search. I am using gensim doc2vec and am able to run the most_similar queries for word(s) and fetch…

word2vec gensim doc2vec

asked Jan 15 '18 at 17:10

Santino

1 answer

Which way of recovering a doc2vec model is more efficient?

After I train a doc2vec model, I want to reuse the document vectors in another module. It seems there are two ways to implement this: save the model and save doc-vectors as a dictionary. I just wonder which one is more memory-efficient and which one…

word2vec doc2vec

asked Dec 21 '17 at 02:54

YangGuo

1 answer

Parameter values of Doc2vec for Document Tagging - Gensim

My task is to assign tags (descriptive words) to documents or posts from a list of available tags. I'm working with the Doc2vec implementation in Gensim. I read that doc2vec can be used for document tagging. But I could not get the suitable parameter…

python gensim doc2vec

asked Dec 13 '17 at 18:13

Rabia

1 answer

getting vector-tags pair after training in words2vec

I am trying to convert a bunch of poems into vectors, and then use my own implementation of k-means on them, but I can't figure out how to get the vectors with tags attached after training in doc2vec. I also find that when I train on 11 files I get…

python parsing doc2vec

asked Dec 03 '17 at 22:09

tharvey

1 answer

Issues in doc2vec tags in Gensim

I am using gensim doc2vec as below. from gensim.models import doc2vec from collections import namedtuple import re my_d = {'recipe__001__1': 'recipe 1 details should come here', 'recipe__001__2': 'Ingredients of recipe 2 need to be added'} docs =…

python gensim doc2vec

asked Nov 16 '17 at 14:28

user8566323
