Questions tagged [doc2vec]

Doc2Vec is an unsupervised algorithm used to convert documents into vectors ("dense embeddings"). It is based on the "Paragraph Vector" paper and is implemented in the Gensim Python library and elsewhere. The algorithm can work in either a "Distributed Bag of Words" mode (PV-DBOW, which works somewhat analogously to skip-gram mode in Word2Vec) or a "Distributed Memory" mode (PV-DM, which is more analogous to CBOW mode in Word2Vec).
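To make the difference between the two modes concrete, here is a minimal pure-Python sketch of what each mode feeds to its predictor. It is an illustration only, not Gensim's implementation: the function names `dbow_examples` and `dm_examples`, the `DOC_0` tag, and the `window` parameter are assumptions made for this example, and a real trainer would learn vectors from these (inputs, target) pairs rather than just list them.

```python
from typing import List, Tuple

# Each training example is (input tokens, target word).
Example = Tuple[List[str], str]


def dbow_examples(tag: str, words: List[str]) -> List[Example]:
    """PV-DBOW: the document tag alone is the input used to predict each word."""
    return [([tag], target) for target in words]


def dm_examples(tag: str, words: List[str], window: int = 2) -> List[Example]:
    """PV-DM: the document tag plus nearby context words predict the center word."""
    examples = []
    for i, target in enumerate(words):
        left = words[max(0, i - window):i]      # up to `window` words before
        right = words[i + 1:i + 1 + window]     # up to `window` words after
        examples.append(([tag] + left + right, target))
    return examples


# Hypothetical toy document, used only for illustration.
doc_words = ["the", "cat", "sat", "down"]
dbow = dbow_examples("DOC_0", doc_words)
dm = dm_examples("DOC_0", doc_words, window=1)

for inputs, target in dm:
    print(inputs, "->", target)
```

In Gensim the mode is chosen with the `dm` parameter of `Doc2Vec` (`dm=0` for PV-DBOW, `dm=1` for PV-DM).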

556 questions

1 answer

Cannot align graph because multiple tag doc2vec returning more items in doctag_syn0 than there are in the training data

I am training a doc2vec model with multiple tags, so it includes the typical doc "ID" tag and then it also contains a label tag "Category 1." I'm trying to graph the results such that I get the doc distribution in 2D (using LargeVis) but am able…

asked Oct 08 '18 at 17:07

seeiespi

3 answers

Doc2Vec: Similarity Between Coded Documents and Unseen Documents

I have a sample of ~60,000 documents. We've hand coded 700 of them as having a certain type of content. Now we'd like to find the "most similar" documents to the 700 we already hand-coded. We're using gensim doc2vec and I can't quite figure out…

python nlp gensim word2vec doc2vec

asked Oct 07 '18 at 21:18

Academic Researcher

1 answer

Why does Gensim most similar in doc2vec give the same vector as the output?

I am using the following code to get the ordered list of user posts. model = doc2vec.Doc2Vec.load(doc2vec_model_name) doc_vectors = model.docvecs.doctag_syn0 doc_tags = model.docvecs.offset2doctag for w, sim in…

nlp data-mining gensim word2vec doc2vec

asked Sep 24 '18 at 19:26

J Cena

1 answer

I want to classify some sentences on the basis of their semantic meaning. How can I use Doc2Vec in this? Or is there a better approach than this?

I want to use doc2vec on various reviews which we extracted from a source. And I want to classify these reviews into different classes defined by the user. How can I do this?

nlp semantics word2vec doc2vec

asked Sep 05 '18 at 05:46

Anirudh Chaudhary

1 answer

MemoryError using Python and Doc2Vec

I'm trying to train a Doc2vec model on massive data. I have 20k files totalling 72GB, and wrote this code: def train(): onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))] data = [] random.shuffle(onlyfiles) …

python machine-learning doc2vec

asked Aug 27 '18 at 12:14

Dimmy Magalhães

1 answer

Using Doc2Vec to find salience score for resumes based on job description

Here is my use case: the HR department provides a job description (free text) and a set of resumes (plain text), and the ask is to come up with a salience score based on job description relevance. The job description consists of skills required and minimum…

nlp gensim doc2vec information-extraction

asked Aug 21 '18 at 23:30

Madhur Telang

1 answer

Doc2vec - About getting document vector

I'm very new to doc2vec and have some questions about document vectors. What I'm trying to get is a vector for a phrase like 'cat-like mammal'. What I've tried so far, using a pre-trained doc2vec model, is the code below import…

word-embedding doc2vec

asked Aug 13 '18 at 07:24

Chhyun

1 answer

Gensim tagging documents with big numbers

I want to label my documents with tags mapped to the id attribute in a database. The ids can be large numbers; for example, documents[0] is TaggedDocument(words=['blabla', 'request'], tags=[225616076]) For some reason, it is not able to…

python gensim topic-modeling doc2vec

asked Jul 30 '18 at 12:27

xdaniel

0 answers

Sent2Vec or Doc2Vec Testing

How can I test a sent2vec or doc2vec model that I've trained on a specific dataset? The process is all unsupervised, so I have no labels to help in the testing. My interest is in how the semantic similarity measure is computed. Thanks in advance.

doc2vec

asked Jul 26 '18 at 10:26

Hummer

1 answer

AttributeError: 'Tree' object has no attribute 'words'. Doc2Vec error

I am trying to train a Doc2Vec word embedding on preprocessed paragraphs. I have removed punctuation, and have carried out tokenization, POS tagging, and chunking. import nltk from nltk import word_tokenize, pos_tag, ne_chunk from gensim.models.doc2vec…

model nltk gensim attributeerror doc2vec

asked Jul 20 '18 at 08:28

Nuc

1 answer

How to check via callbacks if alpha is decreasing? + How to load all cores during training?

I'm training doc2vec, and am using callbacks to see whether alpha is decreasing over training time, with this code: class EpochSaver(CallbackAny2Vec): '''Callback to save model after each epoch.''' def __init__(self, path_prefix): …

callback gensim multicore word-embedding doc2vec

asked Jul 19 '18 at 08:45

Dasha

1 answer

Gensim Doc2vec trained, but not saved

When I trained d2v on a large text corpus I received these 3 files: doc2vec.model.trainables.syn1neg.npy doc2vec.model.vocabulary.cum_table.npy doc2vec.model.wv.vectors.npy But the final model was not saved, because there was not enough free space…

model save gensim word-embedding doc2vec

asked Jul 17 '18 at 08:05

Dasha

0 answers

Cannot figure out format needed to make predictions on dataset trained with doc2vec and random forest classifier

I am trying to make predictions on a dataset based on some pre-defined data (tweets and the categories that the tweets belong to, labeled 1-16), for which I have built a model with doc2vec and trained a random forest classifier. I am confused about what…

python machine-learning random-forest doc2vec

asked Jul 14 '18 at 17:40

Natalie

1 answer

Doc2Vec gensim with supervised data predefined labels

I am trying to use gensim's doc2vec to create a model which will be trained on a set of documents and a set of labels. The labels were created manually and need to be put into the program to be trained on. So far I have 2 lists: a list of sentences,…

python gensim supervised-learning doc2vec

asked Jul 09 '18 at 18:57

Natalie

1 answer

How do I get the similarity between a word and a document in gensim

So I have started to learn gensim for both word2vec and doc2vec and it works. The similarity scores actually work really well. For an experiment, however, I wanted to optimize a keyword-based search algorithm by comparing a single word and getting…

python search gensim word2vec doc2vec

asked Jul 04 '18 at 21:18

Julian Kurz
