Questions tagged [doc2vec]

Doc2Vec is an unsupervised algorithm used to convert documents into vectors ("dense embeddings"). It is based on the "Paragraph Vector" paper and implemented in the Gensim Python library and elsewhere. The algorithm can work in either a "Distributed Bag Of Words" mode (PV-DBOW, which works somewhat analogously to skip-gram mode in Word2Vec) or a "Distributed Memory" mode (PV-DM, which is more analogous to CBOW mode in Word2Vec).
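
The difference between the two modes can be illustrated by the (input, target) pairs each one would learn from. The following is only a rough sketch in plain Python, not Gensim's implementation: the helper names `dbow_pairs` and `dm_pairs`, the `window` parameter, and the symmetric context window are all assumptions made for illustration, and no actual training happens here.

```python
from typing import List, Tuple


def dbow_pairs(tag: str, words: List[str]) -> List[Tuple[str, str]]:
    """PV-DBOW-style pairs: the document tag alone is the input
    used to predict each word of the document (skip-gram-like)."""
    return [(tag, word) for word in words]


def dm_pairs(
    tag: str, words: List[str], window: int = 2
) -> List[Tuple[Tuple[str, ...], str]]:
    """PV-DM-style pairs: the document tag plus nearby context words
    together form the input used to predict the target word (CBOW-like).
    A symmetric window is assumed here for simplicity."""
    pairs = []
    for i, target in enumerate(words):
        left = words[max(0, i - window):i]
        right = words[i + 1:i + 1 + window]
        pairs.append(((tag, *left, *right), target))
    return pairs


if __name__ == "__main__":
    doc = ["the", "cat", "sat", "down"]  # hypothetical toy document
    print(dbow_pairs("doc_0", doc))
    print(dm_pairs("doc_0", doc, window=1))
```

In a real model the inputs on the left would be looked up as vectors and combined (for example averaged) before predicting the target; the sketch only shows which items are paired.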

556 questions

1 answer

Document similarity in production environment

We have n documents. When a user submits a new document, our goal is to inform them about possible duplication of an existing document (just like Stack Overflow suggests questions that may already have an answer). In our system, new document…

asked May 10 '18 at 01:53

user2578525

1 answer

Gensim Doc2Vec most_similar() method not working as expected

I am struggling with Doc2Vec and I cannot see what I am doing wrong. I have a text file with sentences. I want to know, for a given sentence, what is the closest sentence we can find in that file. Here is the code for model creation: sentences =…

python nlp gensim doc2vec sentence-similarity

asked Apr 03 '18 at 13:47

Yann Droy

1 answer

Gensim's Doc2vec - inferred vector isn't similar

When I train Doc2vec (using Gensim's Doc2vec in Python) on a corpus of about 10k documents (each has a few hundred words) and then infer document vectors using the same documents, they are not at all similar to the trained document vectors. I would…

python gensim doc2vec

asked Mar 07 '18 at 15:19

awa993

0 answers

Embedding Gensim Doc2Vec Tensorboard

I have a set of documents in a df. I am transforming those documents to vectors with gensim Doc2Vec: def read_corpus(documents): for i, plot in enumerate(documents): yield…

python tensorflow gensim tensorboard doc2vec

asked Feb 22 '18 at 13:33

OverflowingTheGlass

1 answer

how to use build_vocab in gensim?

Does build_vocab extend my old vocabulary? For example, my understanding is that when I use doc2vec(s) to train a model, it just builds the vocabulary from the datasets. If I want to extend it, I need to use build_vocab(). Where should I use it? Should I put it…

nlp word2vec gensim doc2vec

asked Feb 09 '18 at 09:53

Cherrymelon

1 answer

Updating training documents for gensim Doc2Vec model

I have an existing gensim Doc2Vec model, and I'm trying to do iterative updates to the training set, and by extension, the model. I take the new documents, and perform preprocessing as normal: stoplist =…

gensim doc2vec

asked Dec 12 '17 at 14:56

Brian O'Halloran

1 answer

What are doc2vec training iterations?

I am new to doc2vec. I was initially trying to understand doc2vec, and below is my code that uses Gensim. As intended, I get a trained model and document vectors for the two documents. However, I would like to know the benefits of retraining…

python deep-learning word2vec gensim doc2vec

asked Oct 18 '17 at 09:33

user8566323

2 answers

User2Vec? representing a user based on the docs they consume

I'd like to form a representation of users based on the last N documents they have liked. So I'm planning on using doc2vec to form this representation of each document, but I'm just trying to figure out what would be a good way to essentially place…

neural-network word2vec doc2vec

asked Sep 26 '17 at 12:18

andrewm4894

1 answer

Why are almost all cosine similarities positive between word or document vectors in gensim doc2vec?

I have calculated document similarities using Doc2Vec.docvecs.similarity() in gensim. Now, I would either expect the cosine similarities to lie in the range [0.0, 1.0] if gensim used the absolute value of the cosine as the similarity metric, or…

python gensim word2vec doc2vec

asked Jun 03 '17 at 15:29

Sami Liedes

2 answers

Gensim docvecs.most_similar returns IDs that don't exist

I'm trying to create an algorithm that can show the top n documents similar to a specific document. For that I used gensim doc2vec. The code is below: model = gensim.models.doc2vec.Doc2Vec(size=400, window=8, min_count=5, workers = 11,…

python gensim doc2vec

asked Mar 27 '17 at 16:37

JoaoSilva

2 answers

load pre-trained word2vec model for doc2vec

I'm using gensim to extract a feature vector from a document. I've downloaded the pre-trained model from Google named GoogleNews-vectors-negative300.bin and I loaded that model using the following command: model =…

machine-learning nlp gensim word2vec doc2vec

asked Feb 08 '17 at 16:58

lenhhoxung

1 answer

Doc2Vec model Python 3 compatibility

I trained a doc2vec model with Python 2 and I would like to use it in Python 3. When I try to load it in Python 3, I get: Doc2Vec.load('my_doc2vec.pkl') UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 0: ordinal not in…

python python-3.x pickle gensim doc2vec

asked Jul 20 '16 at 14:31

Bernard

0 answers

Is there any way to validate the performance of a Doc2Vec/ Word2Vec Deep Learning model?

I am working with the Doc2Vec and Word2Vec deep learning algorithms (Doc2Vec API description from Gensim). Currently I am interested in using the model.n_similarity(wordSet1, wordSet2) method, which basically computes the …

python deep-learning gensim word2vec doc2vec

asked Jun 27 '16 at 20:05

Uther Pendragon

2 answers

doc2vec infer words from vectors

I am clustering comments. After preprocessing and a vectorization of a text, I have inferred vectors from my doc2vec model and applied kmeans. After that I want to convert cluster centroid vectors to words to kinda look at the semantic cores of the…

doc2vec

asked May 16 '22 at 14:22

frogseer

1 answer

Run a model that needs an older gensim version

I need to run a model, but it needs an older version of gensim with the DocvecsArray attribute. How can I run it? AttributeError: Can't get attribute 'DocvecsArray' on

python nlp gensim word2vec doc2vec

asked Feb 13 '22 at 15:09

Ayshath Thasmiya
