Questions tagged [doc2vec]

Doc2Vec is an unsupervised algorithm used to convert documents into vectors ("dense embeddings"). It is based on the "Paragraph Vector" paper and implemented in the Gensim Python library and elsewhere. The algorithm can work in either a "Distributed Bag Of Words" mode (PV-DBOW, which works somewhat analogously to skip-gram mode in Word2Vec) or a "Distributed Memory" mode (PV-DM, which is more analogous to CBOW mode in Word2Vec).
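
As a rough illustration of how the two modes differ, the sketch below lists the (input, target) pairs each mode would learn from for one tiny document. It is not Gensim code and trains nothing: the function names, the `DOC_0` tag, and the symmetric context window are assumptions made for this example only.

```python
from typing import List, Tuple


def pv_dbow_examples(tag: str, words: List[str]) -> List[Tuple[str, str]]:
    """PV-DBOW: the document tag alone is the input used to predict each word."""
    return [(tag, target) for target in words]


def pv_dm_examples(
    tag: str, words: List[str], window: int = 2
) -> List[Tuple[Tuple[str, ...], str]]:
    """PV-DM: the document tag plus nearby context words predict the target word."""
    examples = []
    for i, target in enumerate(words):
        left = words[max(0, i - window):i]      # up to `window` words before the target
        right = words[i + 1:i + 1 + window]     # up to `window` words after the target
        examples.append(((tag, *left, *right), target))
    return examples


# Hypothetical one-document corpus, tagged "DOC_0" for illustration.
doc = ["i", "love", "machine", "learning"]
dbow = pv_dbow_examples("DOC_0", doc)
dm = pv_dm_examples("DOC_0", doc, window=1)

for pair in dbow:
    print("PV-DBOW:", pair)
for pair in dm:
    print("PV-DM:  ", pair)
```

In PV-DBOW the input never includes word context, which is why it resembles skip-gram; in PV-DM the tag is combined with the context words, much like CBOW with one extra input.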

556 questions

2 answers

AttributeError: 'Word2Vec' object has no attribute 'most_similar' (Word2Vec)

I am using Word2Vec with a wiki-trained model that returns the most similar words. I ran this before and it worked, but now it gives me this error even after rerunning the whole program. I tried to take off return_path=True but I'm still…

asked Aug 06 '21 at 05:41

RSB

0 answers

How to get document embeddings using GPT-2?

I'm curious if using GPT-2 might yield a higher accuracy for document vectors (with greatly varying length) or not (would it surpass the state of the art?) Really I'm most interested in document embeddings that are as accurate as possible. I'm…

python machine-learning nlp artificial-intelligence doc2vec

asked May 06 '20 at 17:19

Youssef A

1 answer

Which method, dm or dbow, works well for document similarity using Doc2Vec?

I'm trying to find the similarity between 2 documents. I'm using Gensim's Doc2Vec to train on around 10k documents. There are around 10 string tags. Each tag consists of a unique word and covers a group of documents. Model is trained…

python-3.x gensim similarity doc2vec

asked May 27 '19 at 09:34

iNikkz

1 answer

Gensim Doc2Vec generating huge file for model

I am trying to run the doc2vec library from the gensim package. My problem is that when I train and save the model, the model file is rather large (2.5 GB). I tried using this line: model.estimate_memory(), but it didn't change anything. I also have…

python semantics gensim word2vec doc2vec

asked Jul 19 '17 at 15:37

ida

2 answers

Doc2Vec Sentence Clustering

I have multiple documents that contain multiple sentences. I want to use doc2vec to cluster (e.g. k-means) the sentence vectors by using sklearn. As such, the idea is that similar sentences are grouped together in several clusters. However, it is…

python scikit-learn text-mining gensim doc2vec

asked Apr 18 '17 at 15:53

Boyos123

1 answer

What is gensim's 'docvecs'?

The above picture is from Distributed Representations of Sentences and Documents, the paper introducing Doc2Vec. I am using Gensim's implementation of Word2Vec and Doc2Vec, which are great, but I am looking for clarity on a few issues. For a given…

python nlp gensim doc2vec

asked Jan 18 '17 at 00:15

Michael Davidson

0 answers

How to train a new text with gensim doc2vec

sentences = gensim.models.doc2vec.TaggedLineDocument("raw_docs.txt")
model = gensim.models.Doc2Vec(sentences, min_count=1, iter=100)
sentence = TaggedDocument(words=[u'为了'], tags=[u'T1'])
sentences1 = [sentence]
model.build_vocab(sentences1, update=True)
model.t…

gensim doc2vec

asked Jan 03 '17 at 09:50

Jeffery

2 answers

doc2vec - How to infer vectors of documents faster?

I have trained paragraph vectors for around 2300 paragraphs (between 2000 and 12000 words each), each with a vector size of 300. Now I need to infer paragraph vectors for around 100,000 sentences, which I have treated as paragraphs (each sentence is around…

python gensim word2vec doc2vec

asked Sep 19 '16 at 18:57

Dreams

1 answer

How to get the Document Vector from Doc2Vec in gensim 0.11.1?

Is there a way to get the document vectors of unseen and seen documents from Doc2Vec in the gensim 0.11.1 version? For example, suppose I trained the model on 1000 documents. Can I get the doc vector for those 1000 docs? Is there a way to get…

python gensim word2vec doc2vec

asked Jun 11 '16 at 12:45

silent_dev

2 answers

Gensim Doc2Vec visualization issue when using t-SNE and/or PCA

I am trying to familiarize myself with Doc2Vec results by using a public dataset of movie reviews. I have cleaned the data and run the model. There are, as you can see below, 6 tags/genres. Each is a document with its vector representation. doc_tags =…

python machine-learning scatter-plot cosine-similarity doc2vec

asked Aug 14 '20 at 12:10

Rameau

2 answers

ModuleNotFoundError: No module named 'numpy.random._pickle'

I have a doc2vec model which drives my recommendation app. I have built the doc2vec model and saved it to an S3 bucket. Now when I open the web app, the model should be loaded back from S3, but this is not happening. I used AWS Elastic Beanstalk to deploy my…

python-3.x numpy pickle joblib doc2vec

asked May 26 '20 at 07:11

Praneeth Sai

1 answer

Use Spacy to find most similar sentences in doc

I'm looking for a solution to use something like most_similar() from Gensim but using Spacy. I want to find the most similar sentence in a list of sentences using NLP. I tried to use similarity() from Spacy (e.g. https://spacy.io/api/doc#similarity)…

gensim similarity spacy doc2vec sentence-similarity

asked May 15 '19 at 13:33

Heraknos

2 answers

What is the appropriate distance metric when clustering paragraph/doc2vec vectors?

My intent is to cluster document vectors from doc2vec using HDBSCAN. I want to find tiny clusters of semantic and textual duplicates. To do this I am using gensim to generate document vectors. The elements of the resulting docvecs are…

python cluster-analysis distance doc2vec hdbscan

asked Oct 09 '18 at 13:35

fluffet

1 answer

gensim - Doc2Vec: Difference iter vs. epochs

When reading gensim's Doc2Vec documentation, I get a bit confused about some options. For example, the constructor of Doc2Vec has a parameter iter: iter (int) – Number of iterations (epochs) over the corpus. Why does the train method then…

python gensim doc2vec

asked May 17 '18 at 11:41

Simon Hessner

2 answers

Does gensim Doc2Vec distinguish between the same sentence with positive and negative context?

While learning the Doc2Vec library, I got stuck on the following question. Does gensim Doc2Vec distinguish between the same sentence with positive and negative context? For example: Sentence A: "I love Machine Learning" Sentence B: "I do not love Machine…

python nlp gensim doc2vec

asked Apr 26 '18 at 08:31

DK818
