Questions tagged [doc2vec]

Doc2Vec is an unsupervised algorithm used to convert documents into vectors ("dense embeddings"). It is based on the "Paragraph Vector" paper and implemented in the Gensim Python library and elsewhere. The algorithm can work in either a "Distributed Bag Of Words" mode (PV-DBOW, which works somewhat analogously to skip-gram mode in Word2Vec) or a "Distributed Memory" mode (PV-DM, which is more analogous to CBOW mode in Word2Vec).
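
To make the difference between the two modes concrete, here is a minimal pure-Python sketch of which inputs are used to predict which target word in each mode. It is an illustration only, not Gensim's implementation: the function names `pv_dbow_pairs` and `pv_dm_pairs`, the tag `"DOC_0"`, and the symmetric context window are assumptions made for this example, and no actual training takes place.

```python
from typing import List, Tuple

# Each training pair is (input tokens, target word). Hypothetical helper names.
Pair = Tuple[List[str], str]


def pv_dbow_pairs(tag: str, words: List[str]) -> List[Pair]:
    """PV-DBOW sketch: the document tag alone is the input for every target word,
    loosely like skip-gram with the tag in place of the center word."""
    return [([tag], target) for target in words]


def pv_dm_pairs(tag: str, words: List[str], window: int = 2) -> List[Pair]:
    """PV-DM sketch: the document tag plus nearby context words form the input,
    loosely like CBOW with an extra document token.
    Assumption: a symmetric window around the target word."""
    pairs = []
    for i, target in enumerate(words):
        left = words[max(0, i - window):i]
        right = words[i + 1:i + 1 + window]
        pairs.append(([tag] + left + right, target))
    return pairs


doc = ["the", "cat", "sat", "down"]
dbow = pv_dbow_pairs("DOC_0", doc)
dm = pv_dm_pairs("DOC_0", doc, window=1)

for inputs, target in dbow:
    print("PV-DBOW", inputs, "->", target)
for inputs, target in dm:
    print("PV-DM  ", inputs, "->", target)
```

In a real model the input tokens are looked up as vectors and combined (for example averaged or concatenated) before the prediction step; the sketch only shows how the training examples are shaped.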

556 questions
0
votes
1 answer

Gensim doc2vec: why does the order of the sentences affect the doc2vec vector?

When I use model.infer_vector to compute the vectors, a different order of documents gives different results. size=200;negative=15; min_count=1;iterNum=20; windows = 5 modelName = "datasets/dm-sum.bin_"+str(windows)+"_"…
eli Yi
0
votes
1 answer

Getting numpy vector from a trained Doc2Vec model for each document

This is my first time using Doc2Vec. I'm trying to classify works of an author. I have trained a model with Labeled Sentences (paragraphs, or strings of specified length), with words = the list of words in the paragraph, and tags = author's name. In…
Eric Han
0
votes
1 answer

Optimizing gensim (C compiler and BLAS) in Windows 7

I want to optimize gensim to run doc2vec in Windows 7. [1] C compiler: I installed gensim by following this instruction: https://radimrehurek.com/gensim/install.html pip install --upgrade gensim However, in this…
Kjyong
0
votes
1 answer

Python error: "'numpy.ndarray' object has no attribute 'words'" when training doc2vec

When I trained my doc2vec model, I passed through the dataset multiple times and shuffled the training reviews each time to improve accuracy. Then Python gave me the AttributeError: 'numpy.ndarray' object has no attribute 'words'. Following is my…
Vera
0
votes
1 answer

Gensim doc2vec sentence tagging

I'm trying to understand doc2vec and whether I can use it to solve my scenario. I want to label sentences with 1 or more tags using TaggedSentences([words], [tags]), but I'm unsure if my understanding is correct. So basically, I need this to happen (or am I…
rogger2016
0
votes
1 answer

How do word2Vec or doc2Vec understand user sentiments?

I have gone through numerous documents to read about doc2Vec and word2Vec. I do understand how powerful it is to represent words as vectors and to perform simple operations like vector addition and subtraction to yield meaningful analogies between…
user1845926
0
votes
1 answer

Identify the dimensions in doc2vec model

I have created a doc2vec model with 100 dimensions. From what I understand from my reading, these dimensions are features of my model. How can I identify what these dimensions are exactly?
Y0gesh Gupta
0
votes
2 answers

How to find most similar terms/words of a document in doc2vec?

I have applied Doc2vec to convert documents into vectors. After that, I used the vectors in clustering and figured out the 5 nearest/most similar documents to the centroid of each cluster. Now I need to find the most dominant or important terms of…
pankaj jha
0
votes
3 answers

How to interpret cluster results after using Doc2vec?

I am using doc2vec to convert the top 100 tweets of my followers into vector representations (say v1.....v100). After that I am using the vector representations to do K-Means clustering. model = Doc2Vec(documents=t, size=100, alpha=.035, window=10,…
pankaj jha
0
votes
0 answers

Pickle error while storing Doc2vec gensim model

I am trying to save a gensim Doc2vec model. The model is trained on 9M document vectors and a vocabulary of around 1M words, but I am getting a pickle error. "top" shows that the program uses around 13GB of RAM. Also I think since I need to re-train the…
maggs
0
votes
1 answer

Agglomerative Clustering to cluster doc2vec

I'm new to Agglomerative Clustering and doc2vec, so I hope somebody can help me with the following issue. This is my code: model = AgglomerativeClustering(linkage='average', connectivity=None, n_clusters=2) X =…
user8400385
0
votes
1 answer

Gensim Doc2Vec model only generates a limited number of vectors

I am using gensim Doc2Vec model to generate my feature vectors. Here is the code I am using (I have explained what my problem is in the code): cores = multiprocessing.cpu_count() # creating a list of tagged documents training_docs = [] # all_docs:…
Pedram
0
votes
1 answer

Why is Doc2Vec.scale_vocab(...)['memory']['vocab'] divided by 700 to obtain vocabulary size?

From the Doc2Vec wikipedia tutorial at https://github.com/RaRe-Technologies/gensim/blob/master/docs/notebooks/doc2vec-wikipedia.ipynb for num in range(0, 20): print('min_count: {}, size of vocab: '.format(num), …
Thomas Fauskanger
0
votes
1 answer

Identifying product names from a column of line text using doc2vec

I have a column of line texts. From the column of line texts I would l names which are similar to a list of product names. I was using Doc2Vec to solve the problem. But my result has been pretty bad. Which is the right approach for this problem? My…
anirudh
0
votes
1 answer

Training a network to find similar bodies of text

I have multiple text files and I am trying to find a way to identify similar bodies of text. Each file consists of an "average"-sized paragraph. On top of this I also have some data that could be used as labels for the data if I were to go…