Questions tagged [doc2vec]

Doc2Vec is an unsupervised algorithm used to convert documents into vectors ("dense embeddings"). It is based on the "Paragraph Vector" paper and implemented in the Gensim Python library and elsewhere. The algorithm can work in either a "Distributed Bag Of Words" mode (PV-DBOW, which works somewhat analogously to skip-gram mode in Word2Vec) or a "Distributed Memory" mode (PV-DM, which is more analogous to CBOW mode in Word2Vec).
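To make the difference between the two modes concrete, here is a minimal sketch of which (input, target) pairs each mode would train on. It is plain Python for illustration only, not Gensim's implementation: the function names, the "DOC_0" tag, and the symmetric context window are assumptions, and real implementations add details such as negative sampling and window shrinking.

```python
def pv_dbow_examples(tag, words):
    """PV-DBOW sketch: the document tag alone is used to predict each word."""
    return [((tag,), target) for target in words]


def pv_dm_examples(tag, words, window=2):
    """PV-DM sketch: the document tag plus nearby words predict the target word.

    A symmetric window around the target is assumed here for illustration.
    """
    examples = []
    for i, target in enumerate(words):
        left = words[max(0, i - window):i]
        right = words[i + 1:i + 1 + window]
        examples.append(((tag, *left, *right), target))
    return examples


doc = ["the", "cat", "sat", "down"]  # hypothetical tokenized document
dbow = pv_dbow_examples("DOC_0", doc)
dm = pv_dm_examples("DOC_0", doc, window=1)

print(dbow[1])  # (('DOC_0',), 'cat')
print(dm[1])    # (('DOC_0', 'the', 'sat'), 'cat')
```

In PV-DBOW the only input is the document tag, much as skip-gram uses a single word as input; in PV-DM the tag is combined with context words, much as CBOW combines a window of words.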

556 questions

1 answer

Gensim doc2vec: why does the order of the sentences affect the doc2vec vector?

When I use model.infer_vector to compute the vectors, a different order of the documents gives different results. size=200; negative=15; min_count=1; iterNum=20; windows = 5 modelName = "datasets/dm-sum.bin_"+str(windows)+"_"…

gensim doc2vec

asked Nov 15 '17 at 09:36

eli Yi

1 answer

Getting numpy vector from a trained Doc2Vec model for each document

This is my first time using Doc2Vec. I'm trying to classify works of an author. I have trained a model with Labeled Sentences (paragraphs, or strings of specified length), with words = the list of words in the paragraph, and tags = author's name. In…

python-3.x nlp gensim doc2vec

asked Nov 07 '17 at 01:36

Eric Han

1 answer

Optimizing gensim (C compiler and BLAS) on Windows 7

I want to optimize gensim to run doc2vec on Windows 7. [1] C compiler: I installed gensim by following these instructions: https://radimrehurek.com/gensim/install.html pip install --upgrade gensim However, in this…

python-2.7 word2vec gensim blas doc2vec

asked Oct 31 '17 at 14:01

Kjyong

1 answer

Python error: "'numpy.ndarray' object has no attribute 'words'" when training doc2vec

When I trained my doc2vec model, I passed through the dataset multiple times and shuffled the training reviews each time to improve accuracy. Then Python gave me the AttributeError: 'numpy.ndarray' object has no attribute 'words'. Following is my…

python numpy doc2vec

asked Oct 15 '17 at 06:17

Vera

1 answer

Gensim doc2vec sentence tagging

I'm trying to understand doc2vec and whether I can use it to solve my scenario. I want to label sentences with one or more tags using TaggedSentences([words], [tags]), but I'm unsure if my understanding is correct. So basically, I need this to happen (or am I…

python machine-learning data-science gensim doc2vec

asked Oct 10 '17 at 19:37

rogger2016

1 answer

How do word2Vec or doc2Vec understand user sentiments?

I have gone through numerous documents to read about doc2Vec and word2Vec. I do understand how powerful it is to represent words as vectors and to perform simple operations like vector addition and subtraction to yield meaningful analogy between…

nlp word2vec doc2vec

asked Sep 19 '17 at 11:36

user1845926

1 answer

Identify the dimensions in doc2vec model

I have created a doc2vec model with 100 dimensions. From my reading, I understand that these dimensions are features of my model. How can I identify what these dimensions are exactly?

python gensim doc2vec

asked Sep 11 '17 at 14:12

Y0gesh Gupta

2 answers

How to find most similar terms/words of a document in doc2vec?

I have applied Doc2vec to convert documents into vectors. After that, I used the vectors in clustering and figured out the 5 nearest/most similar documents to the centroid of each cluster. Now I need to find the most dominant or important terms of…

python cluster-analysis gensim word2vec doc2vec

asked Sep 05 '17 at 05:23

pankaj jha

3 answers

How to interpret cluster results after using Doc2vec?

I am using doc2vec to convert the top 100 tweets of my followers into a vector representation (say v1.....v100). After that I am using the vector representation to do K-Means clustering. model = Doc2Vec(documents=t, size=100, alpha=.035, window=10,…

python scikit-learn cluster-analysis gensim doc2vec

asked Aug 28 '17 at 11:31

pankaj jha

0 answers

Pickle error while storing Doc2vec gensim model

I am trying to save a gensim Doc2vec model. The model is trained on 9M document vectors and a vocabulary of around 1M words, but I am getting a pickle error. "top" shows that the program uses around 13GB of RAM. Also I think since I need to re-train the…

nlp pickle gensim doc2vec

asked Aug 20 '17 at 15:25

maggs

1 answer

Agglomerative Clustering to cluster doc2vec

I'm new to Agglomerative Clustering and doc2vec, so I hope somebody can help me with the following issue. This is my code: model = AgglomerativeClustering(linkage='average', connectivity=None, n_clusters=2) X =…

python scikit-learn hierarchical-clustering doc2vec

asked Aug 10 '17 at 09:39

user8400385

1 answer

Gensim Doc2Vec model only generates a limited number of vectors

I am using gensim Doc2Vec model to generate my feature vectors. Here is the code I am using (I have explained what my problem is in the code): cores = multiprocessing.cpu_count() # creating a list of tagged documents training_docs = [] # all_docs:…

python nlp gensim doc2vec

asked Aug 02 '17 at 17:46

Pedram

1 answer

Why is Doc2Vec.scale_vocab(...)['memory']['vocab'] divided by 700 to obtain vocabulary size?

From the Doc2Vec wikipedia tutorial at https://github.com/RaRe-Technologies/gensim/blob/master/docs/notebooks/doc2vec-wikipedia.ipynb for num in range(0, 20): print('min_count: {}, size of vocab: '.format(num), …

gensim doc2vec

asked Jul 31 '17 at 14:48

Thomas Fauskanger

1 answer

Identifying product names from a column of line text using doc2vec

I have a column of line texts. From the column of line texts I would like to identify names which are similar to a list of product names. I was using Doc2Vec to solve the problem, but my results have been pretty bad. Which is the right approach for this problem? My…

doc2vec

asked Jul 19 '17 at 11:10

anirudh

1 answer

Training a network to find similar bodies of text

I have multiple text files and I am trying to find a way to identify similar bodies of text. The files themselves consist of an "average" sized paragraph. On top of this I also have some data that could be used as labels for the data if I were to go…

nlp nltk gensim spacy doc2vec

asked Jun 30 '17 at 14:56

Kieran Lavelle
