Questions tagged [doc2vec]

Doc2Vec is an unsupervised algorithm used to convert documents into vectors ("dense embeddings"). It is based on the "Paragraph Vector" paper and implemented in the Gensim Python library and elsewhere. The algorithm can work in either a "Distributed Bag Of Words" mode (PV-DBOW, which works somewhat analogously to skip-gram mode in Word2Vec) or a "Distributed Memory" mode (PV-DM, which is more analogous to CBOW mode in Word2Vec).
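
As a rough illustration of how the two modes differ, the sketch below builds the (input, target) training pairs each mode would learn from. It is a simplified teaching example, not Gensim's implementation: the function names `pv_dbow_examples` and `pv_dm_examples`, the tag `DOC_0`, and the fixed `window` are all assumptions made for illustration, and real implementations add details such as randomly reduced windows and negative sampling. (In Gensim, the mode is chosen with the `dm` parameter of `Doc2Vec`: `dm=0` for PV-DBOW, `dm=1` for PV-DM.)

```python
from typing import List, Tuple


def pv_dbow_examples(tag: str, words: List[str]) -> List[Tuple[str, str]]:
    """PV-DBOW sketch: the document tag alone is the input used to
    predict each word of the document (loosely like skip-gram)."""
    return [(tag, target) for target in words]


def pv_dm_examples(
    tag: str, words: List[str], window: int = 2
) -> List[Tuple[Tuple[str, ...], str]]:
    """PV-DM sketch: the document tag plus the neighbouring context
    words together predict the centre word (loosely like CBOW)."""
    examples = []
    for i, target in enumerate(words):
        left = words[max(0, i - window):i]      # up to `window` words before
        right = words[i + 1:i + 1 + window]     # up to `window` words after
        examples.append(((tag, *left, *right), target))
    return examples


# Hypothetical toy document with a made-up tag.
doc = ["the", "cat", "sat"]
dbow = pv_dbow_examples("DOC_0", doc)
dm = pv_dm_examples("DOC_0", doc, window=1)

for inputs, target in dbow:
    print("PV-DBOW:", inputs, "->", target)
for inputs, target in dm:
    print("PV-DM:  ", inputs, "->", target)
```

In both modes the document tag takes part in every prediction, which is why a vector is learned per tag. The difference is only whether word context is mixed into the input.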

556 questions

1 answer

gensim doc2vec train more documents from pre-trained model

I am trying to train a pre-trained model on new labelled documents (TaggedDocument). The pre-trained model was trained on documents whose unique IDs have the form label1_index, for instance, Good_0, Good_1 to Good_999. And the total size of…

asked Feb 21 '18 at 04:45

Isaac Sim

1 answer

doc2vec/gensim - issue with shuffling sentences in the epochs

I am trying to get started with word2vec and doc2vec using the excellent tutorials here and here, and trying to use the code samples. I only added in a line_clean() method to remove punctuation, stopwords etc. But I am having trouble with the…

python word2vec gensim doc2vec

asked Dec 31 '17 at 17:55

Santino

1 answer

Doc2vec: clustering resulting vectors

In the doc2vec model, can we cluster on the vectors themselves? Should we cluster each resulting model.docvecs[1] vector? How to implement the clustering model? model = gensim.models.doc2vec.Doc2Vec(size= 100, min_count = 5,window=4, iter = 50,…

python nlp gensim doc2vec

asked Dec 21 '17 at 18:34

Hackerds

1 answer

Doc2vec: model.docvecs is only of length 10

I am trying doc2vec for 600000 rows of sentences and my code is below: model = gensim.models.doc2vec.Doc2Vec(size= 100, min_count = 5,window=4, iter = 50, workers=cores) model.build_vocab(res) model.train(res, total_examples=model.corpus_count,…

python nlp gensim doc2vec

asked Dec 21 '17 at 16:27

Hackerds

2 answers

How to use Gensim Doc2vec infer_vector() for large DataFrame?

I have created document vectors for a large corpus using Gensim's doc2vec. sentences=gensim.models.doc2vec.TaggedLineDocument('file.csv') model = gensim.models.doc2vec.Doc2Vec(sentences,size = 10, window = 800, min_count = 1, workers=40, iter=10,…

python gensim doc2vec

asked Dec 20 '17 at 11:59

CMM

2 answers

How to access document details from Doc2Vec similarity scores in gensim model?

I have been given a doc2vec model using gensim which was trained on 20 million documents. The 20 million documents it was trained on were also given to me, but I have no idea how or in which order the documents were trained from the folder. I am supposed…

python gensim doc2vec sentence-similarity

asked Nov 20 '17 at 06:28

User54211

1 answer

How to obtain document vectors in doc2vec in gensim

I know how to obtain a document vector for a given tag in doc2vec using print(model.docvecs['recipe__11']). My document vectors are either recipes (tags start with recipe__), newspapers (tags start with news__) or ingredients (tags start with…

python gensim doc2vec

asked Nov 15 '17 at 06:09

user8566323

2 answers

How to load the pre-trained doc2vec model and use its vectors

Does anyone know which function I should use if I want to use the pre-trained doc2vec models in this website https://github.com/jhlau/doc2vec? I know we can use the Keyvectors.load_word2vec_format() to load the word vectors from pre-trained word2vec…

python numpy gensim doc2vec

asked Oct 17 '17 at 08:59

Vera

0 answers

Doc2Vec from gensim to deeplearning4j

Is there any way to load doc2vec model saved using gensim into deeplearning4j's ParagraphVectors? My gensim model is valid - I am able to load it using gensim with no problems. When I call WordVectorSerializer.readParagraphVectors on my model from…

java python gensim deeplearning4j doc2vec

asked Oct 16 '17 at 11:53

dkaras

1 answer

applying the Similar function in Gensim.Doc2Vec

I am trying to get the doc2vec function to work in Python 3. I have the following code: tekstdata = [[ index, str(row["StatementOfTargetFiguresAndPoliciesForTheUnderrepresentedGender"])] for index, row in data.iterrows()] def prep (x): low =…

python gensim doc2vec

asked Oct 04 '17 at 08:09

Niels Helsø

0 answers

TypeError while using infer_vector on a gensim Doc2Vec model loaded from memory

I am a little new to the doc2vec algorithm and am using gensim for its implementation in Python. Following the gensim tutorial "Gensim Doc2vec Tutorial on the IMDB Sentiment Dataset" I have built vocab and trained a doc2vec model, and stored it on the disc…

python gensim doc2vec

asked Sep 15 '17 at 11:09

cvipul

1 answer

gensim doc2vec "intersect_word2vec_format" command

Just reading through the doc2vec commands on the gensim page. I am curious about the command "intersect_word2vec_format". My understanding of this command is it lets me inject vector values from a pretrained word2vec model into my doc2vec model…

nlp gensim doc2vec

asked Sep 02 '17 at 11:26

pete the dude

1 answer

How to train word2vec with your own vocab

I am getting an error while training word2vec with my own vocabulary. I also don't understand why it's happening. Code: from gensim.models import word2vec import logging logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s',…

nlp stanford-nlp word2vec doc2vec

asked Aug 27 '17 at 11:32

Manish Kumar

1 answer

Can I create a topic model (such as LDA) from the output of doc2vec model?

I computed document similarity on my corpus using Doc2Vec, and the similarities it outputs are not very good. I was wondering if I could build a topic model from what Doc2Vec is giving me to increase the accuracy of my model in order to get better…

nlp gensim lda topic-modeling doc2vec

asked Jul 21 '17 at 17:19

Eshita Nandini

1 answer

How do I find cosine similarity between two text documents using Java?

I need to compare a large number of tweets containing a particular hashtag to display the tweet which has the highest content in it. For the same, I need to find pair-wise cosine similarity between each one of them and display the tweet with highest…

java nlp tf-idf cosine-similarity doc2vec

asked Mar 28 '17 at 16:58

Manan Kalra
