How to train a new text with gensim doc2vec

Question

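import gensim
from gensim.models.doc2vec import TaggedDocument

# Train the initial model on a large corpus, one document per line
sentences = gensim.models.doc2vec.TaggedLineDocument("raw_docs.txt")
model = gensim.models.Doc2Vec(sentences, min_count=1, iter=100)

# Try to continue training on a single new document
sentence = TaggedDocument(words=[u'为了'], tags=[u'T1'])
sentences1 = [sentence]
model.build_vocab(sentences1, update=True)
model.train(sentences1)
print "successful!"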

I want to train a doc2vec model on a large dataset, and then use this pretrained model to train on a new text.

I only want to train the new text with the pretrained model. How can I do that? The code above doesn't work.

What do you mean by 'train a new text'? Incremental updates of Word2Vec/Doc2Vec models in gensim (via the `build_vocab(...,update=True)` option) are best considered an experimental, advanced option. If at all possible, you should include all documents in a single initial training. (If you then need vectors for other documents, you can use the `infer_vector()` method.) — gojomo, Jan 19 '17 at 02:22
Thanks for your answer. Do you mean that I can use `infer_vector()` to create a vector for a single document with a pretrained model? — Jeffery, Jan 20 '17 at 03:14
Yes! The model remains frozen, but `infer_vector()` calculates (by training up one new vector) a model-compatible vector for a new text (list of tokens). Note you may want to change the arguments to `infer_vector()`, especially using a larger-than-default number of `steps`, for better results. — gojomo, Jan 20 '17 at 03:16
@gojomo How do I use `infer_vector()`? Which is the right format for `doc_words`: (1) ['I','Love','You'] or (2) 'I Love You'? — Jeffery, Mar 14 '17 at 13:21
The `doc_words` parameter should be a list of tokens – your example (1) – as described in the gensim documentation: https://radimrehurek.com/gensim/models/doc2vec.html#gensim.models.doc2vec.Doc2Vec.infer_vector — gojomo, Mar 14 '17 at 22:29
