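import gensim
from gensim.models.doc2vec import TaggedDocument

# Train the initial model on the large corpus (one document per line)
sentences = gensim.models.doc2vec.TaggedLineDocument("raw_docs.txt")
model = gensim.models.Doc2Vec(sentences, min_count=1, iter=100)

# Try to continue training on a single new document
sentence = TaggedDocument(words=[u'为了'], tags=[u'T1'])
sentences1 = [sentence]
model.build_vocab(sentences1, update=True)
model.train(sentences1)
print("successful!")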

I want to train a doc2vec model on a large corpus, and then use this pretrained model to train on a new text.

I only want to train on the new text, starting from the pretrained model. How can I do that? The code above doesn't work...

theoretisch
Jeffery
  • What do you mean by 'train a new text'? Incremental updating of Word2Vec/Doc2Vec models in gensim (via the `build_vocab(..., update=True)` option) is best considered an experimental, advanced option. If at all possible, you should include all documents in a single initial training. (If you then need vectors for other documents, you can use the `infer_vector()` method.) – gojomo Jan 19 '17 at 02:22
  • Thanks for your answer. Do you mean that I can use `infer_vector()` to create a single doc vector with a pretrained model? – Jeffery Jan 20 '17 at 03:14
  • Yes! The model remains frozen, but `infer_vector()` calculates (by training up one new vector) a model-compatible vector for a new text (list of tokens). Note you may want to change the arguments to `infer_vector()`, especially using a larger-than-default number of `steps`, for better results. – gojomo Jan 20 '17 at 03:16
  • @gojomo How do I use `infer_vector()`? Which one is the right format for `doc_words`? 1: ['I','Love','You'] 2: 'I Love You' – Jeffery Mar 14 '17 at 13:21
  • The `doc_words` parameter should be a list of tokens – your example (1) – as described in the gensim documentation: https://radimrehurek.com/gensim/models/doc2vec.html#gensim.models.doc2vec.Doc2Vec.infer_vector – gojomo Mar 14 '17 at 22:29

0 Answers