During the model's bulk training, each text's candidate doc-vector is gradually nudged to be better at predicting the text's words, just like word-vector training. So at the end of training, you have doc-vectors for all the identifiers you provided alongside the texts.
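For example, a minimal training sketch might look like the following; the tiny corpus, tags, and parameter values are made up for illustration, not recommendations:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Hypothetical toy corpus: each text gets one identifier ('doctag').
raw_texts = {
    "doc-0": "the quick brown fox jumps over the lazy dog",
    "doc-1": "never jump over the lazy dog quickly",
}

# Wrap each preprocessed token list with its tag(s) for training.
corpus = [
    TaggedDocument(words=text.split(), tags=[tag])
    for tag, text in raw_texts.items()
]

# Illustrative parameters only; real corpora need tuning (and far more data).
model = Doc2Vec(corpus, vector_size=50, epochs=40, min_count=1)
```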
You can access these from a gensim `Doc2Vec` model via dict-style indexed lookup of the identifier (called a 'doctag' in gensim) that you provided during training: `model.docvecs[tag]`.
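Continuing the sketch above (the tag `"doc-0"` comes from the hypothetical corpus):

```python
vec = model.docvecs["doc-0"]  # a 50-dimensional numpy array

# Note: in gensim 4.x this lookup lives at model.dv["doc-0"];
# model.docvecs remains as a deprecated alias.
```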
Post-training, to get the doc-vector for a new text, an inference process is used. The model is held frozen, and a new random candidate vector (just like those each training text started with at the beginning of bulk training) is formed for the text. Then it's incrementally nudged, in a manner fully analogous to training, to be better at predicting the words – but only this one new candidate vector is changed. (All of the model's internal weights stay the same.)
You can calculate such new vectors via the `infer_vector()` method, which takes a list of word tokens that should have been preprocessed just like the texts provided during training: `model.infer_vector(words)`.
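For instance, continuing the sketch above (the new text and the `topn` value are arbitrary):

```python
# Tokenize/preprocess the new text the same way as the training texts.
new_words = "a quick dog jumps".split()
new_vec = model.infer_vector(new_words)

# The inferred vector can then be compared against the trained doc-vectors,
# e.g. to find the most similar training documents:
print(model.docvecs.most_similar([new_vec], topn=2))
```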