17

Is there a pre-trained doc2vec model with a large data set, like Wikipedia or similar?

Matthew Haugen
  • 12,916
  • 5
  • 38
  • 54
Idriss Brahimi
  • 171
  • 1
  • 1
  • 5
  • I just wanted to add a link to other pretrained gensim models: http://nilc.icmc.usp.br/embeddings – xxx May 09 '23 at 11:34

2 Answers

9

I don't know of any good one. There's one linked from this project, but:

  • it's based on a custom fork of an older gensim, so it won't load in recent gensim code
  • it's not clear what parameters or data it was trained with, and the associated paper may have made uninformed choices about the effects of parameters
  • it doesn't appear to be the right size to include actual doc-vectors for either Wikipedia articles (4-million-plus) or article paragraphs (tens-of-millions), or a significant number of word-vectors, so it's unclear what's been discarded

While it takes a long time and a significant amount of working RAM, there is a Jupyter notebook included in gensim that demonstrates the creation of a Doc2Vec model from Wikipedia:

https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-wikipedia.ipynb

So, I would recommend fixing the mistakes in your attempt. (And, if you succeed in creating a model, and want to document it for others, you could upload it somewhere for others to re-use.)

gojomo
  • 52,260
  • 14
  • 86
  • 115
  • I know this is a very old answer but do you think it is possible to train a Doc2Vec model on Google colab? – Dani Jun 13 '21 at 14:29
  • I'm not a user of Google Colab, but if I understand correctly that it lets you run Python code, in a notebook, with enough RAM to do common ML tasks – sure, why not? – gojomo Jun 13 '21 at 18:06
8

Yes! I found two pre-trained doc2vec models at this link,

but I still could not find any pre-trained doc2vec model trained on tweets.

Moniba
  • 789
  • 10
  • 17