I have trained a doc2vec model on the Wikipedia corpus using gensim, and I now want to infer vectors for other documents.
What text preprocessing did the WikiCorpus class apply when I used it to train my model? For example, did it remove punctuation, lowercase the text, or remove stop words?
This matters because I want to apply the same preprocessing to the documents I infer vectors for, so that their tokens are consistent with what the model saw during training.
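
To make the goal concrete, here is a rough sketch of the kind of normalization I have in mind. The `normalize` helper, its regex, and its length limits are hypothetical and only my guesses; they are not confirmed to match what WikiCorpus actually does, which is exactly what I am trying to find out.

```python
import re

# Hypothetical helper: these steps are guesses at the preprocessing,
# not the confirmed behaviour of gensim's WikiCorpus.
TOKEN_RE = re.compile(r"[^\W\d_]+")  # runs of letters only (no digits, no punctuation)


def normalize(text, min_len=2, max_len=15, lower=True):
    """Optionally lowercase, keep letter-only tokens, and filter them by length."""
    if lower:
        text = text.lower()
    return [t for t in TOKEN_RE.findall(text) if min_len <= len(t) <= max_len]


tokens = normalize("Hello, World! This is a TEST of doc2vec.")
print(tokens)

# The token list would then go to the trained model, e.g.
# vector = model.infer_vector(tokens)  # `model` is my trained Doc2Vec instance
```

If WikiCorpus does something different (for example, a different token length range or stop word removal), I would change this helper to match it before calling `infer_vector`.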