It's reasonable to try Doc2Vec for analyzing such user-to-document relationships.
You could potentially represent a user by the average-of-the-last-N-docs-consumed, as you suggest. Or by all the docs they consumed. Or perhaps by M centroids chosen to minimize the distances to the last N documents they consumed. But which of these might do well for your data/goals could only be found by exploratory experimentation.
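As a minimal sketch of the averaging approach (assuming a trained gensim Doc2Vec model; the model filename and the per-user doc-tag list here are hypothetical placeholders):

    import numpy as np
    from gensim.models import Doc2Vec

    model = Doc2Vec.load('my_doc2vec.model')  # hypothetical already-trained model
    user_doc_tags = ['doc_17', 'doc_203', 'doc_9', 'doc_442']  # hypothetical, in consumption order

    N = 3  # how many most-recent docs to average
    last_n = np.array([model.dv[tag] for tag in user_doc_tags[-N:]])
    user_vector = last_n.mean(axis=0)

    # The user-vector can then be compared against candidate doc-vectors, e.g.:
    recommendations = model.dv.most_similar(positive=[user_vector], topn=10)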
You could try adding user-tags to whatever other doc-ID-tags (or doc-category-tags) you provide during bulk Doc2Vec training. But beware that adding more tags means a larger model, and in some rough sense "dilutes" the meaning that can be extracted from the corpus, or allows for overfitting based on idiosyncrasies of seldom-occurring tags (rather than the desired generalization that's forced when a model is smaller). So if you have lots of user-tags, and perhaps lots of user-tags that are only applied to a small subset of documents, the overall quality of the doc-vectors may suffer.
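A rough sketch of that multi-tag setup (the corpus structure, doc-IDs, and user-IDs here are all hypothetical):

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    # Hypothetical corpus: (doc_id, tokenized_text, ids_of_users_who_consumed_it)
    corpus = [
        ('doc_1', ['quick', 'brown', 'fox'], ['user_7', 'user_42']),
        ('doc_2', ['lazy', 'dog', 'sleeps'], ['user_7']),
    ]

    # Each document gets its own doc-ID tag plus one tag per associated user,
    # so user-tags get trained up alongside the doc-tags.
    tagged = [
        TaggedDocument(words=tokens, tags=[doc_id] + user_ids)
        for doc_id, tokens, user_ids in corpus
    ]

    model = Doc2Vec(tagged, vector_size=100, min_count=1, epochs=40)

    # After training, user-tags have vectors too, comparable to doc-vectors:
    print(model.dv.most_similar('user_7', topn=3))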
One other interesting (but expensive-to-calculate) technique in the Word2Vec space is "Word Mover's Distance" (WMD), which compares texts based on the cost to shift all of one text's meaning, represented as a series of piles-of-meaning at the vector positions of each of its words, to match another text's piles. (Shifting meaning to nearby words in the vector space is cheap; shifting to distant words is expensive. The calculation finds the optimal set of shifts and reports its cost, with lower costs indicating more-similar texts.)
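For intuition, the canonical word-level use via gensim might look like this (a sketch assuming pretrained word-vectors fetched through gensim's downloader, plus an optimal-transport backend such as POT or pyemd installed, depending on gensim version):

    import gensim.downloader as api

    wv = api.load('glove-wiki-gigaword-50')  # small pretrained word-vectors

    text_a = 'obama speaks to the media in illinois'.split()
    text_b = 'the president greets the press in chicago'.split()

    # Lower WMD cost => more-similar texts (out-of-vocabulary words are ignored).
    print(wv.wmdistance(text_a, text_b))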
It strikes me that sets-of-doc-vectors could be treated the same way, and so the bag-of-doc-vectors associated with one user need not be reduced to any single average vector, but could instead be compared, via WMD, to another bag-of-doc-vectors, or even to single doc-vectors. (There's support for WMD in the wmdistance() method of gensim's KeyedVectors, but not directly on Doc2Vec classes, so you'd need to do some manual object/array juggling or other code customization to adapt it.)
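A rough sketch of that juggling, copying the trained doc-vectors into a plain KeyedVectors so that wmdistance() can treat each user's consumed-doc-tags as a "document" of "words" (the model filename and the per-user tag lists are placeholders, and again an optimal-transport backend is required):

    from gensim.models import Doc2Vec, KeyedVectors

    model = Doc2Vec.load('my_doc2vec.model')  # hypothetical trained model

    # Copy doc-vectors into a standalone KeyedVectors so wmdistance() is available.
    doc_kv = KeyedVectors(vector_size=model.vector_size)
    tags = list(model.dv.index_to_key)
    doc_kv.add_vectors(tags, [model.dv[t] for t in tags])

    # Hypothetical per-user sets of consumed-doc-tags:
    user_a_docs = ['doc_17', 'doc_203', 'doc_9']
    user_b_docs = ['doc_442', 'doc_17']

    # Lower cost => the two users' bags-of-doc-vectors are more similar.
    print(doc_kv.wmdistance(user_a_docs, user_b_docs))

Depending on your gensim version, the model's .dv attribute may itself already be a full KeyedVectors exposing wmdistance(), which would make the copying step unnecessary.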