Gensim Doc2Vec infer_vector on paragraphs with unseen words generates vectors that differ based on the characters in the unseen words.
# model is an already-trained gensim Doc2Vec model
for i in range(2):
    print(model.infer_vector(["zz"])[0:2])
    print(model.infer_vector(["zzz"])[0:2])
    print(model.infer_vector(["zzzz"])[0:2])
    print("\n")
[ 0.00152548 -0.00055992]
[-0.00165872 -0.00047997]
[0.00125548 0.00053445]
[ 0.00152548 -0.00055992] # same as in previous iteration
[-0.00165872 -0.00047997]
[0.00125548 0.00053445]
I am trying to understand how unseen words affect the initialization done by infer_vector. Different characters produce different vectors, while the same unseen token gives the same vector on every call. Why does this happen?
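To make the question concrete, here is a minimal sketch of the kind of mechanism I suspect: the starting vector is drawn from a random generator seeded with the document's text, and since unseen words give the model nothing to train on, that starting vector comes back unchanged. This is only my assumption, not gensim's actual code. `seeded_start_vector` is a hypothetical helper, the CRC32 seed and the scaling are placeholders I picked, and it uses only the standard library.

```python
import random
import zlib


def seeded_start_vector(tokens, vector_size=4):
    """Hypothetical helper: small pseudo-random vector seeded from the token text."""
    # Derive a deterministic integer seed from the joined tokens (assumed scheme).
    seed = zlib.crc32(" ".join(tokens).encode("utf-8")) & 0xFFFFFFFF
    rng = random.Random(seed)
    # Small values centered on zero, similar in scale to the output above.
    return [(rng.random() - 0.5) / vector_size for _ in range(vector_size)]


for _ in range(2):
    # Same token -> same seed -> same vector in both iterations.
    print(seeded_start_vector(["zz"])[:2])
    # Different characters -> different seed -> different vector.
    print(seeded_start_vector(["zzz"])[:2])
    print()
```

If infer_vector works along these lines, it would explain both observations: identical vectors for the same unseen token across calls, and different vectors for tokens that differ only in their characters.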