
I computed the vector for the same sentence twice using XLNet through embedding-as-service, but the model produces a different embedding each time: the cosine similarity between the two vectors is not 1 and the Euclidean distance is not 0. With BERT this works fine. For example, after

from embedding_as_service.text.encode import Encoder

en = Encoder(embedding='xlnet', model='xlnet_large_cased')  # model name illustrative
vec1 = en.encode(texts=['he is anger'], pooling='reduce_mean')
vec2 = en.encode(texts=['he is anger'], pooling='reduce_mean')

the model (XLNet) reports these two identical sentences as dissimilar.
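A quick way to see the symptom (assuming encode returns one pooled vector per input text):

from scipy.spatial.distance import cosine, euclidean

# with XLNet these print a similarity below 1 and a distance above 0;
# with BERT they print 1.0 and 0.0 as expected
print(1 - cosine(vec1[0], vec2[0]))  # cosine similarity
print(euclidean(vec1[0], vec2[0]))   # Euclidean distance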


2 Answers


This is because of the dropout layers in the model. Dropout should be disabled during inference, but a bug in the library leaves it active, and it is apparently still not fixed.

See the discussion here: https://github.com/amansrivastava17/embedding-as-service/issues/45

In the meantime, as suggested by @Davide Fiocco, you can use the straightforward approaches from HuggingFace Transformers instead: a plain forward pass, generate, or a pipeline.
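For example, a minimal sketch with a plain forward pass; the mean pooling over tokens is my choice here, to mirror reduce_mean:

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
model = AutoModel.from_pretrained("xlnet-base-cased")
model.eval()  # make sure dropout is disabled

def embed(text):
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)  # mean-pool over tokens

v1, v2 = embed("he is anger"), embed("he is anger")
print(torch.cosine_similarity(v1, v2, dim=0))  # tensor(1.)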

Berkay Berabi

As a workaround, if you have some flexibility, what about using the vanilla transformers library instead?

Results from

from transformers import pipeline

# the pipeline loads the model in eval mode, so dropout is off
embedder = pipeline("feature-extraction", model="xlnet-base-cased")
embedder("he is anger")

are deterministic.
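For instance (the mean pooling here is my own choice, just for the comparison):

import numpy as np

# feature-extraction returns a nested list of shape (1, seq_len, hidden)
v1 = np.array(embedder("he is anger")).mean(axis=1)
v2 = np.array(embedder("he is anger")).mean(axis=1)
print(np.allclose(v1, v2))  # True: repeated calls give identical vectors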

Davide Fiocco