
I want to perform a sentence similarity task and tried the following:

from transformers import XLNetTokenizer, XLNetModel
import torch
import scipy
import torch.nn as nn
import torch.nn.functional as F

tokenizer = XLNetTokenizer.from_pretrained('xlnet-large-cased')
model = XLNetModel.from_pretrained('xlnet-large-cased')

input_ids = torch.tensor(tokenizer.encode("Hello, my animal is cute", add_special_tokens=False)).unsqueeze(0)
outputs = model(input_ids)
last_hidden_states = outputs[0]

input_ids = torch.tensor(tokenizer.encode("I like your cat", add_special_tokens=False)).unsqueeze(0) 

outputs1 = model(input_ids)
last_hidden_states1 = outputs1[0]

cos = nn.CosineSimilarity(dim=1, eps=1e-6)
output = cos(last_hidden_states, last_hidden_states1)

However, I get the following error:

RuntimeError: The size of tensor a (7) must match the size of tensor b (4) at non-singleton dimension 1

Can anybody tell me what I am doing wrong? Is there a better way to do it?

spadel

1 Answer


There are several things you are doing wrong.

  1. add_special_tokens should be set to True. The model was trained with a <sep> token for separating sentences and a <cls> token for sentence classification. Not using them leads to weird behavior because of the train-test mismatch.

  2. outputs[0] gives you the first element of the tuple the model returns. All models in the Transformers package return tuples, which is why you need to index into the output. This first element contains one vector per input token, including the special ones.

  3. Unlike BERT, whose [CLS] token is the first one, in XLNet the <cls> token is the very last one (see the Transformers documentation). If you want to compare the classification vectors, you should take the last vector of the sequence, i.e. outputs[0][:, -1] (see the sketch after this list).
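
Putting the three fixes together, here is a minimal sketch of the full comparison (the helper name cls_embedding is just for illustration; the checkpoint is the same xlnet-large-cased as in your code):

from transformers import XLNetTokenizer, XLNetModel
import torch
import torch.nn as nn

tokenizer = XLNetTokenizer.from_pretrained('xlnet-large-cased')
model = XLNetModel.from_pretrained('xlnet-large-cased')
model.eval()

def cls_embedding(sentence):
    # add_special_tokens=True appends <sep> and <cls> to the token ids
    input_ids = torch.tensor(
        tokenizer.encode(sentence, add_special_tokens=True)).unsqueeze(0)
    with torch.no_grad():
        outputs = model(input_ids)
    # the <cls> token is the last one, so take the last vector of the sequence
    return outputs[0][:, -1]

cos = nn.CosineSimilarity(dim=1, eps=1e-6)
output = cos(cls_embedding("Hello, my animal is cute"),
             cls_embedding("I like your cat"))
print(output)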

Alternatively, you might want to compare the average (mean pooling) of the token embeddings rather than the <cls> token embedding. In that case, you can just do outputs[0].mean(1).
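
The mean-pooled variant only changes the line that reduces the hidden states, e.g. (reusing tokenizer, model, and cos from the sketch above; mean_embedding is again just an illustrative name):

def mean_embedding(sentence):
    input_ids = torch.tensor(
        tokenizer.encode(sentence, add_special_tokens=True)).unsqueeze(0)
    with torch.no_grad():
        outputs = model(input_ids)
    # average over the token dimension (dim 1) -> one vector per sentence
    return outputs[0].mean(1)

print(cos(mean_embedding("Hello, my animal is cute"),
          mean_embedding("I like your cat")))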

Jindřich
  • Thank you for your reply. I tried what you suggested and I indeed don't get an error anymore. However, the performance of the network is really bad. When computing the cosine similarity for the sentences "My brother plays guitar." and "The sun is shining." I get a result of 0.93. My code looks like this: cos = nn.CosineSimilarity(dim=1, eps=1e-6); output = cos(last_hidden_states, last_hidden_states1); print(output) – spadel Mar 23 '20 at 11:01