
Can I use multiple softmax outputs in the final layer of a transformer? If so, how do I calculate the loss from them? I am working in PyTorch.

I am asking because my data is a sequence of tuples whose elements come from vocabularies of different sizes, for example:

[(2,1), (3,1), (3,1), (2,1), (2,1), (3,1), (3,0), (4,1)]

The first element of each tuple has a vocabulary of 5, and the second element has a vocabulary of 2.
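To make the setup concrete, here is a minimal sketch of what I have in mind: one linear head per tuple element on top of a shared encoder, with the two cross-entropy losses summed. Everything named here (`TwoHeadTransformer`, `embed_a`, `head_a`, the layer sizes, summing the two input embeddings, and using the input tuples themselves as per-position targets) is an assumption for illustration, not something I know to be the right approach. Positional encoding and any causal mask are left out for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoHeadTransformer(nn.Module):
    """Hypothetical model: shared encoder, one output head per tuple element."""

    def __init__(self, vocab_a=5, vocab_b=2, d_model=32, nhead=4, num_layers=2):
        super().__init__()
        # One embedding table per tuple element (assumed design choice).
        self.embed_a = nn.Embedding(vocab_a, d_model)
        self.embed_b = nn.Embedding(vocab_b, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_feedforward=64, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        # Two output heads; each produces logits over its own vocabulary.
        self.head_a = nn.Linear(d_model, vocab_a)
        self.head_b = nn.Linear(d_model, vocab_b)

    def forward(self, x):
        # x has shape (batch, seq_len, 2) and holds integer class indices.
        h = self.embed_a(x[..., 0]) + self.embed_b(x[..., 1])
        h = self.encoder(h)
        return self.head_a(h), self.head_b(h)


# The example sequence from above, as a batch of size 1: shape (1, 8, 2).
seq = torch.tensor(
    [[(2, 1), (3, 1), (3, 1), (2, 1), (2, 1), (3, 1), (3, 0), (4, 1)]]
)
targets = seq  # placeholder targets, assumed only for this illustration

model = TwoHeadTransformer()
logits_a, logits_b = model(seq)

# F.cross_entropy applies log-softmax internally, so the heads return raw
# logits and no explicit softmax layer is needed during training.
loss_a = F.cross_entropy(logits_a.reshape(-1, 5), targets[..., 0].reshape(-1))
loss_b = F.cross_entropy(logits_b.reshape(-1, 2), targets[..., 1].reshape(-1))
loss = loss_a + loss_b  # an unweighted sum; the weighting is a free choice
loss.backward()
```

At inference time, `logits_a.softmax(-1)` and `logits_b.softmax(-1)` would give the two separate probability distributions.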

afsana mimi

0 Answers