The softmax function produces the attention weights, which are then multiplied with V (the MatMul step). Are these weights stored anywhere? If they are not stored or reused in the next iteration, how does learning happen? Moreover, the linear transformation does not seem to use these weights!

Source code: https://github.com/fawazsammani/chatbot-transformer/blob/master/models.py

1 Answer

I would recommend always checking the documentation first.

If we follow the documentation through to the source of the nn.Linear layer, we find this line:

self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))

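These are the weights you are asking about. The softmax output is not a learned parameter: it is recomputed from the queries and keys on every forward pass, so it does not need to be stored between iterations. What is learned, and updated by backpropagation, are the weights of the nn.Linear layers that produce the queries, keys, and values.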
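To make the distinction concrete, here is a minimal sketch of single-head attention. It is not taken from the linked repository: the names `w_q`, `w_k`, `w_v`, the sizes, and the random input are assumptions chosen for illustration. It shows that the nn.Linear weights are registered parameters that receive gradients, while the softmax output is an ordinary tensor computed during the forward pass.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

d_model = 8  # hypothetical embedding size

# Hypothetical projection layers; each one owns a learnable weight and bias.
w_q = nn.Linear(d_model, d_model)
w_k = nn.Linear(d_model, d_model)
w_v = nn.Linear(d_model, d_model)

x = torch.randn(2, 5, d_model)  # (batch, seq_len, d_model), random stand-in input

q, k, v = w_q(x), w_k(x), w_v(x)
scores = q @ k.transpose(-2, -1) / d_model ** 0.5
attn = F.softmax(scores, dim=-1)  # attention weights: recomputed on every forward pass
out = attn @ v                    # the MatMul with V

# The learned weights are stored on the module as parameters...
learned = [name for name, _ in w_q.named_parameters()]
is_param = isinstance(w_q.weight, nn.Parameter)

# ...whereas the softmax output is just an intermediate tensor.
attn_is_param = isinstance(attn, nn.Parameter)

# Backpropagation flows through the softmax into the stored weights.
out.sum().backward()
grad_reaches_weights = w_q.weight.grad is not None

print(learned, is_param, attn_is_param, grad_reaches_weights)
```

An optimizer step would then update `w_q.weight`, `w_k.weight`, and `w_v.weight`; the next forward pass produces new attention weights from the updated projections, which is why the softmax output itself never has to be kept.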

Hope this answers your question!

Arij Aladel