If you define a tensor inside the module, it needs to be registered as either a parameter or a buffer so that the module is aware of it.
Parameters are tensors that are meant to be trained and will be returned by model.parameters(). They are easy to register: all you need to do is wrap the tensor in the nn.Parameter type and it will be registered automatically. Note that only floating point tensors can be parameters.
import torch

class ToyModule(torch.nn.Module):
    def __init__(self) -> None:
        super(ToyModule, self).__init__()
        self.layer = torch.nn.Linear(2, 2)
        # registering expected_moved_cuda_tensor as a trainable parameter
        self.expected_moved_cuda_tensor = torch.nn.Parameter(torch.tensor([0., 2., 3.]))

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return self.layer(input)
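To make these two claims concrete, here is a minimal sketch (reusing the ToyModule just defined) that lists the registered parameters and shows what happens when a non-floating-point tensor is wrapped:

toy_module = ToyModule()

# The wrapped tensor is registered under its attribute name, alongside the
# Linear layer's weight and bias.
print([name for name, _ in toy_module.named_parameters()])
# contains 'expected_moved_cuda_tensor' as well as 'layer.weight' and 'layer.bias'

# Only floating point (or complex) tensors can require gradients, so wrapping
# an integer tensor raises an error.
try:
    torch.nn.Parameter(torch.tensor([0, 2, 3]))
except RuntimeError as err:
    print(err)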
Buffers are tensors that will be registered in the module, so methods like .cuda() will affect them, but they will not be returned by model.parameters(). Buffers are not restricted to a particular data type.
import torch

class ToyModule(torch.nn.Module):
    def __init__(self) -> None:
        super(ToyModule, self).__init__()
        self.layer = torch.nn.Linear(2, 2)
        # registering expected_moved_cuda_tensor as a buffer
        # Note: this creates a new member variable named expected_moved_cuda_tensor
        self.register_buffer('expected_moved_cuda_tensor', torch.tensor([0, 2, 3]))

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return self.layer(input)
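As a quick check (again a sketch, run against the buffer version of ToyModule above), the buffer is excluded from the parameters but still tracked by the module and saved with its state_dict:

toy_module = ToyModule()

# The buffer does not show up among the trainable parameters ...
print([name for name, _ in toy_module.named_parameters()])
# contains only 'layer.weight' and 'layer.bias'

# ... but the module still knows about it: it is listed as a buffer and
# included in the state_dict by default.
print([name for name, _ in toy_module.named_buffers()])
# ['expected_moved_cuda_tensor']
print('expected_moved_cuda_tensor' in toy_module.state_dict())
# True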
In both of the above cases, the following code behaves the same:
>>> toy_module = ToyModule()
>>> toy_module.cuda()
>>> next(toy_module.layer.parameters()).device
device(type='cuda', index=0)
>>> toy_module.expected_moved_cuda_tensor.device
device(type='cuda', index=0)
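For contrast, here is a hypothetical counter-example (the class name BrokenToyModule and the attribute unregistered_tensor are made up for illustration) showing why registration matters: a tensor stored as a plain attribute is invisible to the module, so .cuda() leaves it on the CPU.

import torch

class BrokenToyModule(torch.nn.Module):
    def __init__(self) -> None:
        super(BrokenToyModule, self).__init__()
        self.layer = torch.nn.Linear(2, 2)
        # not registered: just a plain Python attribute holding a tensor
        self.unregistered_tensor = torch.tensor([0., 2., 3.])

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return self.layer(input)

>>> broken = BrokenToyModule()
>>> broken.cuda()
>>> next(broken.layer.parameters()).device
device(type='cuda', index=0)
>>> broken.unregistered_tensor.device
device(type='cpu')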