
nn.Module.cuda() moves all model parameters and buffers to the GPU.

But why isn't a plain member tensor of the model moved as well?

import torch

class ToyModule(torch.nn.Module):
    def __init__(self) -> None:
        super(ToyModule, self).__init__()
        self.layer = torch.nn.Linear(2, 2)
        self.expected_moved_cuda_tensor = torch.tensor([0, 2, 3])

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return self.layer(input)

>>> toy_module = ToyModule()
>>> toy_module.cuda()
>>> next(toy_module.layer.parameters()).device
device(type='cuda', index=0)

For the model's member tensor, however, the device stays unchanged.

>>> toy_module.expected_moved_cuda_tensor.device
device(type='cpu')

1 Answer


If you define a tensor inside the module, it needs to be registered as either a parameter or a buffer so that the module is aware of it.


Parameters are tensors that are to be trained, and they will be returned by model.parameters(). They are easy to register: all you need to do is wrap the tensor in the nn.Parameter type and it will be registered automatically. Note that only floating point tensors can be parameters.

class ToyModule(torch.nn.Module):
    def __init__(self) -> None:
        super(ToyModule, self).__init__()
        self.layer = torch.nn.Linear(2, 2)
        # registering expected_moved_cuda_tensor as a trainable parameter
        self.expected_moved_cuda_tensor = torch.nn.Parameter(torch.tensor([0., 2., 3.]))

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return self.layer(input)
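
As a rough sanity check (illustrative REPL output, assuming the parameter version of ToyModule above), the wrapped tensor now shows up in the module's parameters and is trainable:

>>> toy_module = ToyModule()
>>> [name for name, _ in toy_module.named_parameters()]
['expected_moved_cuda_tensor', 'layer.weight', 'layer.bias']
>>> toy_module.expected_moved_cuda_tensor.requires_grad
True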

Buffers are tensors that are registered with the module, so methods like .cuda() affect them, but they are not returned by model.parameters(). Buffers are not restricted to a particular data type.

class ToyModule(torch.nn.Module):
    def __init__(self) -> None:
        super(ToyModule, self).__init__()
        self.layer = torch.nn.Linear(2, 2)
        # registering expected_moved_cuda_tensor as a buffer
        # Note: this creates a new member variable named expected_moved_cuda_tensor
        self.register_buffer('expected_moved_cuda_tensor', torch.tensor([0, 2, 3]))

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return self.layer(input)

In both of the above cases, the following code behaves the same:

>>> toy_module = ToyModule()
>>> toy_module.cuda()
>>> next(toy_module.layer.parameters()).device
device(type='cuda', index=0)
>>> toy_module.expected_moved_cuda_tensor.device
device(type='cuda', index=0)
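
The difference between the two registrations only shows up when you enumerate the module's tensors: a parameter is returned by model.parameters() (and is therefore updated by an optimizer), while a buffer only appears in model.buffers() and in the state_dict. A rough illustration, assuming the buffer version of ToyModule from above:

>>> [name for name, _ in toy_module.named_buffers()]
['expected_moved_cuda_tensor']
>>> [name for name, _ in toy_module.named_parameters()]
['layer.weight', 'layer.bias']
>>> 'expected_moved_cuda_tensor' in toy_module.state_dict()
True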