Input and hidden tensors are not at the same device, found input tensor at cuda:0 and hidden tensor at cpu

Question

Here is my code for an LSTM network. I instantiated it and moved it to the CUDA device, but I still get the error that the hidden state and the inputs are not on the same device.

import torch
import torch.nn as nn

# resnet (a pre-trained ResNet already moved to CUDA) and device are defined elsewhere.

class LSTM_net(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(LSTM_net, self).__init__()
        self.hidden_size = hidden_size
        self.lstm_cell = nn.LSTM(input_size, hidden_size)
        self.h2o = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden_0=None, hidden_1=None, hidden_2=None):
        input = resnet(input)
        input = input.unsqueeze(0)
        out_0, hidden_0 = self.lstm_cell(input, hidden_0)
        out_1, hidden_1 = self.lstm_cell(out_0 + input, hidden_1)
        out_2, hidden_2 = self.lstm_cell(out_1 + input, hidden_2)
        output = self.h2o(hidden_2[0].view(-1, self.hidden_size))
        output = self.softmax(output)
        return output, hidden_0, hidden_1, hidden_2

    def init_hidden(self, batch_size=1):
        return (torch.zeros(1, batch_size, self.hidden_size), torch.zeros(1, batch_size, self.hidden_size))

net1=LSTM_net(input_size=1000,hidden_size=1000, output_size=100)

net1=net1.to(device)

A picture of the connections I want to make was attached here (image not included). Please guide me in implementing it.

An image of the error message was attached to the question.

What's the code you use to apply the model to data/train it? And where is your "resnet" model defined? — Marius, Aug 17 '20 at 08:09
I am using a pre-trained ResNet model (instantiated and moved to CUDA), and the input is a normalized image tensor — ashwin, Aug 17 '20 at 08:39
Since your forward(...) method has the arguments "hidden_0, hidden_1, hidden_2", do you use them? If so, I'd assume that you're providing a tensor for hidden_0 which doesn't reside on the GPU yet. — Marius, Aug 17 '20 at 08:43

Marius · Answer 1 · 2020-08-17T08:55:55.043

Make sure the hidden_0 you provide to the forward() method resides in GPU memory, or, ideally, store it as a parameter tensor in your model so that it is updated by the optimizer and moved to the GPU by model.cuda().
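For the first option, a simple way to keep the two in sync is to create (h_0, c_0) on whatever device the input already lives on. A minimal sketch, assuming a standalone nn.LSTM; the sizes and variable names are illustrative, not taken from the question:

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

input_size, hidden_size, batch_size = 8, 16, 1  # illustrative sizes only
lstm = nn.LSTM(input_size, hidden_size).to(device)

# (seq_len, batch, input_size): the default layout when batch_first=False
x = torch.randn(1, batch_size, input_size, device=device)

# Create the initial state directly on the input's device instead of the CPU default.
h_0 = torch.zeros(1, batch_size, hidden_size, device=x.device)
c_0 = torch.zeros(1, batch_size, hidden_size, device=x.device)

out, (h_n, c_n) = lstm(x, (h_0, c_0))
print(out.shape, h_n.device == x.device)  # torch.Size([1, 1, 16]) True
```

Using device=x.device rather than a hard-coded .cuda() means the same code runs unchanged on a CPU-only machine.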

Example for the second solution with hidden_0 residing in the model (added in init and used as self.hidden_0 in forward()):

import torch
import torch.nn as nn

class LSTM_net(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, batch_size=1):
        super(LSTM_net, self).__init__()
        self.hidden_size = hidden_size
        self.lstm_cell = nn.LSTM(input_size, hidden_size)
        self.h2o = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)
        # nn.LSTM expects its initial state as a tuple (h_0, c_0). Shape taken from
        # init_hidden, assuming that's the intended shape (batch_size must match the input).
        # Registered as Parameters, so they are trained and moved to the GPU by model.cuda().
        self.hidden_0 = torch.nn.parameter.Parameter(torch.zeros(1, batch_size, hidden_size))
        self.cell_0 = torch.nn.parameter.Parameter(torch.zeros(1, batch_size, hidden_size))

    def forward(self, input, hidden_0=None, hidden_1=None, hidden_2=None):
        input = resnet(input)  # pre-trained ResNet defined elsewhere, already on the GPU
        input = input.unsqueeze(0)
        out_0, hidden_0 = self.lstm_cell(input, (self.hidden_0, self.cell_0))
        out_1, hidden_1 = self.lstm_cell(out_0 + input, hidden_1)
        out_2, hidden_2 = self.lstm_cell(out_1 + input, hidden_2)
        output = self.h2o(hidden_2[0].view(-1, self.hidden_size))
        output = self.softmax(output)
        return output, hidden_0, hidden_1, hidden_2

https://pytorch.org/docs/stable/generated/torch.nn.parameter.Parameter.html#torch.nn.parameter.Parameter (answer edited) - you'll probably need to do the same for hidden_1 and hidden_2 though. — Marius, Aug 17 '20 at 10:09
Thank you, sir, the problem has been solved. Could you please help me check whether the implemented network matches the uploaded image of residual connections? — ashwin, Aug 17 '20 at 10:18

David · Accepted Answer · 2020-08-18T14:08:16.383


Edit: I think I see the problem now. Try changing

    def init_hidden(self, batch_size = 1):
        return (torch.zeros(1, batch_size, self.hidden_size), torch.zeros(1, batch_size, self.hidden_size))

to

    def init_hidden(self, batch_size = 1):
        return (torch.zeros(1, batch_size, self.hidden_size).cuda(), torch.zeros(1, batch_size, self.hidden_size).cuda())

This is because the tensors created by the init_hidden method are not registered with the model as parameters or buffers; they are ordinary tensors, created on the CPU by default. So they are not moved when you call cuda() on an instance of the model.
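This can be checked without a GPU: Module.to() casts dtypes through the same mechanism it uses to move devices, and it only touches registered parameters and buffers. In this sketch (the Demo class is hypothetical), casting to float64 stands in for .cuda():

```python
import torch
import torch.nn as nn

class Demo(nn.Module):  # hypothetical module, for illustration only
    def __init__(self, hidden_size=4):
        super().__init__()
        # Plain tensor attribute: not tracked, so .to()/.cuda() leave it alone.
        self.plain = torch.zeros(1, 1, hidden_size)
        # Buffer: tracked and moved/cast with the module, but not trained.
        self.register_buffer("buf", torch.zeros(1, 1, hidden_size))
        # Parameter: tracked, moved/cast, and returned by .parameters() for the optimizer.
        self.param = nn.Parameter(torch.zeros(1, 1, hidden_size))

m = Demo().to(torch.float64)
print(m.plain.dtype, m.buf.dtype, m.param.dtype)
# torch.float32 torch.float64 torch.float64

if torch.cuda.is_available():
    m = m.cuda()
    print(m.plain.device, m.buf.device, m.param.device)
    # cpu cuda:0 cuda:0
```

A buffer (register_buffer) is a reasonable choice for a fixed initial state; a Parameter, as in the other answer, makes the initial state learnable as well.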

Try calling .cuda() on all the tensors/variables and models involved.

net1.cuda()           # net1.to(device) with device == cuda:0 works as well;
                      # cuda() is more succinct, though. A module is moved in place.
input = input.cuda()  # a tensor is NOT moved in place: keep the returned copy

# now, calling net1 on a tensor named input should not produce the error.
out = net1(input)
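The reassignment matters because .cuda() and .to() behave differently on modules and tensors: a module is converted in place (and returned), while a tensor call returns a new tensor and leaves the original unchanged. A CPU-only sketch, again using a dtype cast as a stand-in for a device move:

```python
import torch
import torch.nn as nn

t = torch.zeros(3)       # float32 tensor
t.to(torch.float64)      # returns a converted copy; t itself is unchanged
print(t.dtype)           # torch.float32
t = t.to(torch.float64)  # reassign to keep the converted tensor
print(t.dtype)           # torch.float64

layer = nn.Linear(3, 2)
returned = layer.to(torch.float64)            # converts the module's parameters in place
print(returned is layer, layer.weight.dtype)  # True torch.float64
```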

edited Aug 18 '20 at 14:08

answered Aug 17 '20 at 08:11

David


What are your inputs to the forward pass of the model? Are any tensors being instantiated without the cuda() call? – David Aug 17 '20 at 08:45

The inputs are normalized image tensors, and the hidden inputs are already on CUDA because I called net.cuda(). – ashwin Aug 17 '20 at 09:12
That doesn't tell me much more. Just make sure you are calling .cuda() on every tensor and model involved in the computation. – David Aug 17 '20 at 09:13

Thanks, David, you are very helpful. Problem solved. – ashwin Aug 17 '20 at 10:11
Fantastic, glad to hear. – David Aug 17 '20 at 10:12

Could you please help me check whether the implemented network matches the uploaded image of residual connections? – ashwin Aug 17 '20 at 10:16
I would be happy to, though I think the image is slightly cut off. I think you do have it correct; however, if you want to implement the full P-RRNNs described in this work: https://www.researchgate.net/publication/338166170_Learning_long-term_temporal_features_with_deep_neural_networks_for_human_action_recognition, you would need to do more work on passing the hidden states around and make sure that your inputs are being sliced correctly. – David Aug 17 '20 at 10:21
