No, input_size is not correctly defined. Here, input_size means the number of features in a single input vector of the sequence. The input to the GRU is a sequence of vectors, each a 1-D tensor of length input_size. For batched input, the GRU takes a batch of such sequences, so the expected shape is (batch_size, sequence_length, input_size) when batch_first=True and (sequence_length, batch_size, input_size) when batch_first=False (the default).
import torch
batch_size = 128
input_size = 9 # features in the input
seq_len = 5 # sequence length: how many input vectors in one sequence
hidden_size = 20 # number of features in the GRU output at each time step
gru = torch.nn.GRU(input_size=input_size, hidden_size=hidden_size, batch_first=True)
X = torch.rand( (batch_size, seq_len, input_size), dtype = torch.float32 )
print(f'{X.shape=}')
Y,_ = gru(X)
print(f'{Y.shape=}')
Output:
"""
X.shape=torch.Size([128, 5, 9])
Y.shape=torch.Size([128, 5, 20])
"""
Using batch_first=False:
gru = torch.nn.GRU(input_size=input_size, hidden_size=hidden_size, batch_first=False)
X = torch.rand( (seq_len, batch_size, input_size), dtype = torch.float32 )
print(f'{X.shape=}')
Y,_ = gru(X)
print(f'{Y.shape=}')
Output:
"""
X.shape=torch.Size([5, 128, 9])
Y.shape=torch.Size([5, 128, 20])
"""
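Both examples discard the second return value of the GRU, which is the final hidden state. As a minimal sketch (assuming a single-layer, unidirectional GRU like the ones above; the names h_n, shapes and last_step are only illustrative), the following checks that batch_first changes the layout of the input and output but not of the hidden state:

```python
import torch

batch_size, seq_len, input_size, hidden_size = 128, 5, 9, 20

shapes = {}  # batch_first -> (output shape, hidden state shape)
for batch_first in (True, False):
    gru = torch.nn.GRU(input_size=input_size, hidden_size=hidden_size,
                       batch_first=batch_first)
    # Build the input in the layout this GRU expects.
    if batch_first:
        X = torch.rand((batch_size, seq_len, input_size), dtype=torch.float32)
    else:
        X = torch.rand((seq_len, batch_size, input_size), dtype=torch.float32)

    Y, h_n = gru(X)  # h_n is the final hidden state

    # Output at the last time step, taken along the sequence axis.
    last_step = Y[:, -1, :] if batch_first else Y[-1]

    # For one layer and one direction, the last output equals h_n[0].
    assert torch.allclose(last_step, h_n[0])

    shapes[batch_first] = (tuple(Y.shape), tuple(h_n.shape))
    print(f'{batch_first=} {Y.shape=} {h_n.shape=}')
```

In both cases h_n has shape (1, 128, 20), i.e. (num_layers, batch_size, hidden_size), while Y follows the layout selected by batch_first.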