
I am attempting to train a transformer network on EEG data. The input dimensions are 50x16684x60 (seq x batch x features) and the target is 16684x2. Right now I am simply trying to run a basic transformer, and I keep getting an error telling me

RuntimeError: the feature number of src and tgt must be equal to d_model

Why would the source and target feature numbers ever need to be equal? Is it possible to run such a dataset through a transformer?

Here is my basic model:

import torch
import torch.nn as nn
import torch.optim as optim
from tqdm import tqdm

input_size = 60 # features per time step (input is seq x batch x features)
hidden_size = 32
num_classes = 2
learning_rate = 0.001
batch_size = 64
num_epochs = 2
sequence_length = 50
num_layers = 2
dropout = 0.5

class Transformer(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(Transformer, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        
        self.transformer = nn.Transformer(60, 2)  # positional args: d_model=60, nhead=2
        self.fc = nn.Linear(hidden_size * sequence_length, num_classes)
    
    def forward(self, x, y):
        
        # Forward propagation; nn.Transformer returns a single tensor,
        # and the reported error is raised inside this call
        out = self.transformer(x, y)
        out = out.reshape(out.shape[0], -1)
        out = self.fc(out)
        return out

model = Transformer(input_size, hidden_size, num_layers, num_classes)

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(num_epochs):
    for index in tqdm(range(16684)):
        X, y = (X_train[index], Y_train[index])
        print(X.shape, y.shape)
    
        output = model(X, y)

        loss = criterion(output, y)
        
        model.zero_grad()
        loss.backward()
        
        optimizer.step()
        
        if index % 500 == 0:
            print(f"Epoch {epoch}, Batch: {index}, Loss: {loss}")

You messed up the arguments for initializing `nn.Transformer`. See the documentation: https://pytorch.org/docs/stable/generated/torch.nn.Transformer.html Also, your input needs to have the same feature dimension as the Transformer, which in your case is 60, or you need to project the input before passing it to the Transformer. – Jindřich Apr 06 '21 at 08:56
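
Following up on that comment, below is a minimal sketch of constructing `nn.Transformer` with explicit keyword arguments and projecting inputs of a different width up to `d_model`. The projection layers (`src_proj`, `tgt_proj`) and the tensor shapes are illustrative assumptions, not part of the question's code:

import torch
import torch.nn as nn

d_model = 60  # nn.Transformer requires src and tgt feature sizes to equal d_model

# Keyword arguments make the intent explicit; positionally, the first two
# parameters are d_model and nhead.
transformer = nn.Transformer(d_model=d_model, nhead=2,
                             num_encoder_layers=2, num_decoder_layers=2)

# Hypothetical projections that map each input's feature width to d_model.
src_proj = nn.Linear(60, d_model)  # source already has 60 features
tgt_proj = nn.Linear(2, d_model)   # target has only 2 features, so project it

src = torch.randn(50, 64, 60)      # (seq, batch, features)
tgt = torch.randn(1, 64, 2)        # (tgt_seq, batch, features)

out = transformer(src_proj(src), tgt_proj(tgt))
print(out.shape)                   # torch.Size([1, 64, 60])

Once both tensors are projected to d_model, the shape check passes and the forward call succeeds.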

1 Answer


`nn.Transformer` is an encoder-decoder model: you feed it a source sequence and a target sequence, and both pass through attention layers that all operate in a space of width `d_model`. The module expects src and tgt to already be embedded to that size, which is why it checks that their feature number equals `d_model` (60 in your constructor call) before computing anything.

Your target has only 2 features, so that check fails and the model can't run. Either project/embed the target up to `d_model`, or, since you want a single 2-class prediction per example rather than a generated sequence, drop the decoder entirely and use an encoder-only model with a classification head (see the sketch below).

– Julia Meshcheryakova
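
To make that concrete, here is a minimal encoder-only sketch, assuming the goal is one 2-class prediction per example; the class name, the mean-pooling choice, and the tensor shapes are illustrative, not taken from the question:

import torch
import torch.nn as nn

class EEGClassifier(nn.Module):
    def __init__(self, input_size=60, num_classes=2, nhead=2, num_layers=2):
        super().__init__()
        encoder_layer = nn.TransformerEncoderLayer(d_model=input_size, nhead=nhead)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.fc = nn.Linear(input_size, num_classes)

    def forward(self, x):
        # x: (seq, batch, features); no target sequence is needed
        out = self.encoder(x)   # (seq, batch, features)
        out = out.mean(dim=0)   # pool over the time axis -> (batch, features)
        return self.fc(out)     # (batch, num_classes)

model = EEGClassifier()
x = torch.randn(50, 64, 60)     # (seq, batch, features)
logits = model(x)
print(logits.shape)             # torch.Size([64, 2])

With this setup, nn.CrossEntropyLoss on integer class labels is the usual choice over nn.MSELoss for a 2-class output.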