
I am trying to train a GPT-2 model to take a tokenized/padded input and predict the output. My batch size is 32 and my max sequence length is 343. I believe the 768 in the error comes from the model (GPT-2's hidden size). I cannot get the loss function to work properly, though: the training loop keeps throwing errors like this:

RuntimeError: Expected target size [32, 768], got [32, 343]

import torch
import torch.nn as nn
import transformers
from torch.utils.data import TensorDataset, DataLoader

# Create a TensorDataset from input_ids and output_ids
dataset = TensorDataset(input_tensors, output_tensors)

# Constants
batch_size = 32
num_epochs = 20
# Create a DataLoader from the dataset
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

# Set the device to run on
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Define the model architecture
model = transformers.GPT2Model.from_pretrained('gpt2').to(device)

# Define the loss function
loss_function = nn.CrossEntropyLoss(ignore_index=0, reduction='mean')

# Define the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Set the model to training mode
model.train()
print(f"input_tensors.shape before the loop: {input_tensors.shape}")
print(f"output_tensors.shape before the loop: {output_tensors.shape}")

# Loop over the number of epochs
for epoch in range(num_epochs):
    # Initialize the epoch loss
    epoch_loss = 0
    
    # Loop over the data in the dataloader
    for input_tensors, output_tensors in dataloader:
        # Send the input and target tensors to the device
        input_tensors = input_tensors.to(device)
        output_tensors = output_tensors.type(torch.LongTensor)
        output_tensors = output_tensors.to(device)
        # Zero gradients
        optimizer.zero_grad()
        
        # Forward pass
        logits = model(input_tensors)[0]
        
        print(f"logits.shape: {logits.shape}")
        print(f"input_tensors.shape: {input_tensors.shape}")
        print(f"output_tensors.shape: {output_tensors.shape}")
        
        # Compute the loss
        loss = loss_function(logits, output_tensors)

        # Backward pass
        loss.backward()

        # Update the model parameters
        optimizer.step()

        # Add the loss to the epoch loss
        epoch_loss += loss.item()
    # Print the epoch loss at the end of each epoch
    print(f'Epoch {epoch+1}: Loss = {epoch_loss}')

And the sizes of the tensors:

  • input_tensors.shape == torch.Size([2625, 343]) before the loop
  • output_tensors.shape == torch.Size([2625, 343]) before the loop
  • logits.shape == torch.Size([32, 343, 768])
  • input_tensors.shape == torch.Size([32, 343])
  • output_tensors.shape == torch.Size([32, 343])
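
Looking at those shapes, I think I can see where the expected [32, 768] comes from: if I'm reading the nn.CrossEntropyLoss docs correctly, a 3D input is treated as (N, C, d1) with dimension 1 as the class dimension, so my [32, 343, 768] logits are read as N=32, C=343, d1=768 and the target is expected to be [32, 768]. A minimal standalone sketch of that behaviour (not my training code):

import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss(ignore_index=0, reduction='mean')

# With a 3D input, CrossEntropyLoss takes dim 1 (here 343) as the number
# of classes, so the matching target shape is [N, d1] = [32, 768].
logits = torch.randn(32, 343, 768)
targets = torch.randint(0, 343, (32, 768))
print(loss_fn(logits, targets))  # runs, because the target matches [32, 768]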

I have tried squeezing/unsqueezing and reshaping the logits and output_tensors before computing the loss. I think reshaping is the right next step, but I can't figure out exactly what to change.
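
For example, one of the reshapes I tried was along these lines (a rough sketch of the kind of change, not my exact code), flattening the batch and sequence dimensions so that the last dimension of the logits becomes the class dimension:

# Inside the training loop, in place of the original loss computation:
flat_logits = logits.reshape(-1, logits.size(-1))   # [32 * 343, 768]
flat_targets = output_tensors.reshape(-1)           # [32 * 343]
loss = loss_function(flat_logits, flat_targets)

That kind of change hasn't resolved it for me so far, so I suspect I'm misunderstanding which dimension is supposed to be the class dimension.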
