I am trying to train a GPT-2 model to take a tokenized, padded input sequence and predict an output sequence. My batch size is 32 and my maximum sequence length is 343; I believe the 768 comes from the model itself (GPT-2's hidden size). I cannot get the loss function to work, though: the training loop keeps throwing errors like this:
RuntimeError: Expected target size [32, 768], got [32, 343]
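As far as I can tell from the error, nn.CrossEntropyLoss treats the second dimension of a 3-D input as the class dimension: given logits of shape [32, 343, 768] it assumes 343 classes and therefore expects a target of shape [32, 768]. Here is a minimal sketch with dummy tensors of the same shapes (random values, only meant to reproduce the shape check) that raises the same error:

import torch
import torch.nn as nn

loss_function = nn.CrossEntropyLoss(ignore_index=0)
logits = torch.randn(32, 343, 768)          # same shape as my model's output
targets = torch.randint(1, 343, (32, 343))  # same shape as my padded labels
loss_function(logits, targets)              # raises the RuntimeError above

And here is my actual training code: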
import torch
import torch.nn as nn
import transformers
from torch.utils.data import TensorDataset, DataLoader

# Create a TensorDataset from input_ids and output_ids
dataset = TensorDataset(input_tensors, output_tensors)

# Constants
batch_size = 32
num_epochs = 20

# Create a DataLoader from the dataset
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

# Set the device to run on
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Define the model architecture
model = transformers.GPT2Model.from_pretrained('gpt2').to(device)

# Define the loss function (ignore_index=0 skips my padding id)
loss_function = nn.CrossEntropyLoss(ignore_index=0, reduction='mean')

# Define the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Set the model to training mode
model.train()

print(f"input_tensors.shape before the loop: {input_tensors.shape}")
print(f"output_tensors.shape before the loop: {output_tensors.shape}")

# Loop over the number of epochs
for epoch in range(num_epochs):
    # Initialize the epoch loss
    epoch_loss = 0
    # Loop over the batches in the dataloader
    for input_tensors, output_tensors in dataloader:
        # Send the input and target tensors to the device
        input_tensors = input_tensors.to(device)
        output_tensors = output_tensors.long().to(device)
        # Zero gradients
        optimizer.zero_grad()
        # Forward pass: take the first element of the model output
        logits = model(input_tensors)[0]
        print(f"logits.shape: {logits.shape}")
        print(f"input_tensors.shape: {input_tensors.shape}")
        print(f"output_tensors.shape: {output_tensors.shape}")
        # Compute the loss -- this is the line that raises the error
        loss = loss_function(logits, output_tensors)
        # Backward pass
        loss.backward()
        # Update the model parameters
        optimizer.step()
        # Add the batch loss to the epoch loss
        epoch_loss += loss.item()
    # Print the epoch loss
    print(f'Epoch {epoch+1}: Loss = {epoch_loss}')
And the printed tensor shapes:

input_tensors.shape before the loop: torch.Size([2625, 343])
output_tensors.shape before the loop: torch.Size([2625, 343])
logits.shape: torch.Size([32, 343, 768])
input_tensors.shape: torch.Size([32, 343])
output_tensors.shape: torch.Size([32, 343])
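For what it's worth, the 768 does match the model's embedding size rather than anything in my data:

print(model.config.n_embd)  # 768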
I have tried squeezing/unsqueezing the tensors and reshaping the logits and output_tensors in various ways. I think reshaping is the right next step, but I can't figure out exactly what to change.
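For example, one variant I tried, based on the (batch, classes, seq_len) layout that the CrossEntropyLoss docs describe for sequence inputs, was:

loss = loss_function(logits.permute(0, 2, 1), output_tensors)  # (32, 768, 343) vs (32, 343)

That gets past the size check, but it treats the 768 values per token as class scores, and my target token ids can be much larger than 768, so it doesn't seem right either. What should I change to make the loss work?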