I want to fine-tune TrOCR on my custom dataset of receipts. Since the OCR targets receipts, we chose the "printed" fine-tuned model. Our dataset consists of 5,000 bounding boxes, each containing a single word. However, all our metrics (CER, precision) and the loss get worse with every epoch, and we can't figure out why the model degrades the longer we train.
The processor, model, and optimizer are set up as shown below:
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
import torch.optim as optim

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-printed")
optimizer = optim.AdamW(model.parameters(), lr=5e-5)
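For context, this is roughly how we build the labels for each word crop (a simplified pure-Python sketch; `make_labels` and its argument names are illustrative, not our exact code — the real pipeline uses the processor's tokenizer). Pad positions are replaced with -100 so the cross-entropy loss ignores them:

```python
def make_labels(token_ids, max_len, pad_token_id):
    # Pad the word's token ids out to a fixed length, then mask the padding
    # with -100, the ignore index of PyTorch's cross-entropy loss.
    padded = token_ids + [pad_token_id] * (max_len - len(token_ids))
    return [tid if tid != pad_token_id else -100 for tid in padded]
```

The batches yielded by `train_dataloader` then contain `pixel_values` from the processor and these `labels`.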
Training method:
for epoch in range(self.epochs):
    self.model.train()
    train_loss = 0.0
    for batch in tqdm(self.train_dataloader):
        # Move every tensor in the batch (pixel_values, labels) to the device
        for k, v in batch.items():
            batch[k] = v.to(self.device)
        outputs = self.model(**batch)
        loss = outputs.loss
        loss.backward()
        self.optimizer.step()
        self.optimizer.zero_grad()
        train_loss += loss.item()
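For reference, the CER we report is the character-level edit distance divided by the reference length. A minimal pure-Python sketch of the metric (illustrative — our actual evaluation decodes predictions with the processor's batch_decode first):

```python
def edit_distance(a, b):
    # Levenshtein distance with a single rolling row of the DP table.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def cer(prediction, reference):
    # Character error rate: edits needed, normalized by reference length.
    return edit_distance(prediction, reference) / max(len(reference), 1)
```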
Does anyone know what we might be doing wrong?
When evaluated out of the box, the model performs well on our receipts, but we want to continue training on them to hopefully improve it further.