I want to train TrOCR on my custom dataset of receipts. Since we will be using the OCR on receipts, we chose the "printed" fine-tuned model. Our dataset consists of 5,000 bounding boxes, each containing a single word. However, all metrics (CER, precision) and the loss worsen with every epoch we run, and we can't figure out why the model gets worse the longer we train.

The processor, model and optimizer are shown below:

from torch import optim
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-printed")
optimizer = optim.AdamW(model.parameters(), lr=5e-5)

Training method:

for epoch in range(self.epochs):
    self.model.train()
    train_loss = 0.0
    for batch in tqdm(self.train_dataloader):
        # move pixel_values and labels to the training device
        for k, v in batch.items():
            batch[k] = v.to(self.device)
        # forward pass; the model computes the loss from the labels internally
        outputs = self.model(**batch)
        loss = outputs.loss
        # backward pass and optimizer update
        loss.backward()
        self.optimizer.step()
        self.optimizer.zero_grad()
        train_loss += loss.item()
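
For reference, the batches the loop expects (pixel_values plus labels) can be produced by a dataset along these lines. This is a hypothetical sketch, not our exact code; ReceiptDataset, the sample list and the max target length are placeholders:

import torch
from PIL import Image
from torch.utils.data import Dataset

class ReceiptDataset(Dataset):
    # one cropped word image and its transcription per sample
    def __init__(self, samples, processor, max_target_length=32):
        self.samples = samples              # list of (image_path, text) pairs
        self.processor = processor
        self.max_target_length = max_target_length

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image_path, text = self.samples[idx]
        image = Image.open(image_path).convert("RGB")
        pixel_values = self.processor(image, return_tensors="pt").pixel_values.squeeze(0)
        labels = self.processor.tokenizer(
            text, padding="max_length", max_length=self.max_target_length
        ).input_ids
        # padding tokens are set to -100 so they are ignored by the loss
        labels = [l if l != self.processor.tokenizer.pad_token_id else -100 for l in labels]
        return {"pixel_values": pixel_values, "labels": torch.tensor(labels)}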

Does anyone know what we might be doing wrong?

The model performs well out of the box when we evaluate it, but we would like to continue training it on our receipts to hopefully improve it.

1 Answer

You have chosen a model that is already fine-tuned: "microsoft/trocr-base-printed" has already been fine-tuned on the SROIE dataset, so there is little point in fine-tuning an already fine-tuned model. Instead, choose a pre-trained-only checkpoint such as trocr-base-stage1 or trocr-small-stage1.
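
A minimal sketch of how you could load the stage-1 checkpoint instead. The processor is reused from the printed checkpoint and the token-ID settings follow the Hugging Face TrOCR fine-tuning example; double-check them against your tokenizer:

from torch import optim
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# reuse the processor (image processor + tokenizer) from a fine-tuned checkpoint
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")
# load only pre-trained weights, not yet fine-tuned on any OCR dataset
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-stage1")

# the stage-1 checkpoint does not come with these generation/loss settings
# configured, so set them before fine-tuning
model.config.decoder_start_token_id = processor.tokenizer.cls_token_id
model.config.pad_token_id = processor.tokenizer.pad_token_id
model.config.eos_token_id = processor.tokenizer.sep_token_id
model.config.vocab_size = model.config.decoder.vocab_size

optimizer = optim.AdamW(model.parameters(), lr=5e-5)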