
I'm attempting to read supermarket receipts using a combination of Tesseract and Magick in R. My first attempt (see Attempt 1) works quite well considering there was no pre-processing. My second attempt, listed below, appears to perform slightly better. To summarise, I have two questions.

Question 1: Is it possible to train the Tesseract package in R?

Question 2: If it is possible, how does one go about it? I imagined there would be some way of improving performance by manually correcting errors.
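
For Question 2, my rough understanding from the package documentation is that the training itself would happen outside R, with Tesseract's command-line training tools, and that the R package could then load the resulting .traineddata file through the datapath argument of tesseract(). Something like the sketch below is what I had in mind (the "receipt" language name and the tessdata path are made up):

library(tesseract)

# Hypothetical: load a custom "receipt.traineddata" produced with
# Tesseract's command-line training tools (name and path are made up)
receipt_engine <- tesseract(language = "receipt", datapath = "~/tessdata")

text <- ocr("Receipt.jpg", engine = receipt_engine)
cat(text)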

As a side note, I tried the abbyyR package, which gave slightly better results again, but it comes with a hefty price tag.

Attempt 1

library(tesseract)

text1 <- ocr("Receipt.jpg", engine = tesseract("eng"))
cat(text1)

Attempt 2

library(magick)

text2 <- image_read("Receipt.jpg") %>%
  image_resize("2000") %>%
  image_convert(colorspace = 'gray') %>%
  image_trim() %>%
  image_ocr(language = "eng")  # engine = tesseract("eng") isn't a valid pipe step; pass the language here instead

cat(text2)
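
As a related tweak (not training as such), my understanding is that tesseract() also accepts a named list of engine options, so in principle the recognised character set could be narrowed to what actually appears on a receipt. A rough sketch of what I mean, assuming tessedit_char_whitelist is honoured by the installed Tesseract version (I've read it may be ignored by some Tesseract 4 LSTM builds):

library(magick)
library(tesseract)

# Hypothetical: limit recognition to digits, basic punctuation and letters
receipt_engine <- tesseract(
  language = "eng",
  options = list(tessedit_char_whitelist = "0123456789.,$ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ")
)

text3 <- image_read("Receipt.jpg") %>%
  image_resize("2000") %>%
  image_convert(colorspace = 'gray') %>%
  image_trim() %>%
  image_write(format = "png", density = "300x300") %>%  # raw PNG so tesseract::ocr() can read it
  ocr(engine = receipt_engine)

cat(text3)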