
I'm attempting to read supermarket receipts using a combination of Tesseract and Magick in R. My first attempt (see Attempt 1) works quite well considering there was no pre-processing. My second attempt, listed below, appears to perform slightly better. To summarise, I have two questions.

Question 1: Is it possible to train the Tesseract package in R?

Question 2: If it is possible, how does one go about it? I imagined there would be some way of improving performance by manually correcting errors.
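
For Question 2, my rough understanding from the package documentation is that the training itself would happen outside R, with Tesseract's command-line training tools, and that the R package could then load the resulting .traineddata file through the datapath argument of tesseract(). Something like the sketch below is what I had in mind (the "receipt" language name and the tessdata path are made up):

library(tesseract)

# Hypothetical: load a custom "receipt.traineddata" produced with
# Tesseract's command-line training tools (name and path are made up)
receipt_engine <- tesseract(language = "receipt", datapath = "~/tessdata")

text <- ocr("Receipt.jpg", engine = receipt_engine)
cat(text)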

As a side note, I tried the abbyyR package, which gave slightly better results again, but it comes with a hefty price tag.

Attempt 1

library(tesseract)

text1 <- ocr("Receipt.jpg", engine = tesseract("eng"))
cat(text1)

Attempt 2

library(magick)

text2 <- image_read("Receipt.jpg") %>%
  image_resize("2000") %>%
  image_convert(colorspace = 'gray') %>%
  image_trim() %>%
  image_ocr(language = "eng")  # engine = tesseract("eng") isn't a valid pipe step; pass the language here instead

cat(text2)
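
As a related tweak (not training as such), my understanding is that tesseract() also accepts a named list of engine options, so in principle the recognised character set could be narrowed to what actually appears on a receipt. A rough sketch of what I mean, assuming tessedit_char_whitelist is honoured by the installed Tesseract version (I've read it may be ignored by some Tesseract 4 LSTM builds):

library(magick)
library(tesseract)

# Hypothetical: limit recognition to digits, basic punctuation and letters
receipt_engine <- tesseract(
  language = "eng",
  options = list(tessedit_char_whitelist = "0123456789.,$ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ")
)

text3 <- image_read("Receipt.jpg") %>%
  image_resize("2000") %>%
  image_convert(colorspace = 'gray') %>%
  image_trim() %>%
  image_write(format = "png", density = "300x300") %>%  # raw PNG so tesseract::ocr() can read it
  ocr(engine = receipt_engine)

cat(text3)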