Can we improve Tesseract's character recognition accuracy with a regular expression? For example, can we tell Tesseract that the text has this kind of structure?

4characters2Digits[4Digits]3char3Digits2char

// Our string in the image is "abcd12[2222]aBc000AB"

// Our regular expression can be like this

String reg = "[a-zA-Z]{4}\\d{2}\\[\\d{4}\\][a-zA-Z]{3}\\d{3}[a-zA-Z]{2}";

I think that with this kind of hint Tesseract would recognize the characters more accurately.

We can also restrict the character set:

tesseract.setTessVariable("tessedit_char_whitelist", "0123456789[]abc...Z");

Note: I am using Java with Tess4J.

Thank you!

Bahramdun Adil

1 Answer

You can try the bazaar config, which enables user patterns. These patterns support only a limited subset of regex.
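Separately from anything Tesseract itself supports, the full regular expression from the question can still be applied to the OCR output as a post-check, for example to reject or re-run results that do not fit the expected structure. The following is only a minimal sketch: the class name `OcrResultValidator` and the method `isValid` are hypothetical names chosen for illustration, and the input string is assumed to be whatever text the OCR call returned.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Tess4J): validates OCR output
// against the structure described in the question.
public class OcrResultValidator {

    // 4 letters, 2 digits, 4 digits in square brackets,
    // 3 letters, 3 digits, 2 letters.
    private static final Pattern EXPECTED = Pattern.compile(
            "[a-zA-Z]{4}\\d{2}\\[\\d{4}\\][a-zA-Z]{3}\\d{3}[a-zA-Z]{2}");

    // Returns true when the trimmed OCR text matches the whole pattern.
    public static boolean isValid(String ocrText) {
        return ocrText != null && EXPECTED.matcher(ocrText.trim()).matches();
    }

    public static void main(String[] args) {
        // OCR output often carries trailing whitespace or a newline.
        System.out.println(isValid("abcd12[2222]aBc000AB\n")); // true
        // Letter 'O' misread in place of digit '0'.
        System.out.println(isValid("abcd12[2222]aBcO00AB"));   // false
    }
}
```

A check like this does not improve recognition itself; it only detects results that break the expected format, such as a letter `O` read where a digit `0` belongs.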

nguyenq