Can we pass a regular expression as a variable to Tesseract?

Question

Can we improve Tesseract's character recognition accuracy with a regular expression? For example, can we tell Tesseract that the text has this kind of structure:

4characters2Digits[4Digits]3char3Digits2char

// Our string in the image is "abcd12[2222]aBc000AB"

// Our regular expression can be like this

// Backslashes must be doubled inside a Java string literal
String reg = "[a-zA-Z]{4}\\d{2}\\[\\d{4}\\][a-zA-Z]{3}\\d{3}[a-zA-Z]{2}";

I think that with this kind of hint Tesseract would recognize the characters better.

We can also set

tesseract.setTessVariable("tessedit_char_whitelist", "0123456789[]abc...Z");

Note: I am using Java with Tess4J.

Thank you!

nguyenq · Answer 1 · 2019-09-14T14:28:24.397

You can try a bazaar pattern (Tesseract's user-patterns feature), which supports a limited subset of regex.
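Since only a limited subset of regex is supported there, one complementary option is to keep the full expression from the question and apply it to the text after OCR, for example to reject or retry results that do not fit. Below is a minimal sketch using only java.util.regex. The class name OcrResultCheck and the method matchesExpected are hypothetical, and the sample strings stand in for whatever the Tess4J OCR call returns. This does not change what Tesseract recognizes; it only checks the output.

```java
import java.util.regex.Pattern;

public class OcrResultCheck {
    // Same structure as the regex in the question; backslashes are doubled
    // because this is a Java string literal.
    private static final Pattern EXPECTED = Pattern.compile(
            "[a-zA-Z]{4}\\d{2}\\[\\d{4}\\][a-zA-Z]{3}\\d{3}[a-zA-Z]{2}");

    /** Returns true when the whole trimmed OCR result has the expected structure. */
    public static boolean matchesExpected(String ocrText) {
        // OCR output often ends with a newline, so trim before matching.
        return ocrText != null && EXPECTED.matcher(ocrText.trim()).matches();
    }

    public static void main(String[] args) {
        // Sample strings standing in for OCR output.
        System.out.println(matchesExpected("abcd12[2222]aBc000AB\n"));
        // Letter O misread in place of the digit 0.
        System.out.println(matchesExpected("abcd12[2222]aBcOOOAB"));
    }
}
```

A failed match can be used as a signal to re-run OCR with different preprocessing or to flag the image for manual review.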

edited Sep 14 '19 at 14:28

answered Dec 17 '15 at 03:21

nguyenq


this link is broken – jtlz2 Sep 11 '19 at 11:24
