I am using pytesseract for license plate recognition, and I am trying to improve accuracy by giving Tesseract a whitelist of words so that it can only output entries from that whitelist. For now, I am using this command:
text = pytesseract.image_to_string(img, lang="eng", config="--psm 7 -c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ ")
I tried adding --user-words /absolute/path/to/eng.user-words to the config, but it apparently changes nothing.
My eng.user-words is just a text file with one word per line, so it should be fine.
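For concreteness, here is a minimal sketch of that setup: writing a one-word-per-line file and composing the config string. The helper names (write_user_words, build_config) and the example plates are hypothetical placeholders, not part of pytesseract, and the sketch only builds the arguments without claiming that Tesseract honours them.

```python
from pathlib import Path

# Characters allowed on a plate (same set as in the command above).
PLATE_CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def write_user_words(path, words):
    """Write one word per line and return the absolute path of the file."""
    path = Path(path)
    path.write_text("\n".join(words) + "\n", encoding="utf-8")
    return path.resolve()


def build_config(user_words_path, chars=PLATE_CHARS, psm=7):
    """Compose the config string handed to pytesseract.image_to_string.

    Assumes the path contains no spaces, since the string is split
    into separate command-line arguments before Tesseract is invoked.
    """
    return (
        f"--psm {psm} "
        f"--user-words {user_words_path} "
        f"-c tessedit_char_whitelist={chars}"
    )
```

The result would then be passed as pytesseract.image_to_string(img, lang="eng", config=build_config(words_path)).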
I also tried adding the bazaar config, as described here, but that changed nothing either.
I would appreciate help with this particular problem, or any other tips on how I can use pytesseract or another OCR library to recognize a single line of text while restricting the output to a whitelist, as that would improve accuracy dramatically in my use case.
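One engine-independent fallback would be to post-process the raw OCR string and snap it to the closest whitelist entry. Below is a rough sketch of that idea using the standard library's difflib; the function name snap_to_whitelist, the example plates, and the 0.6 cutoff are assumptions chosen for illustration, and this does not constrain Tesseract itself.

```python
import difflib


def snap_to_whitelist(raw_text, allowed, cutoff=0.6):
    """Return the whitelist entry closest to the OCR output, or None.

    `allowed` is a list of valid plates; `cutoff` is the minimum
    similarity ratio (0..1) required to accept a fuzzy match.
    """
    # Normalise: uppercase and keep only letters and digits.
    cleaned = "".join(ch for ch in raw_text.upper() if ch.isalnum())
    if cleaned in allowed:
        return cleaned
    # Fall back to the single most similar entry above the cutoff.
    matches = difflib.get_close_matches(cleaned, allowed, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    plates = ["ABC1234", "XYZ9876"]
    # A misread "B" as "8" still resolves to the intended plate.
    print(snap_to_whitelist("A8C1234\n", plates))
```

This only helps when the set of valid plates is known in advance and small enough to compare against.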