Specify whitelist of words for tesseract

Asked Apr 06 '23 at 08:47

Active Apr 07 '23 at 08:58

Viewed 83 times

I am using pytesseract for license plate recognition, and I am trying to improve accuracy by giving Tesseract a whitelist of words so that it can only output entries from that whitelist. At the moment I am using this command:

text = pytesseract.image_to_string(img, lang="eng", config="--psm 7 -c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ ")

I tried adding --user-words /absolute/path/to/eng.user-words, but it apparently changes nothing.

My eng.user-words is just a text file with one word per line, so it should be fine.
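For reference, the user-words attempt can be sketched roughly as follows. The file location and the sample plate strings are placeholders chosen for illustration, and the helper names (`write_user_words`, `build_config`, `ocr_plate`) are hypothetical; only the flags already mentioned above are used.

```python
import tempfile
from pathlib import Path

# Same character whitelist as in the command above.
WHITELIST_CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ "


def write_user_words(words, path):
    """Write one word per line, the layout described above."""
    path = Path(path)
    path.write_text("\n".join(words) + "\n", encoding="utf-8")
    return path


def build_config(user_words_path):
    """Assemble the flags from the original command plus --user-words."""
    return (
        f"--psm 7 --user-words {user_words_path} "
        f"-c tessedit_char_whitelist={WHITELIST_CHARS}"
    )


def ocr_plate(img, user_words_path):
    """Run Tesseract on a single-line image (needs pytesseract and tesseract)."""
    import pytesseract  # imported lazily so the helpers above work without it

    return pytesseract.image_to_string(
        img, lang="eng", config=build_config(user_words_path)
    ).strip()


# Placeholder plate strings, not real data.
words_file = write_user_words(
    ["ABC1234", "XYZ9876"], Path(tempfile.gettempdir()) / "eng.user-words"
)
config = build_config(words_file)
```

Calling `ocr_plate(img, words_file)` with a loaded image would then reproduce the setup described here.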

I also tried adding the bazaar config, as described here, but that changed nothing either.

I would appreciate help with this particular problem, or any other tips on how I can use pytesseract or another OCR library to recognize a single line of text and provide it with a whitelist, as that would improve accuracy dramatically in my use case.
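One engine-independent fallback is to snap the raw OCR output to the closest whitelist entry after recognition. The following is a minimal sketch of that idea using the standard-library `difflib`, not a Tesseract feature; the function name `snap_to_whitelist`, the sample plates, and the 0.6 similarity cutoff are all assumptions for illustration.

```python
import difflib


def snap_to_whitelist(raw_text, whitelist, cutoff=0.6):
    """Return the whitelist entry closest to the OCR output, or None.

    cutoff is a similarity threshold between 0 and 1 (an assumed value;
    tune it for the data at hand).
    """
    # Keep only letters and digits, uppercased, so stray whitespace and
    # punctuation from the OCR step do not affect the comparison.
    cleaned = "".join(ch for ch in raw_text.upper() if ch.isalnum())
    matches = difflib.get_close_matches(cleaned, whitelist, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Placeholder plates, not real data.
known_plates = ["ABC1234", "XYZ9876", "KLM5550"]

# "A8C1234" stands in for a typical B/8 confusion in the OCR output.
print(snap_to_whitelist("A8C1234\n", known_plates))
print(snap_to_whitelist("??", known_plates))
```

This only helps when the set of valid plates is known in advance, and a low cutoff can map an unrelated reading onto a whitelist entry, so the threshold deserves some care.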


Dolidod Teethtard

Does this answer your question? [custom-dictionary-for-tesseract](https://stackoverflow.com/a/13556952/20851944) – Hermann12 Apr 09 '23 at 07:18


0 Answers