
As far as I know, Tesseract 3.x comes with 6 English fonts (correct me if I'm wrong). I need to train Tesseract for 5 more fonts. I only need capital letters and digits (no special characters or symbols).

I followed various guides, for example: Adding New Fonts to Tesseract 3 OCR Engine

I also used tools that automate the process, such as Serak Tesseract Trainer for Tesseract 3.02.

To generate the box files I used QT Box Editor.

After using the above tools I get an eng.traineddata file. All the tutorials tell me to add this eng.traineddata file to the Tesseract-OCR\tessdata folder, but doing so replaces the original eng.traineddata file. If I do this, will I lose the default fonts that come with Tesseract 3.x?

How can I add new fonts? It's still not clear to me. I hope someone can help me here. Thanks.

md1hunox

2 Answers


You should use a different name, e.g., eng1.traineddata. That way the original eng.traineddata is left untouched, and you can use the new data together with the original by specifying the language option -l eng+eng1.
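As a rough illustration of the above, here is a minimal Python sketch that copies a freshly trained file into tessdata under a new name (refusing to overwrite anything) and builds the command line with both languages. The helper names `install_traineddata`, `build_command` and `run_ocr`, and the paths in the usage comments, are made up for this example. It assumes the `tesseract` executable is on your PATH and that your Tesseract 3.x build accepts several languages joined with `+`.

```python
import shutil
import subprocess
from pathlib import Path


def install_traineddata(src, tessdata_dir, new_name="eng1"):
    """Copy a trained file into tessdata as <new_name>.traineddata.

    Raises FileExistsError instead of overwriting, so the stock
    eng.traineddata can never be replaced by accident.
    """
    dest = Path(tessdata_dir) / f"{new_name}.traineddata"
    if dest.exists():
        raise FileExistsError(f"{dest} already exists")
    shutil.copyfile(src, dest)
    return dest


def build_command(image, output_base, languages):
    """Build the argument list: tesseract <image> <outputbase> -l a+b."""
    return ["tesseract", str(image), str(output_base), "-l", "+".join(languages)]


def run_ocr(image, output_base, languages=("eng", "eng1")):
    """Run Tesseract; the recognized text goes to <output_base>.txt."""
    subprocess.run(build_command(image, output_base, languages), check=True)


# Example usage (hypothetical paths, adjust to your installation):
# install_traineddata("eng.traineddata", r"C:\Tesseract-OCR\tessdata", "eng1")
# run_ocr("scan.png", "result")
```

The same thing typed directly at a command prompt would be `tesseract scan.png result -l eng+eng1`.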

nguyenq
  • where can i specify the language option -l eng+eng1 ? – marcAntoine Apr 16 '14 at 09:04
  • This might sound too lazy but is there a way to provide a font file as input (to a website, say) and a trained `tessdata` is provided as output? – tipycalFlow May 14 '14 at 14:10
  • @tipycalFlow [jTessBoxEditor](http://vietocr.sourceforge.net/training.html) has a TIFF/Box Generator. You can provide a font file and get a box with the correct values. With [Serak Tesseract Trainer](http://code.google.com/p/serak-tesseract-trainer/) you can do the rest. – Alexander Taubenkorb Dec 19 '14 at 09:51

If your new trained data was built only for a different font, I think you won't have dictionary correction for that font.

To use the new trained data you can do this (I'm using PHP here):

// Name of your new trained data (by convention a 3-letter code, but any name
// matching a <name>.traineddata file works); join several names with "+".
$language = "eng+deu";
$settingLanguage = $tesseract->setLanguage($language);

If you look at the setLanguage() function in tesseract.php, you will see that this is the function that sets the language.

Fenton
ogy