
I wrote some simple code to extract text from images. The images contain only numbers. These are the images I want to convert:

First image

Second image

Third image

For these images, the result of the OCR conversion is always "~".

For the following two images, the OCR conversion returns results, but with some mistakes:

First picture with "successful" conversion - Result: "[1479502352"

Second picture with "successful" conversion - Result: "[1479502459"

Here is my code:

    using System;
    using System.Collections.Generic;
    using System.Drawing;
    using Tes = tessnet2;

    class Program
    {
        // Folder that contains the Tesseract language data files.
        private const string TesIni = @"C:\Program Files (x86)\Tesseract\tessdata";

        static void Main(string[] args)
        {
            Bitmap TextImg = new Bitmap(@"C:\HomeC\RPA_Prozesse\BOB_NPM_Retour\btnImages\TestTextImage.png");
            Tes.Tesseract ocr = new Tes.Tesseract();
            ocr.Init(TesIni, "eng", true); // third argument: numeric mode
            List<Tes.Word> Result = ocr.DoOCR(TextImg, Rectangle.Empty);

            // Print every recognized word on its own line.
            foreach (Tes.Word wrd in Result)
            {
                Console.WriteLine(wrd.Text);
            }

            Console.WriteLine("Application finished. Press any key to exit...");
            Console.ReadKey();
        }
    }

The documentation says that Tesseract is the best open-source OCR library and was heavily improved by Google, but the results are extremely poor.

Are there any settings I can change to get better results?

Jan021981
  • Well, the docs mention "ocr.SetVariable("tessedit_char_whitelist", "0123456789"); // If digit only" and "ocr.Init(@"c:\temp", "fra", false); // To use correct tessdata"... why not go with that first and then start fiddling? – C. Gonzalez Sep 05 '17 at 15:20
  • The text looks small. Try rescaling the image to 300 DPI. – nguyenq Sep 07 '17 at 21:24
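
The two suggestions in the comments could be combined roughly as follows. This is only a sketch, not a verified fix: the `SetVariable` call and its placement before `Init` are taken from the documentation snippet quoted in the first comment, the 300 DPI value comes from the second comment, and the `Upscale` helper and the scale factor of 3 are hypothetical choices that would need tuning against the real images.

    using System;
    using System.Collections.Generic;
    using System.Drawing;
    using System.Drawing.Drawing2D;
    using Tes = tessnet2;

    class Program
    {
        private const string TesIni = @"C:\Program Files (x86)\Tesseract\tessdata";

        // Hypothetical helper: enlarges the image so small digits cover more pixels.
        static Bitmap Upscale(Bitmap source, int factor)
        {
            Bitmap scaled = new Bitmap(source.Width * factor, source.Height * factor);
            scaled.SetResolution(300, 300); // tag the result as 300 DPI
            using (Graphics g = Graphics.FromImage(scaled))
            {
                g.InterpolationMode = InterpolationMode.HighQualityBicubic;
                g.DrawImage(source, 0, 0, scaled.Width, scaled.Height);
            }
            return scaled;
        }

        static void Main(string[] args)
        {
            using (Bitmap original = new Bitmap(@"C:\HomeC\RPA_Prozesse\BOB_NPM_Retour\btnImages\TestTextImage.png"))
            using (Bitmap enlarged = Upscale(original, 3)) // factor 3 is an assumption
            {
                Tes.Tesseract ocr = new Tes.Tesseract();
                // Restrict recognition to digits; set before Init, as in the quoted docs.
                ocr.SetVariable("tessedit_char_whitelist", "0123456789");
                ocr.Init(TesIni, "eng", false);

                List<Tes.Word> result = ocr.DoOCR(enlarged, Rectangle.Empty);
                foreach (Tes.Word wrd in result)
                {
                    Console.WriteLine(wrd.Text);
                }
            }
        }
    }

If the whitelist takes effect, stray characters such as the leading "[" in the results above should no longer be produced, but whether the digits themselves come out correctly still depends on the image quality.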
