IronOCR / Tesseract OCR recognize single digit

Question

I want to use IronOCR to recognize single digits from a screenshot.

The problem is that the result of .Read() is always an empty string ("").

This is my code:

        var bmpScreenshot = new Bitmap(105, 25, PixelFormat.Format32bppRgb);

        var gfxScreenshot = Graphics.FromImage(bmpScreenshot);

        gfxScreenshot.CopyFromScreen(992, 400, 0, 0, new Size(105, 25), CopyPixelOperation.SourceCopy);

        var ocrInput = new IronOcr.OcrInput(bmpScreenshot);
        ocrInput.EnhanceResolution();
        ocrInput.Contrast();
        ocrInput.Invert();

        var Ocr = new IronOcr.IronTesseract();
        Ocr.Configuration.WhiteListCharacters = "0123456789";

        var Result = Ocr.Read(ocrInput).Text;

Example screenshot used for the recognition

If I try to recognize two or more digits, it works fine (most of the time).

Any idea how to get this done?

Have you read the documentation at https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md? — user898678, Mar 07 '22 at 12:02
It improved overall accuracy once I used Ocr.Configuration.WhiteListCharacters = "0123456789"; Ocr.Configuration.TesseractVariables.Add("load_system_dawg", false); Ocr.Configuration.TesseractVariables.Add("load_freq_dawg", false); however, I am still struggling to detect single digits. — John Smith, Mar 07 '22 at 14:52
The referenced docs contain instructions on how to solve your problem. Just read and follow them. — user898678, Mar 09 '22 at 06:10

darren · Answer 1 · 2022-03-14T07:43:19.793

Disclaimer: I work for Iron Software.

If you always expect a single character, try setting PageSegmentationMode to TesseractPageSegmentationMode.SingleChar, which treats the image as a single character.

    Ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.SingleChar;
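As a rough sketch, the setting can be combined with the digit whitelist from the question. This only strings together calls that already appear in this thread; the image path is a placeholder, and whether it reads a given screenshot correctly has not been verified here.

```csharp
using System;
using IronOcr;

var Ocr = new IronTesseract();

// Treat the whole image as a single character.
Ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.SingleChar;

// Restrict recognition to digits, as in the question.
Ocr.Configuration.WhiteListCharacters = "0123456789";

// Placeholder path: substitute your own cropped screenshot.
using (var Input = new OcrInput(@"F:\input.png"))
{
    // Invert light-on-dark text, as the question's code does.
    Input.Invert();

    var Result = Ocr.Read(Input);
    Console.WriteLine(Result.Text);
}
```

If the result is still empty, the crop itself may be the limiting factor, so it can be worth saving the captured image to disk and inspecting it before tuning further.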

Otherwise, our engineers tested your example and got a successful result with the following image filters.

Code example:

    var Ocr = new IronTesseract();
    Ocr.Language = OcrLanguage.Financial;
    using (var Input = new OcrInput(@"F:\input.png"))
    {
        // Image filters that gave a successful read in our test.
        Input.DeNoise();
        Input.Invert();
        // CPU-intensive; enable only for extreme background noise.
        //Input.DeepCleanBackgroundNoise();
        var Result = Ocr.Read(Input);
        Console.WriteLine(Result.Text);
    }

Note that DeepCleanBackgroundNoise() is very CPU-intensive because it performs heavy background noise removal. Use it only for documents with extreme background noise.

I tried this, but it does not work, even with a binarisation filter (black-and-white image). It still fails. — Renê Guilherme Nucci, Jul 14 '22 at 16:42
