
I am trying to read text from an image using tessnet2 in a C# application. This is my code:

string valoare = "";
lblOCR.Text = "";

Bitmap image = new Bitmap(@"C:\Stamp\test.png");
tessnet2.Tesseract ocr = new tessnet2.Tesseract();
ocr.Init(@"F:\Manipulare pdf\bin(1)\Release32\tessdata", "eng", false);
var rect = new System.Drawing.Rectangle();
List<tessnet2.Word> result = ocr.DoOCR(image, rect);

int lc = tessnet2.Tesseract.LineCount(result);
foreach (tessnet2.Word word in result)
{
    lblOCR.Text += word.Text + " " + word.Confidence + "<br/>";
}

The resulting string contains only digits, even though my picture contains letters, and I don't understand why.

Thank you

roroinpho21
  • Could this have something to do with the orientation of the image? I'm not sure whether they implemented the "auto" orientation in the C# wrapper; I know it's in the C++ source. – devHead Apr 05 '13 at 13:55

2 Answers

0

Try installing the language pack again.

Tessnet2 works with language packs of version 2 and up.

0

I know I am quite late, but I found the solution somewhere else.

My OCR was configured to recognize digits only, although I see you don't have this line:

ocr.SetVariable("tessedit_char_whitelist", "0123456789"); // If digits only

For me, deleting it did the trick. Maybe you need to configure something similar.
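
If removing the whitelist is not an option, the opposite configuration can be sketched as follows: a whitelist that explicitly allows letters as well as digits. This is only a rough sketch, not a confirmed fix for the question. `BuildWhitelist` and the class name are hypothetical helpers, the paths are copied from the question, and the reading of the third `Init` argument as a numeric-mode flag is an assumption based on the tessnet2 sample code.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

public static class WhitelistSketch
{
    // Hypothetical helper: every character the OCR engine may return.
    public static string BuildWhitelist()
    {
        const string upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        return upper + upper.ToLowerInvariant() + "0123456789";
    }

    public static void Main()
    {
        using (var image = new Bitmap(@"C:\Stamp\test.png"))
        {
            var ocr = new tessnet2.Tesseract();

            // Set the variable before Init, as the tessnet2 sample code does.
            ocr.SetVariable("tessedit_char_whitelist", BuildWhitelist());

            // Third argument is assumed to be the numeric-mode flag;
            // false should leave letter recognition enabled.
            ocr.Init(@"F:\Manipulare pdf\bin(1)\Release32\tessdata", "eng", false);

            // Rectangle.Empty asks for the whole image to be processed.
            List<tessnet2.Word> result = ocr.DoOCR(image, Rectangle.Empty);
            foreach (tessnet2.Word word in result)
            {
                Console.WriteLine("{0} : {1}", word.Confidence, word.Text);
            }
        }
    }
}
```

If the output still contains only digits with this whitelist in place, the cause is probably elsewhere, for example in the language data under `tessdata`.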