
I have developed an ANPR application which requires an OCR engine, and I am trying to use Tesseract as that engine. But I really can't find a proper step-by-step tutorial or guidelines on how to include tessnet2 in my C#.Net project. I have already trained Tesseract v3.01. Can someone help with this issue, please?

Thanks

Mr.Noob
  • Check out the source code to Subtitle Edit. It's a C# application that utilizes Tesseract for OCR'ing bitmap subtitles. http://www.nikse.dk/SubtitleEdit/ – Kevin Mark Jul 27 '12 at 08:25
  • It looks a bit complicated, and I don't see which DLLs this project uses. – Mr.Noob Jul 27 '12 at 08:40

1 Answer


You can't use 3.01 data with the Tesseract 2.04 engine that tessnet2 is built on; as the Tesseract wiki states, the two are not compatible. You would need a Tesseract 3.0x engine. There is a .NET wrapper for 3.01: tesseract-ocr-dotnet.
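The incompatibility is visible in the data folder itself: Tesseract 2.0x stores each language as several separate files (for English, files such as `eng.inttemp` and `eng.unicharset`), whereas 3.0x expects a single combined `<lang>.traineddata` file, which is what `combine_tessdata` produces at the end of training. The sketch below is a hypothetical helper, not part of tessnet2 or tesseract-ocr-dotnet, that reports which layout a tessdata folder holds for a given language code. It only checks file names, not whether the data inside is valid.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of any Tesseract wrapper): classifies a
// tessdata folder by the file layout it contains for one language code.
public static class TessdataInspector
{
    public enum Layout { Missing, Tesseract2, Tesseract3 }

    public static Layout Detect(string tessdataDir, string lang)
    {
        // Tesseract 3.0x: one combined "<lang>.traineddata" file per language.
        if (File.Exists(Path.Combine(tessdataDir, lang + ".traineddata")))
            return Layout.Tesseract3;

        // Tesseract 2.0x: separate component files, e.g. "<lang>.inttemp"
        // and "<lang>.unicharset" (only these two are checked here).
        if (File.Exists(Path.Combine(tessdataDir, lang + ".inttemp")) &&
            File.Exists(Path.Combine(tessdataDir, lang + ".unicharset")))
            return Layout.Tesseract2;

        // Neither layout found: wrong folder path or wrong language code.
        return Layout.Missing;
    }
}
```

For a custom font, `lang` would be whatever prefix you gave the files during training (the `xyz` in `xyz.traineddata`). A `Missing` result suggests that the path or language code passed to the wrapper does not match what is actually on disk.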

nguyenq
  • Hi, thanks for your reply. I just figured out that the link you sent me has a DLL file that I could include in my project, but I still don't understand which files to include after training Tesseract for my own font. Should I use all the files generated in the tessdata folder after training with Tesseract v3.01? The other part I don't get is why you voted down my question, lol. – Mr.Noob Jul 28 '12 at 16:45
  • By the way, I tried a few things with the files, and it seems I have the correct files in the tessdata folder. But Init() throws this runtime exception: "Attempted to read or write protected memory. This is often an indication that other memory is corrupt." My system is 64-bit; could this be an issue? – Mr.Noob Jul 28 '12 at 17:15
  • It could be your image or your project settings. Take a look at the VietOCR.NET (http://vietocr.sf.net) application for a working example that uses the mentioned DLL. – nguyenq Jul 28 '12 at 17:51
  • This looks like a similar issue that others have had before, I guess: https://github.com/charlesw/tesseract-ocr-dotnet/issues/6 – Mr.Noob Jul 28 '12 at 17:53
  • Any idea what the project settings should be for this to work? – Mr.Noob Jul 29 '12 at 07:45
  • I found this example done by Charles at https://github.com/charlesw/tesseract-ocr-dotnet/tags and it worked fine. I found out that the problem has something to do with my tessdata folder, but I'm still stuck on it! – Mr.Noob Jul 29 '12 at 12:16
  • I have trained Tesseract 3.0 for a different font; how can I fix this issue? I've been struggling with this for days, and any help would be appreciated! – Mr.Noob Jul 29 '12 at 18:47
  • Have you looked at the VietOCR.NET source? The project targets x86 and .NET 2.0. – nguyenq Jul 30 '12 at 01:16
  • Yes, I did. The error I'm getting has something to do with my tessdata folder. The example I got from GitHub works, but when I set the tessdata folder path to my own trained tessdata folder, it throws the following error: "Attempted to read or write protected memory. This is often an indication that other memory is corrupt." – Mr.Noob Jul 30 '12 at 06:23
  • I am even more confused now, because VietOCR.NET has only 2 files in its tessdata folder, both trained data files, whereas Charles's example that I downloaded from GitHub had 8 files in its tessdata folder! – Mr.Noob Jul 30 '12 at 15:39
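Since VietOCR.NET targets x86, one thing worth ruling out before calling `Init()` is a bitness mismatch between your process and the native Tesseract DLL the wrapper loads (assumed here to be a 32-bit build; check the DLL you actually ship). The hypothetical helper below uses only standard .NET: `IntPtr.Size` is 4 in a 32-bit process and 8 in a 64-bit one, and unlike `Environment.Is64BitProcess` it is also available on .NET 2.0.

```csharp
using System;

// Hypothetical helper for confirming process bitness before initializing
// a wrapper around a native (unmanaged) Tesseract DLL.
public static class BitnessCheck
{
    // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
    public static bool Is32BitProcess()
    {
        return IntPtr.Size == 4;
    }

    // Short description suitable for logging next to the Init() call.
    public static string Describe()
    {
        return Is32BitProcess() ? "32-bit (x86) process" : "64-bit process";
    }
}
```

On a 64-bit machine an AnyCPU build can run as a 64-bit process. In Visual Studio the setting is under Project Properties > Build > Platform target, which corresponds to the MSBuild property `<PlatformTarget>x86</PlatformTarget>`.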