I need to OCR a specific region of a scanned document and I am using MODI (Microsoft's Document Imaging COM object).

My code currently OCRs the entire page (quite accurately!), but I would like to target a specific region of the page where the text is always in the same place (the order number). How can I do this?

Here is my code for the page:

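// Requires: using System.IO; and a COM reference to Microsoft Office Document Imaging (MODI)
MODI.Document md = new MODI.Document();
md.Create("c:\\temp\\mpk.tiff");

// OCR every page in English; the two flags enable auto-orientation and straightening
md.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);
MODI.Image image = (MODI.Image)md.Images[0];

// FileMode.CreateNew throws if c:\temp\mpk.txt already exists
using (FileStream createFile = new FileStream("c:\\temp\\mpk.txt", FileMode.CreateNew))
using (StreamWriter writeFile = new StreamWriter(createFile))
{
    // Text recognized on the first page
    writeFile.Write(image.Layout.Text);
}

md.Close();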

Can I somehow specify the region of the image?

Any help would be greatly appreciated!

Mark Kadlec

1 Answer

I don't see any way to crop the image with the MODI object model. The alternative is to give it an image that contains just the order number you want to convert. You can use the classes in the System.Drawing namespace to create that image from the original. Check this MSDN page for sample code.
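As a rough illustration of that approach, here is a minimal sketch that crops a region with System.Drawing, saves it to a temporary TIFF, and runs the same MODI calls as in the question on the cropped file. The class and method names, the output path c:\temp\mpk_order.tiff, and the rectangle coordinates are all hypothetical placeholders; the real rectangle has to be measured on your own scans.

```csharp
using System.Drawing;
using System.Drawing.Imaging;

static class OrderNumberOcr
{
    // Copies 'region' (in pixels) out of the source scan and saves it as a new TIFF.
    static void CropToFile(string sourcePath, Rectangle region, string croppedPath)
    {
        using (Bitmap source = new Bitmap(sourcePath))
        using (Bitmap cropped = source.Clone(region, source.PixelFormat))
        {
            cropped.Save(croppedPath, ImageFormat.Tiff);
        }
    }

    // OCRs only the cropped region, using the same MODI calls as the question.
    static string OcrRegion(string sourcePath, Rectangle region)
    {
        string croppedPath = "c:\\temp\\mpk_order.tiff"; // hypothetical temp file
        CropToFile(sourcePath, region, croppedPath);

        MODI.Document md = new MODI.Document();
        md.Create(croppedPath);
        try
        {
            md.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);
            MODI.Image image = (MODI.Image)md.Images[0];
            return image.Layout.Text;
        }
        finally
        {
            md.Close();
        }
    }
}
```

A call might look like OcrRegion("c:\\temp\\mpk.tiff", new Rectangle(1200, 150, 600, 120)), where the rectangle is a made-up example. Note that Bitmap.Clone throws if the rectangle falls outside the image bounds.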

Hans Passant
  • Hans, that's a great idea. It seems like I should be able to crop the image in memory, though, and somehow pass it to the MODI.Document (rather than saving and reopening a file). Do you know if I can somehow assign the cropped in-memory image to the MODI.Image? – Mark Kadlec Mar 05 '11 at 07:31
  • Yeah, you can tinker with the Images property. I don't really understand the Images.Add() method, good luck with it. It isn't actually faster on Windows; the file system cache makes the difference between memory and disk disappear. – Hans Passant Mar 05 '11 at 07:55
  • Thanks Hans, speed might not be too much of an issue, so that might be okay anyway. Not sure if you know this, but can I OCR non-English words (i.e. order numbers like ABC123)? – Mark Kadlec Mar 05 '11 at 20:50
  • No, sounds like lots of trouble. You're bound to get ABCI23 or ABCl23 (I or L, not one). I can't even make it look distinct in this comment. Start another question about it. – Hans Passant Mar 05 '11 at 20:53
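One possible way to soften the I/l/1 confusion mentioned in the last comment is to post-process the OCR text against the known shape of the order number. The sketch below assumes a purely hypothetical format of three letters followed by three digits (like ABC123); the class name, the format, and the substitution table are illustrative assumptions, not anything MODI provides.

```csharp
using System.Text;

public static class OrderNumberCleanup
{
    // Assumed (hypothetical) format: 3 letters followed by 3 digits, e.g. ABC123.
    const int LetterCount = 3;
    const int DigitCount = 3;

    // Maps characters that OCR commonly confuses with digits back to digits.
    static char ToDigit(char c)
    {
        switch (c)
        {
            case 'I': case 'l': case '|': return '1';
            case 'O': case 'o': return '0';
            default: return c;
        }
    }

    // Returns the normalized order number, or null if the text does not fit the format.
    public static string Normalize(string ocrText)
    {
        string s = ocrText.Trim();
        if (s.Length != LetterCount + DigitCount) return null;

        StringBuilder sb = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (i < LetterCount)
            {
                // Letter positions: accept only letters, stored in upper case.
                if (!char.IsLetter(c)) return null;
                sb.Append(char.ToUpperInvariant(c));
            }
            else
            {
                // Digit positions: undo look-alike substitutions, then require 0-9.
                c = ToDigit(c);
                if (c < '0' || c > '9') return null;
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

With this sketch, both OrderNumberCleanup.Normalize("ABCI23") and OrderNumberCleanup.Normalize("ABCl23") yield "ABC123". The position-based rule only works because the format is fixed; it cannot resolve look-alikes in order numbers whose letters and digits can appear anywhere.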