
I'd like to interface an application by reading the text it displays.

I've had success with some applications when Windows isn't doing any font smoothing: I type a phrase into the application manually, render it in every installed Windows font, and find the font that matches. From there I can map each letter image back to a letter by generating all the letters in that font.
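
Roughly, the matching step looks something like this (a simplified sketch using PIL/Pillow; the candidate font paths, the test phrase, and the way I normalize the captured bitmap are all glossed over here):

```python
# Simplified sketch of the render-and-match idea (assumes no font smoothing).
from PIL import Image, ImageDraw, ImageFont

def render(text, font_path, size=12):
    """Render text in the given font, white on black, cropped tight to the glyphs."""
    font = ImageFont.truetype(font_path, size)
    img = Image.new("L", (size * len(text) * 2, size * 3), 0)
    ImageDraw.Draw(img).text((0, 0), text, font=font, fill=255)
    return img.crop(img.getbbox())

def find_font(captured, phrase, candidate_font_paths):
    """Find the font that reproduces the captured phrase pixel-for-pixel.

    `captured` has to be normalized the same way as render()'s output
    (binarized to white-on-black and cropped to the text).
    """
    for path in candidate_font_paths:
        if render(phrase, path).tobytes() == captured.tobytes():
            return path
    return None

def build_glyph_table(font_path, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Once the font is known, map each letter's image back to the letter."""
    return {render(ch, font_path).tobytes(): ch for ch in alphabet}
```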

This won't work if any font smoothing is being done, though, either by Windows or by the application. What's the state of the art like in OCRing computer-generated text? It seems like it should be easier than breaking CAPTCHAs or OCRing scanned text. Where can I find resources about this? So far I've only found articles on CAPTCHA breaking or OCRing scanned text.

I prefer solutions easily accessible from Python, though if there's a good one in some other language I'll do the work to interface with it.

Claudiu
  • [JOCR](http://home.megapass.co.kr/~woosjung/Product_JOCR.html) seems perfect. I'm looking into how to use it from Python, but any tips in that direction would be appreciated – Claudiu Apr 27 '11 at 22:56
  • So... why can't you hook the text display API calls again? – Ignacio Vazquez-Abrams Apr 27 '11 at 22:56
  • http://stackoverflow.com/questions/3877762/get-the-word-under-the-mouse-cursor-in-windows – Ignacio Vazquez-Abrams Apr 27 '11 at 23:02
  • @Ignacio: ah, that doesn't seem to work on one of the apps I want (it only seems to work on Windows-native stuff like Notepad and IE; it doesn't work on Python's IDLE or Chrome, for example) – Claudiu Apr 28 '11 at 15:18

1 Answer


I'm not exactly sure what you mean, but I think just reading the text with an OCR program would work well.

Tesseract is amazingly accurate for scanned documents, so a specific font would be a breeze for it to read. Here's my Python OCR solution: Python OCR Module in Linux?
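
For example, a minimal way to drive it from Python (this is just a sketch; it assumes the `pytesseract` wrapper plus a separately installed `tesseract` binary, and upscaling tiny screen fonts before recognition usually helps):

```python
# Rough sketch: OCR a window capture with Tesseract via the pytesseract wrapper.
from PIL import Image
import pytesseract

def read_screen_text(screenshot_path):
    img = Image.open(screenshot_path).convert("L")  # greyscale
    w, h = img.size
    # Screen fonts are tiny; blow the image up before handing it to Tesseract.
    img = img.resize((w * 4, h * 4), Image.LANCZOS)
    return pytesseract.image_to_string(img)

print(read_screen_text("window_capture.png"))
```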

Alternatively, you could generate each character as an image and search for its locations in the screenshot. That might work, but I have no idea how accurate it would be with smoothing.
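
Something along these lines, say with OpenCV's template matching (a rough sketch; the 0.9 threshold is a guess, and anti-aliasing will blur the exact matches this relies on):

```python
# Rough sketch: find where a pre-rendered character image occurs in a screenshot.
import cv2
import numpy as np

def find_glyph(screenshot_path, glyph_path, threshold=0.9):
    """Return (x, y) positions where the glyph image appears in the screenshot."""
    screen = cv2.imread(screenshot_path, cv2.IMREAD_GRAYSCALE)
    glyph = cv2.imread(glyph_path, cv2.IMREAD_GRAYSCALE)
    scores = cv2.matchTemplate(screen, glyph, cv2.TM_CCOEFF_NORMED)
    ys, xs = np.where(scores >= threshold)
    return sorted(zip(xs.tolist(), ys.tolist()))
```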

Blender
  • I believe I tried Tesseract a while ago and it just didn't do well on generated text. I had to blow it up 4x to even get a reading, and it was pretty slow (did I mention this has to be fast?). I'll take another look, though; it might have been something else – Claudiu Apr 27 '11 at 22:59
  • Make sure to get the svn version. The stable release sucks at OCR, oddly. – Blender Apr 27 '11 at 23:00
  • @Blender: hmm, a colleague says he tried both Tesseract and OCRopus and they seem quite bad at it – Claudiu Apr 27 '11 at 23:18
  • I haven't figured out OCRopus. But Tesseract worked with over 95% accuracy on a Russian scanned book. As I said before, this only works well with the `svn` version of Tesseract (which works better than anything I've seen to date). I compiled it a few days ago, so maybe that could make a difference. – Blender Apr 27 '11 at 23:21
  • @Blender: hmm, I'll try it out. For some reason I feel it won't work as well with perfectly aligned text, but maybe I'm just crazy – Claudiu Apr 28 '11 at 03:00
  • Mind posting a screenshot? I have it handy, so I can give you some results. – Blender Apr 28 '11 at 03:38
  • I tried it, and it's not a good idea. Guess I overestimated Tesseract. It's only good when the letters are clearly visible... – Blender Apr 28 '11 at 03:42
  • @Blender: ah, thanks for trying. What do you mean by clearly visible? For example, what if you tried it on SO comments? The letters seem pretty well separated – Claudiu Apr 28 '11 at 14:11
  • The letters are literally a pixel or two thick. When I do OCR, I use a 19 MP camera image, where each letter is about 50 pixels thick. I agree with Ignacio: use the API. – Blender Apr 28 '11 at 14:27
  • @Blender: ah, OK, that reflects my experience with it, then. I'll try other approaches – Claudiu Apr 28 '11 at 14:33