
My goal is to screen-scrape a portion of a program which constantly updates with new text. I have tried OCR with Tesseract but I believe it would be much more efficient to somehow intercept the text if possible. I have attempted using the GetWindowText() function, but it only returns the window title. Using Window Detective I have determined that whenever the window updates in the way I wish to capture, a WM_PAINT message is reliably sent to the window.

I have looked a bit into Windows API hooks, but most of the techniques involving DLL injection seem to be intended for sending new messages, not for accessing the content of messages that have already been sent.

How should I approach this problem?

Ethan Kershner
  • 2
    WM_PAINT has no content; it is merely a notification to a program that it needs to redraw its UI. That runs a bunch of code in the program to get the job done, far out of reach of your program. So this can't go anywhere; look at WM_PRINT/WM_PRINTCLIENT and BitBlt() to make a screenshot. – Hans Passant Jun 08 '18 at 23:48
  • 2
    You *might* be able to inject code into the process to hook `DrawText()` and similar APIs directly. See if the program is using that to render the text you are interested in. – Remy Lebeau Jun 08 '18 at 23:54
  • 1
    You seem to be confused about the anatomy of a Windows GUI application. See [Learn to Program for Windows in C++](https://msdn.microsoft.com/en-us/library/windows/desktop/ff381399.aspx) to fill in those gaps and learn why your envisioned solution will not work. – IInspectable Jun 09 '18 at 12:44
  • @HansPassant I think the OP wants the text that is being drawn, not the resulting pixels, hence my answer. Also, `WM_PRINT` is unreliable in my experience. Not all apps / windows / controls implement it properly. `PrintWindow()` works better. – Paul Sanders Jun 11 '18 at 13:35

1 Answer


When you say 'screen-scrape', is that what you really mean? Reading your post, it sounds like you actually want to get at the text in the child window or control in question - as text, and not just as a bitmap. To do that, you will need to:

  1. Determine which child window or control actually contains the text you want to get at. It sounds like you may have already done that, but if not, the tool of choice is generally Spy++. (Please note: the version of Spy++ that you use must match the 'bitness' of your application.)

  2. Next, try to work out whether the text in that window can be retrieved directly. If it's a standard Windows control (specifically EDIT or RICHEDIT), there are documented ways to do that; see MSDN.

  3. If that doesn't pan out, you might have some success hooking calls to ExtTextOut(), although that's not a pleasant proposition and I think you might struggle to achieve it. That said, I believe the accepted way (in some sense of the word 'accepted') is here.

  4. With reference to point 3, even if you achieve it, how would you know whether any particular call to ExtTextOut() was drawing to the window you're interested in? The answer, most likely, is WindowFromDC(), which returns the HWND associated with a given device context.
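
As a rough illustration of point 2, here is a minimal sketch that assumes the target turns out to be a standard EDIT-style control and that you already have its HWND (from Spy++ or similar). The names `ReadControlText` and `AppendedText` are hypothetical, the one-second timeout is arbitrary, and the `#ifdef _WIN32` guard is only there so that the portable helper compiles elsewhere. It relies on WM_GETTEXT being usable across processes, whereas GetWindowText() cannot retrieve the text of a control in another application.

```cpp
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: copy the text of a control owned by another process.
// Returns an empty string if the target does not answer in time.
std::wstring ReadControlText(HWND control) {
    DWORD_PTR length = 0;
    // SendMessageTimeoutW avoids blocking forever if the target is hung.
    if (!SendMessageTimeoutW(control, WM_GETTEXTLENGTH, 0, 0,
                             SMTO_ABORTIFHUNG, 1000, &length)) {
        return L"";
    }
    // One extra element for the terminating null character.
    std::vector<wchar_t> buffer(length + 1, L'\0');
    DWORD_PTR copied = 0;
    if (!SendMessageTimeoutW(control, WM_GETTEXT, buffer.size(),
                             reinterpret_cast<LPARAM>(buffer.data()),
                             SMTO_ABORTIFHUNG, 1000, &copied)) {
        return L"";
    }
    if (copied > length) {
        copied = length;  // defensive clamp; should not normally happen
    }
    return std::wstring(buffer.data(), copied);
}
#endif  // _WIN32

// Hypothetical helper for polling: return the part of `current` that was
// appended after `previous`. If `current` does not start with `previous`
// (for example the control was cleared), all of `current` counts as new.
std::wstring AppendedText(const std::wstring& previous,
                          const std::wstring& current) {
    if (current.size() >= previous.size() &&
        current.compare(0, previous.size(), previous) == 0) {
        return current.substr(previous.size());
    }
    return current;
}
```

You would call `ReadControlText` periodically and pass the previous and current snapshots to `AppendedText` to pick out only the new text. If the control is custom-drawn rather than a standard one, this will most likely return nothing useful, and you are back to point 3.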

I hope that helps a little. Please don't come back at me with a bunch of detailed questions about how this might apply to your particular use-case. I'm not really interested in that, these are just intended as a few signposts.

Paul Sanders