I have a requirement to capture text from IE 9 (preferably only the text currently shown, not the full HTML document). Previous versions of IE rendered text with TextOut, so the text was clearly visible in an API monitor. Now that IE 9 uses Direct2D/DirectWrite to render text, no text is visible there. Suggestions I have found say that you can hook into DWrite.dll and grab the text via CreateTextLayout, but that function is never called. I assume the text would have to be intercepted earlier, perhaps in MSHTML.dll, before this rendering stage.

Any help in obtaining the text would be appreciated.

Many thanks in advance.

Brent Campbell
  • You can get the IHTMLDocument interface from the IE COM object, then use the get_innerText function to get the text. – mfc Mar 03 '13 at 22:05
  • Thanks, my aim is to grab the text as it's coming in OR alternatively grab ONLY the text on the screen. I assume the text is held and prepared by MSHTML.dll before it's rendered with DirectX, but I can't find any information on this, nor on hooking into MSHTML.dll to get this text. Using your suggestion, I have obtained the text using IHTMLDocument. From my tests today, am I correct in thinking there are no methods that return only the subset of elements that are on-screen? – Brent Campbell Mar 05 '13 at 17:17
  • I am not aware of any function in IHTMLDocument that can extract the on-screen text as a subset, but I did come across a method to extract the on-screen content as a bitmap using the IViewObject interface. Refer: http://starkravingfinkle.org/blog/2004/09/mshtml-hosting-odds-ends/ – mfc Mar 08 '13 at 20:33
  • Yes, thanks, I came across that link too, but I would then need to OCR it. I've tried several methods now with no luck in just obtaining the text on-screen. Thanks for your help. – Brent Campbell Mar 14 '13 at 17:09
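
Nothing in the comments above confirms a built-in way to get only the on-screen text. One possible workaround, offered as an untested idea rather than a known solution, is to walk the elements yourself, read each one's bounding rectangle (for example via IHTMLElement2::getBoundingClientRect, which reports client-area coordinates), and keep only the text whose rectangle overlaps the visible viewport. The sketch below shows just that filtering step in plain C++ with no COM calls. The `Rect` and `ElementText` types and the `visibleText` function are hypothetical names invented for illustration, and it assumes you have already collected leaf elements with their text and rectangles.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Plain rectangle in client (viewport) coordinates, mirroring the
// left/top/right/bottom values a bounding-rectangle query reports.
struct Rect {
    long left, top, right, bottom;
};

// Hypothetical record: one element's text plus its bounding rectangle.
struct ElementText {
    std::string text;
    Rect bounds;
};

// True when the two rectangles share a non-empty area.
// Rectangles that only touch along an edge do not count as overlapping.
bool intersects(const Rect& a, const Rect& b) {
    return a.left < b.right && b.left < a.right &&
           a.top < b.bottom && b.top < a.bottom;
}

// Collect the text of every element that lies at least partly
// inside the viewport, preserving the input order.
std::vector<std::string> visibleText(const std::vector<ElementText>& elements,
                                     const Rect& viewport) {
    assert(viewport.left <= viewport.right && viewport.top <= viewport.bottom);
    std::vector<std::string> result;
    for (const ElementText& e : elements) {
        if (intersects(e.bounds, viewport)) {
            result.push_back(e.text);
        }
    }
    return result;
}
```

With a viewport of `{0, 0, 800, 600}`, an element at `{0, 0, 800, 50}` would be kept and one at `{0, 700, 800, 750}` would be dropped. Note that this only tests geometry: it does not account for elements hidden by CSS or covered by other elements, and feeding it parent elements as well as their children would duplicate text.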

0 Answers