Ultimately there's really only one way to render text using Direct3D 9, and that's by drawing a texture containing the text using one of the IDirect3DDevice9::Draw* methods. This is how ID3DXFont draws text under the hood. For text that never changes (e.g. "Game Over") a static texture containing the text can be used. For dynamic text (e.g. the player's name) the text can be rendered into a dynamic texture by the CPU (possibly using GDI), or individual letters can be picked out of a static texture that contains a full alphabet of characters.
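To make the "pick letters out of a static texture" approach concrete, here is a minimal sketch of how a renderer might compute the texture coordinates for one character. The grid layout, the `GlyphUV` struct and the `glyphUV` function are all assumptions made up for illustration; real games use many different atlas layouts, often with variable-width glyphs.

```cpp
#include <cassert>

// Texture-coordinate rectangle for one glyph inside a font atlas.
struct GlyphUV {
    float u0, v0;  // top-left corner
    float u1, v1;  // bottom-right corner
};

// Hypothetical atlas layout (an assumption, not what any particular game uses):
// a grid of equally sized cells, `columns` wide and `rows` tall, holding
// consecutive characters starting at `firstChar`, left to right, top to bottom.
GlyphUV glyphUV(char c, char firstChar, int columns, int rows) {
    assert(columns > 0 && rows > 0);
    int index = c - firstChar;
    assert(index >= 0 && index < columns * rows);

    int col = index % columns;
    int row = index / columns;
    float cellW = 1.0f / columns;  // cell width in normalized texture space
    float cellH = 1.0f / rows;     // cell height in normalized texture space

    return { col * cellW, row * cellH, (col + 1) * cellW, (row + 1) * cellH };
}
```

Each character then becomes a textured quad using these coordinates, which is why, from the outside, a string of text looks like any other batch of textured triangles.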
This means it's not going to be easy to do whatever you're trying to do. In Direct3D 9 text is drawn the same way everything else is. If you intercept only draw calls you'll have no idea whether the game is rendering text or part of a scene. If you intercept all Direct3D 9 calls and track the current state of the device you should be able to distinguish when the game is rendering UI elements from when it's rendering the scene, but you still won't know what text (if any) is being rendered unless you look at the texture. In the general case that means you'd need to run OCR (optical character recognition) on the texture.
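As a rough sketch of what tracking device state could look like, the class below keeps a shadow copy of two render states and applies a guess about UI rendering. Everything here is hypothetical: `TrackedState` and `DeviceStateTracker` are illustrative names, the initial values are assumed, and the rule "depth test off plus alpha blending on means UI" is only a heuristic that many games will not follow. A real hook would update it from the D3DRS_ZENABLE and D3DRS_ALPHABLENDENABLE values passed to IDirect3DDevice9::SetRenderState.

```cpp
// Stand-ins for the two render states this sketch tracks, so the example
// compiles without the Direct3D headers.
enum class TrackedState { ZEnable, AlphaBlendEnable };

// Hypothetical shadow copy of device state, updated from intercepted calls.
class DeviceStateTracker {
public:
    // Call this from the intercepted SetRenderState.
    void onSetRenderState(TrackedState state, unsigned long value) {
        switch (state) {
        case TrackedState::ZEnable:
            zEnabled_ = (value != 0);
            break;
        case TrackedState::AlphaBlendEnable:
            alphaBlend_ = (value != 0);
            break;
        }
    }

    // Heuristic only (an assumption): UI is often drawn with the depth test
    // disabled and alpha blending enabled. Query this from the Draw* hooks.
    bool looksLikeUiDraw() const { return !zEnabled_ && alphaBlend_; }

private:
    bool zEnabled_ = true;     // assumed initial value for this sketch
    bool alphaBlend_ = false;  // assumed initial value for this sketch
};
```

Even when the heuristic fires, it only says that a draw call is probably UI. It says nothing about which characters are in the bound texture.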
So it's probably not practical to do whatever it is you want to do and have it work for all games. If you only need this to work with one particular game things get easier, since it would only need to work with the specific method (or methods) the game uses to render text.