1

Motivation: I'm trying to write scripts which send keystrokes to the currently focused window. Right now I use xdotool, which lets me send raw keystrokes. However, I want the exact keystrokes to be a function of the current text around the input caret in the focused window.

Problem: Is there a generic way of reading the state of the text input caret -- both its current position as well as the text around it? Intuitively, I want the content of the current "text box" as well as the location of the cursor within that text box. Perhaps this is not possible in the general case, but is there a way of doing it which would work for emacs and firefox? I'm running Ubuntu Linux

Further motivation: due to a bad case of RSI I control my computer by voice rather than typing. This works by setting up voice-activated scripts that are triggered by saying different phrases. When dictating English prose, it would be helpful to automatically capitalize words at the beginning of sentences. This automatic capitalization can be accomplished by reading the characters immediately before the input caret, checking if they contain a period, and if so, capitalizing the start of the next phrase that I dictate by voice.

Thanks so much! If anybody can help me here, it would greatly increase my day-to-day accessibility.

1 Answers1

1

Since there is no standard widget toolkit for X11, but only a buch of independently developed arbitrary toolkits, there is no generic way to implement this.

As far as X11 and tools operating on its level (like xdotool) is concerned, there's only windows of either the InputOutput variety (i.e. visible windows, that receive events and one can draw to) or the just Input which are invisible and only receive events. There are no further refined "widgets" so to speak. You get a pixel grid, which you can draw to.

Accessibility interfaces are the burden of the toolkits (or if you don't use a toolkit – then you're a badass – you, the developer), to implement: https://www.freedesktop.org/wiki/Accessibility/

The absolute generic way would be to take a screenshot of the currently focused window, employ a computer vision / machine learning based solution to identify the caret, then OCR the line of text around it. And to be honest, IMHO doing it that way would probably be a lot more reliable than hoping for the accessibility interfaces to be properly implemented.

datenwolf
  • 159,371
  • 13
  • 185
  • 298