Theoretically, one could use a laptop's, tablet's, or phone's microphone to capture spoken words, convert them to text on the screen and then, by calling an API such as Google Translate, see "a" (not "the"; hardly ever, anyway) rough "draft" of a translation of those words (say, from English to Spanish or from Spanish to English).
I was thinking this would be useful in a courtroom, as a sort of "hands-free memo pad" for court interpreters.
Theoretically simple, but is it feasible? I see several potential problems:
The software would have to be told which language is the source and which is the target. Otherwise, if the device were left to its own devices (auto-detect), there might be a delay, and sometimes it would even guess wrong.
Background noises and voices would have to be filtered out.
The translation (attempt) would only be valid once the speaker had finished the sentence, and how would the software know that? By the length of the pauses? Some people pause for a long time within a sentence; others barely pause between sentences, so... how would that work?
People not speaking clearly, or speaking with hard-to-understand accents.
And this is not even mentioning (except here, obliquely) that context is often misconstrued by the robot underlord translators.
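To make the pause question concrete, here is a minimal sketch of pause-based sentence detection. It assumes a speech-to-text engine that reports each recognized word with start and end times in seconds; the `Word` type, the `segment_by_pause` function, the 0.7-second threshold and the sample timings are all hypothetical, not any real engine's API or data. A new "sentence" starts whenever the silence between two words exceeds the threshold, which reproduces both failure cases described above.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical recognizer output: one word plus its timing in seconds.
    text: str
    start: float
    end: float


def segment_by_pause(words, max_gap=0.7):
    """Group words into 'sentences', starting a new one whenever the silence
    between consecutive words exceeds max_gap seconds (an assumed threshold)."""
    segments = []
    current = []
    for w in words:
        # Close the current segment if the gap since the previous word is too long.
        if current and w.start - current[-1].end > max_gap:
            segments.append(" ".join(x.text for x in current))
            current = []
        current.append(w)
    if current:
        segments.append(" ".join(x.text for x in current))
    return segments


# A speaker who hesitates in the middle of one sentence...
hesitant = [Word("The", 0.0, 0.2), Word("defendant", 0.3, 0.8),
            Word("was", 2.0, 2.2), Word("home", 2.3, 2.6)]
# ...and one who runs two sentences ("Yes." / "I saw him.") together.
rapid = [Word("Yes", 0.0, 0.3), Word("I", 0.4, 0.5),
         Word("saw", 0.55, 0.8), Word("him", 0.85, 1.0)]

print(segment_by_pause(hesitant))  # ['The defendant', 'was home']
print(segment_by_pause(rapid))     # ['Yes I saw him']
```

No single `max_gap` handles both examples: keeping the hesitant sentence whole needs a threshold of at least 1.2 s, while separating "Yes" from "I saw him" needs one below 0.1 s.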
My intuition is that if Abraham Lincoln and Martin Luther King were speaking at the same time (which, even in a courtroom, does happen at times), the software would come up with something like this:
For score and seven years ago I am happy to join with you to day. Our fathers brought fourth on this continent, a new nation, in what will go down in history as the greatest conceived in Liberty, and. Dedicated to the perspiration that demonstration for freedom in all men are created equal. The history of our nation.
...and then translate it something like this:
Por puntuación y hace siete años que estoy encantado de unirme a ustedes hoy. Nuestros padres trajeron cuarto en este continente, una nueva nación, en lo que va a pasar a la historia como el mayor concebida en la libertad, y. Dedicada a la transpiración que la demostración por la libertad en todos los hombres son creados iguales. La historia de nuestra nación.
What I'm saying, I guess, is that humans "rock" at this sort of thing, at least compared to machines (software) at their current level of sophistication. But do we, or will we, "rock" enough to overcome this problem? Is there a way to surmount these hurdles, at least well enough for such a program to be worth using? Perfection would be unattainable; matching human skill would also be, I believe, an unreachable goal, especially because of the context factor. Nevertheless: can Speech-to-Text-to-Context-to-Translation be done even relatively well and, if so, how?