
I am working on an application that needs to be able to translate parts of sentences. The problem is that if I send the parts to a translation API like Google Translate, the translations often don't make sense in the context they occurred in. Example:

He leaves the building

If I translate "leaves" on its own into any target language, I will probably get a result in the sense of "leaves of a tree", which of course makes no sense in the example. So the translation needs to take context into account. If I expand the text to translate to "He leaves", I get the correct translation of "He leaves". However, I lose the translation of "leaves" alone, which is the word I am looking for.

Does anyone have any idea how I should approach this? Keep in mind that the Google Translate API is a paid API, so I would like to minimize the number of translations I request from it.

Yaeger

3 Answers

You are right to point out that translating without context is hopeless.

The Google Translate API, like the Chrome integration, is smart about HTML tags (the default parameters include format=html).

So one good option is to wrap the word or phrase you are interested in with HTML tags (for example <span>) and send the whole sentence for translation.

You can try this in the console:

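To do the same programmatically, the sketch below wraps one word in a <span> and builds a request for the whole sentence. It assumes the Cloud Translation v2 ("Basic") REST endpoint with API-key authentication; `wrap_span`, `build_request` and `translate_html` are hypothetical helper names, and only `translate_html` actually touches the network.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the v2 ("Basic") REST endpoint, authenticated with an API key.
API_URL = "https://translation.googleapis.com/language/translate/v2"


def wrap_span(sentence: str, start: int, end: int) -> str:
    """Wrap sentence[start:end] in <span> tags, keeping the rest as context."""
    return f"{sentence[:start]}<span>{sentence[start:end]}</span>{sentence[end:]}"


def build_request(html_text: str, source: str, target: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that translates the whole sentence."""
    url = f"{API_URL}?{urllib.parse.urlencode({'key': api_key})}"
    body = urllib.parse.urlencode({
        "q": html_text,
        "source": source,
        "target": target,
        "format": "html",  # already the default; stated explicitly so the tags are kept
    }).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def translate_html(html_text: str, source: str, target: str, api_key: str) -> str:
    """Send the request (needs network access and a valid key) and return the translated HTML."""
    with urllib.request.urlopen(build_request(html_text, source, target, api_key)) as resp:
        payload = json.load(resp)
    return payload["data"]["translations"][0]["translatedText"]
```

For example, `translate_html(wrap_span("He leaves the building", 3, 9), "en", "it", api_key)` sends "He <span>leaves</span> the building" as a single request, so the context costs no extra API call.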

It should be easy to parse the contents of the HTML tag back out of the translation; then you can lemmatise it.
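A minimal sketch of parsing the span back out, using only the standard library. It assumes the returned HTML may contain entities such as `&#39;` and that the span can vanish or come back empty when the word has no counterpart in the target language (the 1:0 case); `extract_span` is a hypothetical name, and a regex is adequate here only because the input is a single short sentence, not arbitrary HTML.

```python
import html
import re
from typing import Optional, Tuple

# First <span ...>...</span> pair; attributes are allowed in case the API adds any.
SPAN_RE = re.compile(r"<span[^>]*>(.*?)</span>", re.DOTALL | re.IGNORECASE)
# Any opening or closing span tag, used to produce the plain sentence.
TAG_RE = re.compile(r"</?span[^>]*>", re.IGNORECASE)


def extract_span(translated_html: str) -> Tuple[Optional[str], str]:
    """Return (text inside the span, plain translated sentence).

    The first element is None when the span is missing or empty, e.g. when the
    word was dropped in the target language.
    """
    match = SPAN_RE.search(translated_html)
    inner = None
    if match:
        inner = html.unescape(match.group(1)).strip() or None
    plain = html.unescape(TAG_RE.sub("", translated_html))
    return inner, plain
```

`extract_span("Ha <span>lasciato</span> l&#39;edificio.")` gives `("lasciato", "Ha lasciato l'edificio.")`, and the inner text is what you would then lemmatise.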

Note 1:
The consumer-facing standalone Google Translate UI does not expose this option; to try it, you must translate via the API console or programmatically, or translate pages with Chrome.

Note 2:
There are some nuances because the words in translations are inherently not 1:1. Sometimes the word becomes two words, and occasionally the word has a null representation in the target language.

1:2 example:
en: He <span>left</span> the building.
it: Ha <span>lasciato</span> l'edificio.
[Arguably the ha should also be included.]

1:0 example:
en: How <span>are</span> you?
ru: Как вы?
["To be" is usually dropped in Russian.]
en: How are <span>you</span>?
it: Come stai?
[Pronouns are often dropped in Italian.]

2:1 example:
en: He is always <span>screwing</span> things up.
it: Sempre <span>spiegazza</span> le cose.
[English and other languages have separable verbs. The actual input here is "to screw up", not "to screw".]

Handling these cases is some work for you, but it is also very useful information, and in any case it is easy to process lasciato to get the correct lemma lasciare.

See cloud.google.com/translate/docs/reference/rest for more documentation on the parameters.

Adam Bittlingmayer
  • It's been a while, but thanks for this. I, in collaboration with my supervisor, have written a simple Python implementation of this: https://github.com/mircealungu/python-translators – Yaeger May 13 '17 at 20:10
  • My pleasure, looks useful, congratulations on shipping some code. – Adam Bittlingmayer May 19 '17 at 06:24
  • Thx so much for this answer! Not even ChatGPT was able to come up with a solution for this, and I've been probing it with questions for like half an hour or so – gignu Mar 12 '23 at 02:45
  • Just a matter of time before they crawl this answer. One day someone will look back at us like we look at the Akkadians writing on clay tablets. – Adam Bittlingmayer Mar 14 '23 at 21:07

My idea:

Send "He leaves", and to find out which part of the translation corresponds to "He" and which to "leaves", intersect the translated words with all possible translations of "leaves" (keep a local bilingual dictionary of all possible translations of all words in all their forms).
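A minimal sketch of the intersection idea, assuming a hypothetical toy English-to-German dictionary (`TOY_DICT`) that already lists inflected forms; building a complete one is the hard part. `locate_word` and `locate_phrase` are illustrative names: the phrase variant returns the stretch of the translation from the first to the last token that matches any word of the phrase.

```python
import re
from typing import Dict, List, Optional, Set

# Hypothetical toy dictionary: English word -> inflected forms of its German translations.
TOY_DICT: Dict[str, Set[str]] = {
    "he": {"er"},
    "leaves": {"verlässt", "verlassen", "geht", "gehen", "blätter", "blättern"},
    "building": {"gebäude", "gebäudes", "gebäuden"},
}


def tokens(sentence: str) -> List[str]:
    # Lowercased word tokens; in Python 3, \w also matches letters such as ä or ß.
    return re.findall(r"\w+", sentence.lower())


def locate_word(word: str, translation: str, dictionary: Dict[str, Set[str]]) -> List[str]:
    """Tokens of the translated sentence that are possible translations of `word`."""
    candidates = dictionary.get(word.lower(), set())
    return [tok for tok in tokens(translation) if tok in candidates]


def locate_phrase(phrase: str, translation: str, dictionary: Dict[str, Set[str]]) -> Optional[str]:
    """Translated tokens from the first to the last match of any word in `phrase`."""
    words = tokens(phrase)
    toks = tokens(translation)
    hits = [
        i for i, tok in enumerate(toks)
        if any(tok in dictionary.get(w, set()) for w in words)
    ]
    if not hits:
        return None
    return " ".join(toks[min(hits): max(hits) + 1])
```

With the translation "Er verlässt das Gebäude.", `locate_word("leaves", ...)` returns `["verlässt"]`, so the tree sense is ruled out by the sentence itself. Words missing from the dictionary are simply not found, which is why coverage of all inflected forms matters.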

user31264
  • That sounds like a very solid solution for single words, but what about phrases? Two or three words is often still not enough context for translation. Thank you for your contribution though, I think it would work great for single word translation. I will be trying this to test its effectiveness. – Yaeger Dec 15 '16 at 17:46
  • You send the whole sentence, match the translation against possible translations of each word in the sentence, and find the location of the word or phrase you want to translate. – user31264 Dec 15 '16 at 18:09
  • The weak part, imho, is where to find, or how to create, a dictionary that includes all words in all forms (including all forms of their translations). For instance, there are 4 cases in German, and some languages have even more. You cannot just translate dog as hund, as it can be hundes, hund, hunde, or hunden – user31264 Dec 15 '16 at 18:10
  • Interesting. I think there are still a few issues, but I will try it out. Regarding the issue you were talking about: the Glosbe API is a free word-to-word translation service. You give it a word and it gives back a list of possible translations. It doesn't seem to show all 4 German forms of "dog", but initially it only needs to work for Dutch to English anyway, and I don't think it is as much of a problem in Dutch. Also, I recognise that a perfect solution probably does not exist, so any improvement on the current implementation is great. Thanks :) – Yaeger Dec 15 '16 at 18:51
  • Warning: this is non-trivial to implement. – Adam Bittlingmayer Dec 16 '16 at 06:56

I have applied the suggestions above, but I still have some trouble in my case. I am translating the active ingredients of medicines from Italian to English. The curious thing is that

  • the Google Search page translator translates it correctly
  • the API does not

The active ingredient to be translated is "Testosterone Enantato". I have enriched it into the sentence "Il principio attivo della medicina è Testosterone Enantato". Here is what I get:

  1. Google Search web page: "The active ingredient of the medicine is Testosterone Enanthate" --> "Testosterone Enanthate", which is correct.
  2. Google Translate API: "Testosterone Enantato"; it leaves it in Italian.

Any idea?

Here is the screenshot of the Google web page with the correct translation:


fede72bari