
I'm trying to use Google's translate method from its Translation API as documented here, but for some reason the translations I get replace non-Latin characters with underscores.

For instance, with curl on the command-line:

$ curl -X POST 'https://translation.googleapis.com/language/translate/v2/?source=en&target=de&q=Practicing+diligently+each+day+means+inevitable+improvement.&key=MY_API_KEY'  
{
  "data": {
    "translations": [
      {
        "translatedText": "T_glich flei_ig zu _ben, bedeutet unausweichliche Verbesserung."
      }
    ]
  }
}

Compare to the English-to-German result from translate.google.com:

Täglich fleißig zu üben, bedeutet unausweichliche Verbesserung.

It's especially bad when the target is a language like Japanese, which doesn't contain Latin characters:

$ curl -X POST 'https://translation.googleapis.com/language/translate/v2/?source=en&target=ja&q=Practicing+diligently+each+day+means+inevitable+improvement.&key=MY_API_KEY' 
{
  "data": {
    "translations": [
      {
        "translatedText": "______________________________________________________"
      }
    ]
  }
}

Maybe this is a trial account limitation? Nothing I've seen in the docs would indicate this, however.

rdm
  • have you tried running it from an application? perhaps curl doesn't know what to do in your terminal – Daniel A. White Mar 19 '19 at 19:57
  • Thanks! You're right, I think it is a terminal issue, based on pasting the Google translation into my prompt, so that just leaves me to troubleshoot the terminal and post my solution. – rdm Mar 19 '19 at 20:06

3 Answers


I believe it's a string-encoding issue.

I assume your HTTP request body is being sent using application/x-www-form-urlencoded, which does not support characters above 0x7F (127) as literal text; see: application/x-www-form-urlencoded and charset="utf-8"?

I suggest:

  1. POST with an explicit Content-Type: application/json header with the charset=utf-8 field set. (x-www-form-urlencoded does not support the charset field).
  2. Ensure your terminal is using UTF-8
  3. Also take a look using a tool like Wireshark, or create the request in JavaScript using fetch and use Chrome's Developer Tools' Network tab's "Copy as cURL (Bash)" command to get the terminal command to use.
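
To tell whether the underscores come from the API or from the terminal, one option is to bypass the terminal's rendering and look at the raw bytes. The following is only a minimal sketch under stated assumptions: the temporary file stands in for a response saved with something like `curl -o FILE ...`, and the sample bytes are written by hand (the UTF-8 encoding of "Täglich") so the snippet runs without an API key.

```shell
#!/bin/sh
# Hypothetical sample: "Täglich" encoded as UTF-8 (ä = 0xC3 0xA4),
# standing in for a response body saved to disk by curl.
sample=$(mktemp)
printf 'T\303\244glich' > "$sample"

# Dump the raw bytes as hex, independent of how the terminal renders them.
hex_bytes=$(od -An -tx1 "$sample" | tr -d ' \n')
echo "$hex_bytes"

# If the two-byte sequence c3a4 is present, the ä arrived intact and any
# underscore shown on screen is a display problem, not a data problem.
case "$hex_bytes" in
  *c3a4*) verdict="bytes intact; suspect the terminal" ;;
  *)      verdict="bytes damaged; suspect the request or the API" ;;
esac
echo "$verdict"

rm -f "$sample"
```

With a real response, the same `od -An -tx1` call can be pointed at the saved file; seeing `5f` (an underscore) where `c3a4` should be would suggest the data was altered before it reached the terminal.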
Dai

Somewhat embarrassingly, this was actually just an issue with tmux, the terminal multiplexer I was using to read the output of every call I made to the Translation API, both with curl and with the printed output of the code I was writing.

As per this Ask Ubuntu answer to someone else's tmux question, this is fixable by explicitly telling tmux to launch with UTF-8 support, i.e., tmux -u.
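
For context, tmux guesses whether to emit UTF-8 from the locale environment: per its man page, `-u` forces UTF-8 output even if the first of LC_ALL, LC_CTYPE, or LANG that is set does not contain "UTF-8" or "UTF8". The sketch below only approximates that check in shell; the function name `locale_looks_utf8` is made up for illustration and is not part of tmux.

```shell
#!/bin/sh
# Illustrative helper (not part of tmux): succeed if the first non-empty
# value among the three arguments (LC_ALL, LC_CTYPE, LANG) mentions UTF-8.
locale_looks_utf8() {
  for value in "$1" "$2" "$3"; do
    if [ -n "$value" ]; then
      case "$value" in
        *UTF-8*|*UTF8*|*utf-8*|*utf8*) return 0 ;;
        *) return 1 ;;
      esac
    fi
  done
  # None of the variables is set.
  return 1
}

if locale_looks_utf8 "${LC_ALL:-}" "${LC_CTYPE:-}" "${LANG:-}"; then
  echo "locale advertises UTF-8; plain tmux should be fine"
else
  echo "locale does not advertise UTF-8; try: tmux -u"
fi
```

If the second message appears, either launching with `tmux -u` or exporting a UTF-8 locale before starting tmux should have the same effect.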

Thanks both to Dai and Daniel for pointing to a potential terminal issue.

rdm

I just tried with the following request and it worked well:

curl -X POST "https://translation.googleapis.com/language/translate/v2?key=MY_API_KEY" \
-H "Content-Type: application/json" \
--data '{
        "q": "Practicing diligently each day means inevitable improvement.",
        "source": "en",
        "target": "de"
}'

Giving this output:

{
  "data": {
    "translations": [
      {
        "translatedText": "Täglich fleißig zu üben, bedeutet unausweichliche Verbesserung."
      }
    ]
  }
}

And with the target set to ja, the Japanese output:

{
  "data": {
    "translations": [
      {
        "translatedText": "毎日熱心に練習することは避けられない改善を意味します。"
      }
    ]
  }
}

Hope it helps

F10