
I know related questions have been asked but my case is a bit specific because I run my code in a Docker container, and I haven't been able to make other solutions work.

I'm using Python 2.7 to translate English text into Chinese (and other non-Latin languages) using the translate module:

from translate import Translator

text = 'Hello'
translator = Translator(to_lang='zh')
translated_text = translator.translate(text)
print(translated_text.encode('utf-8'))

The last line fails to display the Chinese text in the console; it just prints question marks. According to the docs, translate() is supposed to return a unicode string.

I'm running this in an Ubuntu 16.04 Docker container with Windows as the host. So maybe the problem is that Ubuntu or Windows isn't configured to display these characters, but I don't know how to check that. Any help would be much appreciated.

Sulli
  • https://github.com/terryyin/translate-python#use-as-a-python-module no encoding at all – py_dude Dec 05 '18 at 11:10
  • @py_dude but it specifically says "The result is in translation, and it’s usually a unicode string." in your link. Anyway, if I do print(translated_text) directly I get UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128) – Sulli Dec 05 '18 at 11:14
  • Which is the default encoding on the Windows' side? The error message (range(128)) suggests that it is a 7-bit encoding, maybe CP-1252? Should that be the problem, see [_"Setting UTF8 as default Character Encoding in Windows 7"_](https://superuser.com/questions/239810/setting-utf8-as-default-character-encoding-in-windows-7). – gboffi Dec 05 '18 at 11:27
  • The problem may be your Ubuntu locale; it works fine on Ubuntu and Mac. – matesio Dec 05 '18 at 12:06
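The comments above point at the default encoding and the locale. A minimal sketch for checking both from inside the container follows; the helper name describe_console_encoding is hypothetical and not part of the translate module. In Python 2, if the reported stdout encoding is None or an ASCII variant such as ANSI_X3.4-1968, printing a unicode string with Chinese characters would be expected to raise the UnicodeEncodeError quoted above.

```python
import locale
import os
import sys


def describe_console_encoding():
    """Collect the settings that influence how Python encodes printed text.

    Hypothetical diagnostic helper; it only reads values and changes nothing.
    """
    return {
        # Encoding Python uses when printing unicode to stdout (may be None).
        'stdout_encoding': getattr(sys.stdout, 'encoding', None),
        # Encoding derived from the locale settings of the environment.
        'preferred_encoding': locale.getpreferredencoding(),
        # Locale variables that are often unset in minimal Docker images.
        'LANG': os.environ.get('LANG'),
        'LC_ALL': os.environ.get('LC_ALL'),
        # If set, overrides the encoding Python uses for stdin/stdout/stderr.
        'PYTHONIOENCODING': os.environ.get('PYTHONIOENCODING'),
    }


if __name__ == '__main__':
    for name, value in sorted(describe_console_encoding().items()):
        print('{0}: {1}'.format(name, value))
```

If the output shows an ASCII encoding, one thing to try (an assumption, not verified against this exact setup) is starting the container with PYTHONIOENCODING=utf-8 or a UTF-8 locale such as C.UTF-8, and then making sure the Windows console itself can display the result.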

1 Answer


I was able to display Chinese characters on the Windows console using:

from translate import Translator

text = 'Hello'
translator = Translator(to_lang='zh')
translated_text = translator.translate(text)
print(translated_text)  # see the notes below
# 您好

Notes:
Before running the script, set the default code page of the Windows console to “936 (ANSI/OEM – Simplified Chinese GBK)”. You can do this by typing chcp 936 in the console before launching Python, i.e.:

chcp 936
python myscript.py
您好

Source: https://www.walkernews.net/2013/05/19/how-to-get-windows-command-prompt-displays-chinese-characters/
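To illustrate why the code page matters, here is a small sketch that checks which encodings can represent the two characters shown above. The helper name encodable and the list of encodings are chosen for illustration only; cp936 is Python's codec name for the GBK code page selected by chcp 936.

```python
# The two characters from the answer, written as escapes so the file stays ASCII.
TEXT = u'\u60a8\u597d'


def encodable(text, encoding):
    """Return True if every character of text can be encoded with encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == '__main__':
    # ascii and cp1252 cannot represent Chinese characters; cp936 and utf-8 can.
    for name in ('ascii', 'cp1252', 'cp936', 'utf-8'):
        print('{0}: {1}'.format(name, encodable(TEXT, name)))
```

A console whose code page cannot represent the characters will show question marks or raise an error, whichever encoding the script itself uses.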

Pedro Lobito