I have a problem with Unicode strings in a frozen app. I use Python 3.4.1 32-bit (on Windows 7 64-bit Pro) and py2exe-3 from the svn repository. I can demonstrate it with the following code:

    #!/usr/bin/python3
    # -*- coding: utf-8 -*-
    # file: test_py2exe.py

    import sys

    my_string = u"""This is a test:
    ábc
    End of test..."""

    filename = 'test.txt'
    if getattr(sys, 'frozen', False):
        filename = 'test-frozen.txt'

    f = open(filename, mode='w', encoding='utf-8')
    f.write(my_string)
    f.close()

If I run it with the standard Python interpreter (py test_py2exe.py), the second line in test.txt is correct:

ábc

If I create a frozen app with

    py -3.4 -m py2exe.build_exe test_py2exe.py

and run 'dist\test_py2exe.exe', the second line in test-frozen.txt looks like this:

Ăˇbc

This problem is not limited to writing strings to a file; it also occurs when I pass Unicode strings to other modules (e.g. PyQt5, xlsxwriter). Following the instructions on EvenMoreEncodings does not help. Is there any solution for this?

user898678
  • If it helps, a clue is that the corrupted string is UTF-8 mis-decoded as `windows-1250` (Central and Eastern Europe). What language is your Win7 system configured in? Also, how are you viewing the text file? It may help to encode with the so-called UTF-8 BOM by using `encoding='utf-8-sig'` instead. Windows editors tend to assume the current locale instead of UTF-8 unless the BOM is present. – Mark Tolonen Aug 01 '14 at 16:11
  • I know the problem is corrupted UTF-8. The issue is not the text editor I use to check the file (btw it is Notepad++, but the internal viewer of Total Commander shows the same). I first noticed the problem in PyQt5, in the status bar and in QTextEdit. If I run my app in Python, everything is correct. But when I freeze my app with py2exe-3, UTF-8 strings are mis-decoded... – user898678 Aug 02 '14 at 18:14
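
The mis-decoding described in the first comment can be reproduced in plain Python, independent of py2exe. This is only a sketch of the effect: it assumes the UTF-8 bytes of the source text are being read as `cp1250`, and it does not show where inside the frozen app that happens. It also shows the BOM that `utf-8-sig` adds.

```python
import codecs

original = "ábc"

# UTF-8 stores 'á' as two bytes: 0xC3 0xA1
utf8_bytes = original.encode("utf-8")

# Reading those bytes as cp1250 turns each byte into its own character
garbled = utf8_bytes.decode("cp1250")

# Reversing the wrong step recovers the original text
repaired = garbled.encode("cp1250").decode("utf-8")

# 'utf-8-sig' writes the same bytes, prefixed with the UTF-8 BOM
with_bom = original.encode("utf-8-sig")

print(ascii(garbled))   # escaped form, safe to print on any console
print(repaired == original)
print(with_bom.startswith(codecs.BOM_UTF8))
```

The round trip only works because every byte involved has a mapping in `cp1250`; with a different locale encoding the garbled characters would differ.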

1 Answer

I found a workaround that produces the same result in the script and in the frozen app: my_string.encode('cp1250').decode('utf-8'). The updated code looks like this:


    #!/usr/bin/python3
    # -*- coding: utf-8 -*-

    import sys
    import locale

    my_string = u"""This is a test:
    ábc
    End of test..."""
    filename = 'test.txt'
    sys_enc = sys.getdefaultencoding()  # 'utf-8'
    locale_enc = locale.getpreferredencoding()  # 'cp1250'

    if getattr(sys, 'frozen', False):
        filename = 'test-frozen.txt'
        my_string = my_string.encode(locale_enc).decode(sys_enc)

    f = open(filename,  mode='w', encoding=sys_enc)
    f.write(my_string)
    f.close()
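
If the same fix is needed in several places (e.g. for strings passed to PyQt5 or xlsxwriter), the round trip could be wrapped in a small helper. This is a minimal sketch: `repair_mojibake` is a hypothetical name, not part of py2exe or the standard library, and the default `cp1250` is an assumption taken from the locale shown above. It returns the text unchanged when the round trip is not possible, so strings that are already correct are usually left alone.

```python
def repair_mojibake(text, wrong_encoding="cp1250", real_encoding="utf-8"):
    """Undo a UTF-8 string that was mis-decoded as `wrong_encoding`.

    Hypothetical helper: returns `text` unchanged if it cannot be
    re-encoded or if the resulting bytes are not valid UTF-8.
    """
    try:
        return text.encode(wrong_encoding).decode(real_encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Garbled form of 'ábc' (UTF-8 bytes read as cp1250), written with escapes
garbled = "\u0102\u02c7bc"
print(repair_mojibake(garbled) == "ábc")

# A string that is already correct fails the round trip and is kept as-is
print(repair_mojibake("ábc") == "ábc")
```

In the script above, the call would replace the inline `encode(...).decode(...)` expression, with `locale.getpreferredencoding()` passed as `wrong_encoding`.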

user898678