This question is related to "Getting file path with umlauts from command line arguments under win7 using a batch file" but has another twist. I installed win_unicode_console, opened a console window, and changed the code page via `chcp.com 65001 > nul`. Then I started a Python script that is located in a path containing umlauts. Starting the script works fine in this console, but the program throws this error:

Traceback (most recent call last):
  File "C:\path\to\script.py", line 205, in <module>
    print err
  File "C:\python27\lib\site-packages\win_unicode_console\streams.py", line 256, in write
    self.base.write(s)
  File "C:\python27\lib\site-packages\win_unicode_console\streams.py", line 216, in write
    return self.base.write(s)
  File "C:\python27\lib\site-packages\win_unicode_console\streams.py", line 165, in write
    raise exc
WindowsError:

The program complains about `print err` because this is how I try to capture the exception:

try:
    ... main code ...
except Exception, err:
    print err

Inside the ... main code ... part there is a line

print 'ausgewählte Konfiguration:'

This 'ä' character causes the error but I do not understand why. I tried everything I can to make the encoding right but the console output crashes everything.

If I do not change the code page of the console via chcp, as eryksun suggested, the error is gone, but there is a new problem. The script receives a file path containing special characters like "ä". The script opens the file without problems at first, but writing data back fails with:

'utf8' codec can't decode byte 0xe4 in position 4: invalid continuation byte
thopy
  • You have win_unicode_console, so why use codepage 65001? The console's implementation of codepage 65001 is broken in so many ways across versions of Windows. – Eryk Sun Apr 05 '18 at 21:31
  • Because you suggested it ;-) You commented on https://stackoverflow.com/questions/49445992 that I should change the codepage of the console if I want to call a Windows batch script that executes a Python script that receives a command line argument with umlauts. I will edit my question to show what happens if I do not change the codepage. – thopy Apr 06 '18 at 07:34
  • I stressed that the codepage should only be changed temporarily to 65001 in a batch script, i.e. change to 65001 to read a command line into a Unicode environment variable, and then switch back to the original codepage. It shouldn't be left at codepage 65001 when running an external program. – Eryk Sun Apr 06 '18 at 09:27
  • OK, I misunderstood that. But anyway there are no subsequent commands that I can switch off chcp for and it turned out that I do not have to turn it on anyway which is obviously the best choice. Thanks for your help. – thopy Apr 06 '18 at 12:45
  • Even if you're not saving the batch script as UTF-8, there can still be mojibake issues if you're not careful. By default CMD uses the console output codepage to decode a batch script, which defaults to OEM, but this can be changed in the registry or shortcut settings, and the user may have changed it via chcp.com or mode.com. So, no matter what, a batch script that uses non-ASCII characters has to ensure the active codepage is the same as the one it was saved with. – Eryk Sun Apr 06 '18 at 20:33
  • But the issue with `print 'ausgewählte Konfiguration:'` should be resolved by saving the script as UTF-8 with a `# -*- coding: utf-8 -*-` coding spec. Bear in mind that `'ausgewählte Konfiguration:'` is a *byte* string, and all `print` can do is write it to stdout as is. win_unicode_console then decodes it as UTF-8 in order to write a native UTF-16LE string to the console. – Eryk Sun Apr 06 '18 at 20:40
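The point in the last comment, that the bytes inside a byte string depend on how the source file was saved, can be illustrated with a small sketch. It assumes Windows-1252 (`cp1252`) as the non-UTF-8 source encoding, which is a guess and not something stated in the question; the helper `decodes_as_utf8` is a hypothetical name used only for this illustration.

```python
# -*- coding: utf-8 -*-
# Sketch only: cp1252 is an assumed stand-in for a non-UTF-8 source encoding.
text = u'ausgew\xe4hlte Konfiguration:'  # \xe4 is "ä"

# Bytes a byte-string literal would hold if the script were saved as UTF-8.
utf8_bytes = text.encode('utf-8')
# Bytes the same literal would hold if the script were saved as Windows-1252.
cp1252_bytes = text.encode('cp1252')


def decodes_as_utf8(data):
    """Return True if the byte string is valid UTF-8 (hypothetical helper)."""
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False


utf8_ok = decodes_as_utf8(utf8_bytes)      # "ä" is the two bytes 0xc3 0xa4
cp1252_ok = decodes_as_utf8(cp1252_bytes)  # "ä" is the lone byte 0xe4
```

Only the UTF-8 bytes survive a UTF-8 decode, so a stream that decodes byte strings as UTF-8 would be expected to fail on the Windows-1252 variant.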

1 Answer

OK, I found a way that works. Using the file path as is caused the UnicodeDecodeError, but after using

filepath.decode('mbcs')

everything works now.

Actually, I do not understand why this works. I guess the handling of the command line argument was done by win_unicode_console, which is why the script could read the file path containing "ä" from the command line. It was probably already encoded with codepage 850 (?). The file path was then saved in an object attribute. The Python script tried to decode it as UTF-8 but failed, which is why I had to decode the Windows encoding first. After that, decoding with UTF-8 succeeded. Maybe someone could shed some light on this in more detail.
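A minimal sketch of what may be going on, under stated assumptions: the path below is hypothetical, and `cp1252` stands in for `'mbcs'` (which exists only on Windows and maps to the system ANSI code page, commonly Windows-1252 on German systems). The helper `to_text` is an illustrative name, not part of any library.

```python
# -*- coding: utf-8 -*-
# Hypothetical byte-string path; 0xe4 is "ä" in Windows-1252, the code page
# assumed here in place of 'mbcs'.
raw_path = b'C:\\B\xe4r\\config.ini'


def to_text(value, encoding='cp1252'):
    """Decode byte strings; return already-decoded text unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Decoding the ANSI-encoded bytes as UTF-8 fails on the lone 0xe4 byte.
try:
    raw_path.decode('utf-8')
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# Decoding with the matching code page yields proper text.
text_path = to_text(raw_path)
```

In this sketch the UTF-8 decode fails at byte 0xe4 with "invalid continuation byte", the same kind of error as in the question, while decoding with the matching code page succeeds. The `isinstance` check leaves text that is already decoded untouched.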

thopy
  • In Python 2, `win_unicode_console.enable(use_unicode_argv=True)` sets `sys.argv` from the native Unicode command line. Subsequently it should consist of `unicode` strings and should not be decoded. – Eryk Sun Apr 06 '18 at 09:29
  • Yes, I read that in the documentation of win_unicode_console but did not give it a try. I have now added `win_unicode_console.enable()` to ..Lib/site-packages/usercustomize.py, and this seems to work. But as you pointed out earlier, my solution works only for the German umlauts because they are included in the Windows codec; if I had to use Greek path names, it would not work. – thopy Apr 06 '18 at 12:49