In a directory on Windows I have 2 files, each with an accented character in its name: t1û.fn and t2ű.fn. The dir command in the Command Prompt shows both correctly:

S:\p>dir t*.fn
 Volume in drive S is q
 Volume Serial Number is 05A0-8823

 Directory of S:\p

2017-09-03  14:54                 4 t1û.fn
2017-09-03  14:54                 4 t2ű.fn
               2 File(s)              8 bytes
               0 Dir(s)  19,110,621,184 bytes free

However, Python can't see both files:

S:\p>python -c "import os; print [(fn, os.path.isfile(fn)) for fn in os.listdir('.') if fn.endswith('.fn')]"
[('t1\xfb.fn', True), ('t2u.fn', False)]

It looks like Python 2 uses a single-byte API for filenames: the accented character in t1û.fn is mapped to the single byte \xfb, while the accented character in t2ű.fn is mapped to the unaccented ASCII byte u, so the returned name t2u.fn no longer matches any file on disk.

How can I use a multi-byte API for filenames on Windows in Python 2? I want to open both files in the console version of Python 2 on Windows.

pts
  • FYI, in case you make the jump to Python 3, note that `bytes` paths were deprecated on Windows because of this behavior. Support for `bytes` paths returned in 3.6+ by pretending the filesystem encoding is UTF-8. In 3.6 you can open `b't1\xc3\xbb.fn'` in addition to `"t1û.fn"`, and `os.listdir(b'.')` encodes filenames as UTF-8. Internally it transcodes to UTF-16 to use the Windows wide-character API. – Eryk Sun Sep 04 '17 at 03:55

1 Answer

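Use a unicode string for the filename, so that Python 2 goes through the wide-character Windows API instead of the single-byte one: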

f1 = open(u"t1\u00fb.fn")     # t1û.fn
f2 = open(u"t2\u0171.fn")     # t2ű.fn
selbie
  • Thank you, it works for `open`. Also `os.listdir(u'.')` returns the filenames with the right accented characters. – pts Sep 03 '17 at 20:42
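
Putting the answer and the comment above together, here is a minimal sketch of listing and checking the files through unicode paths only. The helper names `to_text`, `list_fn_files` and `make_demo_dir` are made up for illustration, and the temporary directory stands in for S:\p. It is written to run on both Python 2 and Python 3, not as the definitive way to do this.

```python
from __future__ import print_function

import io
import os
import shutil
import sys
import tempfile


def to_text(path):
    """Return path as a text (unicode) string; hypothetical helper."""
    if isinstance(path, bytes):
        # On Python 2, tempfile hands back byte strings; decode them first.
        return path.decode(sys.getfilesystemencoding() or "utf-8")
    return path


def list_fn_files(directory=u"."):
    """List *.fn entries via a unicode path and check each with isfile."""
    directory = to_text(directory)
    # A unicode argument makes os.listdir return unicode names on Python 2.
    names = sorted(fn for fn in os.listdir(directory) if fn.endswith(u".fn"))
    return [(fn, os.path.isfile(os.path.join(directory, fn))) for fn in names]


def make_demo_dir():
    """Create a temp directory holding the two files from the question."""
    directory = to_text(tempfile.mkdtemp())
    for name in (u"t1\u00fb.fn", u"t2\u0171.fn"):
        with io.open(os.path.join(directory, name), "w", encoding="utf-8") as f:
            f.write(u"data")  # 4 bytes, like the files in the dir listing
    return directory


if __name__ == "__main__":
    demo_dir = make_demo_dir()
    try:
        for name, is_file in list_fn_files(demo_dir):
            print(repr(name), is_file)
    finally:
        shutil.rmtree(demo_dir)
```

The key point is that every path passed to `os.listdir`, `os.path.isfile` and `io.open` is a unicode string, so no filename is squeezed through the single-byte code page on the way.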