How can I traverse directories named in Japanese in Python?

Question

I'm trying to build a simple helper utility that looks through my projects and returns the open ones to me via the command line. But my calls to os.listdir return gibberish (for example, '\x82\xa9\x82\xcc\x96I') whenever a folder or file name is in Japanese, and that gibberish can't be passed back to os.listdir to get into the folder either. For example, os.listdir('C:\Documents and Settings\\x82\xa9\x82\xcc\x96I') raises an error:

'WindowsError: [Error 3] 指定されたパスが見つかりません。'

Does anybody know how I can get around this? Thanks a lot.

指定されたパスが見つかりません。 means "Cannot find the path specified." — Michael, Jul 28 '11 at 08:52
Thank you! Actually, reading the Japanese is no problem for me, but I appreciate the help! — StormShadow, Jul 29 '11 at 07:54

score 6 · Accepted Answer · answered Jul 14 '11 at 08:56

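You may need to decode the byte string into Unicode before passing it back to os.listdir. On a system whose filesystem encoding is UTF-8 you can then re-encode it as UTF-8; on Windows, pass the Unicode object itself. It looks like your Japanese string is encoded in Shift-JIS: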

>>> '\x82\xa9\x82\xcc\x96I'.decode('shift-jis').encode('utf-8')
'\xe3\x81\x8b\xe3\x81\xae\xe8\x9c\x82'
>>> print '\x82\xa9\x82\xcc\x96I'.decode('shift-jis')
かの蜂
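
As a minimal sketch of the decoding step, the helper below turns a byte-string name into Unicode and leaves Unicode names alone. The name to_text and the default codec are assumptions for illustration: cp932 is the Windows variant of Shift-JIS, and a different codec may be needed on other systems.

```python
from __future__ import print_function


def to_text(name, encoding="cp932"):
    """Return name as a Unicode string, decoding it if it is a byte string.

    The default codec is an assumption (cp932 is the Windows variant of
    Shift-JIS); pass another codec name if the bytes use a different one.
    """
    if isinstance(name, bytes):
        return name.decode(encoding)
    return name  # already Unicode, nothing to do


# The byte string from the question, decoded and compared with the
# three expected code points (written as escapes to keep the file ASCII).
decoded = to_text(b"\x82\xa9\x82\xcc\x96I")
print(decoded == u"\u304b\u306e\u8702")
```

A name that is already Unicode passes through unchanged, so the helper can be applied to every entry of a mixed result list.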

Alternatively, make use of the following feature of os.listdir to get Unicode strings out of it in the first place:

On Windows NT/2k/XP and Unix, if path is a Unicode object, the result will be a list of Unicode objects. Undecodable filenames will still be returned as string objects.

So:

os.listdir(ur'C:\Documents and Settings')
# ---------^
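
The same idea extends to walking a whole tree. This is only a sketch: the function name iter_dirs is hypothetical, and it assumes the root is given as a Unicode string so that os.walk hands back Unicode names at every level.

```python
import os


def iter_dirs(root):
    """Yield the full path of every directory below root.

    root should be a Unicode string (u'...' on Python 2, a plain str on
    Python 3) so that the names produced by os.walk are Unicode as well.
    """
    for current, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            yield os.path.join(current, name)
```

For the folder in the question, this could be called as iter_dirs(u'C:\\Documents and Settings'); each yielded path can be passed straight back to os.listdir.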

score 2 · Answer 2 · answered Jul 14 '11 at 08:56

You should try passing in the directory name as a Unicode literal (u'your/path'). This way, the result is also Unicode (which is probably required to work with Japanese characters).

From the documentation:

On Windows NT/2k/XP and Unix, if path is a Unicode object, the result will be a list of Unicode objects. Undecodable filenames will still be returned as string objects.
