
I'm trying to build a simple helper utility that looks through my projects and lists the open ones on the command line. However, whenever a folder or file name is in Japanese, os.listdir returns gibberish (for example '\x82\xa9\x82\xcc\x96I'), and that gibberish can't be passed back to os.listdir to get into the folder. For example, os.listdir('C:\Documents and Settings\\x82\xa9\x82\xcc\x96I') raises this error:

'WindowsError: [Error 3] The system cannot find the path specified.'

Does anybody know how I can get around this? Thanks a lot.

Makoto
StormShadow

2 Answers


You may need to decode the byte string into Unicode. It looks like your Japanese string is encoded in Shift-JIS; once it is decoded, you can re-encode it (for example as UTF-8) if you need bytes again:

>>> '\x82\xa9\x82\xcc\x96I'.decode('shift-jis').encode('utf-8')
'\xe3\x81\x8b\xe3\x81\xae\xe8\x9c\x82'
>>> print '\x82\xa9\x82\xcc\x96I'.decode('shift-jis')
かの蜂
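For reference, here is a sketch of the same decode/encode round trip in Python 3, where the raw data is a bytes literal and the decoded value is a str. The thread itself uses Python 2 syntax (the print statement, ur'' literals), so treat this as a translation rather than the answerer's code. The codec is named 'shift_jis' in Python's codec registry; the variable names are illustrative only.

```python
# The raw bytes from the question, written as a Python 3 bytes literal.
raw = b'\x82\xa9\x82\xcc\x96I'

# Decode the Shift-JIS bytes into a Unicode str.
name = raw.decode('shift_jis')  # 'かの蜂'

# Re-encode the same text as UTF-8, if some other API needs bytes.
utf8 = name.encode('utf-8')  # b'\xe3\x81\x8b\xe3\x81\xae\xe8\x9c\x82'
```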

Alternatively, make use of the following feature of os.listdir to get Unicode strings out of it in the first place:

On Windows NT/2k/XP and Unix, if path is a Unicode object, the result will be a list of Unicode objects. Undecodable filenames will still be returned as string objects.

So:

os.listdir(ur'C:\Documents and Settings')
# ---------^
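In Python 3 the same idea applies to str paths, which are always Unicode. The sketch below builds a throwaway directory tree (the folder names are hypothetical stand-ins for the real contents of C:\Documents and Settings) and shows that each name returned for a str path can be joined back onto its parent and passed straight to os.listdir again:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: one Japanese-named folder with one project inside.
    os.makedirs(os.path.join(root, 'かの蜂', 'project1'))

    # A str path makes os.listdir return str names, and each name can be
    # joined onto the parent and listed again without any manual decoding.
    contents = {
        name: os.listdir(os.path.join(root, name))
        for name in os.listdir(root)
    }

# contents == {'かの蜂': ['project1']}
```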
Fred Foo

Try passing the directory name as a Unicode literal (u'your/path'). That way the result is also Unicode, which is what you want when working with Japanese file names.

From the documentation:

On Windows NT/2k/XP and Unix, if path is a Unicode object, the result will be a list of Unicode objects. Undecodable filenames will still be returned as string objects.
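In Python 3 the rule in that quote becomes: a str path gives str names, and a bytes path gives bytes names. On POSIX systems, bytes that cannot be decoded are mapped to surrogate characters in the str result rather than being returned as byte strings. A small sketch, assuming a temporary folder with a hypothetical Japanese name used only for the demonstration; os.fsencode and os.fsdecode convert between the two forms using the filesystem encoding:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, 'かの蜂'))

    # str argument -> list of str names
    as_text = os.listdir(root)

    # bytes argument -> list of bytes names (filesystem encoding)
    as_bytes = os.listdir(os.fsencode(root))

    # os.fsdecode turns each bytes name back into str with the same encoding.
    round_trip = [os.fsdecode(b) for b in as_bytes]
```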

Björn Pollex