os.listdir returns strange strings for filenames with special characters

Question

Suppose I have the following files in `path`, a folder in my Google Drive that is mounted in a Python 3 Colab notebook:

(Here, lines starting with # show the output.)

```python
import os

ls = os.listdir(path)
print(ls)
# ['á.csv', 'b.csv']
```

Everything seems OK, but if I write

```python
'á.csv' in ls
# False
```

But it should return True. However, if I run the same check but copy-paste 'á.csv' from the output of `print(ls)` instead of typing it, it returns True.
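A minimal reproduction that needs no Google Drive, assuming (as the answers below suggest) that the listing stores the accent as a separate combining character. The hardcoded list `listed` is hypothetical and only stands in for `os.listdir(path)`:

```python
# Hypothetical stand-in for os.listdir(path): 'a' followed by U+0301 COMBINING ACUTE ACCENT
listed = ["a\u0301.csv", "b.csv"]

# What typing 'á.csv' usually produces: a single precomposed code point U+00E1
typed = "\u00e1.csv"

print(listed)           # visually the same as ['á.csv', 'b.csv'] in most terminals
print(typed in listed)  # False: same glyphs, different code points
```

The two names even differ in length: the listed one has six code points, the typed one five.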

Thanks

PS: The problem is not limited to that filename; it occurs with several filenames that contain special characters (namely í, á, é, ó, ñ).

score 2 · Accepted Answer · answered Feb 01 '20 at 10:30

You can normalize the file names before comparing them.

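```python
import os
from unicodedata import normalize

ls = [normalize('NFC', f) for f in os.listdir(path)]
# compare against the NFC-normalized listing
normalize('NFC', 'á.csv') in ls
# or just 'á.csv' in ls, if the typed literal is already NFC (as it usually is)
```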
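One caveat with the list comprehension above: the normalized names are not guaranteed to exist on disk as-is. On a filesystem that stores names byte-for-byte (typical on Linux), opening the NFC form of a name that was stored as NFD can fail. A minimal sketch that keeps a mapping from normalized name to the real on-disk name; `listdir_normalized` and `find_file` are hypothetical helper names, not part of any library:

```python
import os
import unicodedata


def listdir_normalized(path, form="NFC"):
    """Map each normalized file name in `path` to the name actually stored on disk."""
    return {unicodedata.normalize(form, name): name for name in os.listdir(path)}


def find_file(path, wanted, form="NFC"):
    """Return the on-disk path for `wanted`, ignoring normalization differences, or None."""
    names = listdir_normalized(path, form)
    real = names.get(unicodedata.normalize(form, wanted))
    return os.path.join(path, real) if real is not None else None
```

Usage would be something like `find_file(path, 'á.csv')`, passing the returned path to `open()` or a CSV reader. If two on-disk names normalize to the same string, the dictionary silently keeps only one of them, so check for that first if it can happen.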

korakot

This works. The last normalize is unnecessary, although preferred for security reasons. – felipekare Feb 01 '20 at 13:53

score 1 · Answer 2 · answered Feb 01 '20 at 01:00

I believe it is because some accented characters in Unicode have more than one representation. A character such as á can be stored either as a single precomposed code point (U+00E1) or as a base letter followed by a combining accent (U+0061 U+0301); the two look identical but compare unequal. Try 'á'.encode() once with á typed by hand and once with it copy-pasted as you did. If the bytes differ, the two strings are made of different code points that merely render the same.
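A quick way to see the difference, using í and writing both literals with escapes so they cannot be confused: the precomposed form is one code point, while the decomposed form is a base letter plus a combining accent. Normalizing to NFC makes them compare equal.

```python
import unicodedata

precomposed = "\u00ed"  # í as a single code point (NFC form)
decomposed = "i\u0301"  # i + COMBINING ACUTE ACCENT (NFD form)

for s in (precomposed, decomposed):
    # Show the UTF-8 bytes and the Unicode name of every code point
    print(s.encode("utf-8"), [unicodedata.name(c) for c in s])
# b'\xc3\xad' ['LATIN SMALL LETTER I WITH ACUTE']
# b'i\xcc\x81' ['LATIN SMALL LETTER I', 'COMBINING ACUTE ACCENT']

print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```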

Hurried-Helpful

You're right! Typed, `'í'.encode()` returns `\xc3\xad`, and copied it returns `i\xcc\x81`. Now how could I fix it? Both are UTF-8, but one encodes í as a single character, while the other encodes a plain i followed by a separate accent mark. My offline Python also reproduces the problem (I don't know why it didn't before, but I will edit my post), so maybe it is something about `listdir`. – felipekare Feb 01 '20 at 01:13
You can get rid of all accents from the file names, so long as there are no collisions. Look [here](https://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-in-a-python-unicode-string) to find out how. – Hurried-Helpful Feb 01 '20 at 03:05
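A sketch of the accent-stripping approach suggested above, using the common technique of decomposing to NFD and dropping combining marks. It also checks for the collisions mentioned. `strip_accents` and `accent_collisions` are hypothetical names, and this is not necessarily the exact code from the linked question:

```python
import unicodedata
from collections import defaultdict


def strip_accents(name):
    """Decompose to NFD, then drop combining marks such as U+0301 and U+0303."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def accent_collisions(names):
    """Group the names that would become identical after stripping accents."""
    groups = defaultdict(list)
    for name in names:
        groups[strip_accents(name)].append(name)
    return {key: group for key, group in groups.items() if len(group) > 1}


print(strip_accents("año_ó.csv"))                      # ano_o.csv
print(accent_collisions(["á.csv", "a.csv", "b.csv"]))  # {'a.csv': ['á.csv', 'a.csv']}
```

Because the input is decomposed first, this handles both the typed (precomposed) and the listed (decomposed) forms of í, á, é, ó and ñ; if `accent_collisions` returns a non-empty dict, renaming those files would overwrite or shadow each other.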