
I have a file with special characters, I believe from Europe and Latin America. I did pd.read_csv("file.csv", encoding='iso8859') and it read some of the special characters, but characters like "Üs" still came out as "Ãs", and "Ärz" as "Ãrz". There are a bunch of those. Any idea what encoding to use? I have tried iso8859, iso8859-1, iso8859-15, Latin-1, UTF-8, and UTF-16.

2 Answers


You can try different encodings in pandas:

import pandas as pd

encodings_to_try = ['utf-8', 'latin1', 'iso-8859-1', 'iso-8859-15', 'cp1252']

for enc in encodings_to_try:
    try:
        df = pd.read_csv('file.csv', encoding=enc)
        print(f'Successfully read with encoding: {enc}')
        break
    except (UnicodeDecodeError, UnicodeError):
        print(f'Failed with encoding: {enc}')

One caveat: latin1 (and the other ISO-8859 variants) will decode any byte sequence without error, so "succeeding" with one of them does not mean the characters are correct -- inspect the resulting text to be sure.

It would first be helpful to read some background on file encodings in general; this is a great resource, as are the official Python docs. It is better to find out exactly what your file is actually encoded as than to keep trying to read it in different ways.

Pandas also has an explanation of the encoding parameter in the docs for read_csv -- of note is that the default is utf-8. If you've exhausted those options, it's sometimes simpler to apply a function from a 'fixer' library such as this after reading in your data with a standard encoding such as the default.
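For what it's worth, a leading "Ã" where a single accented letter should be is the classic sign that UTF-8 bytes were decoded as Latin-1/cp1252 at some point, possibly before the file ever reached you. A hedged sketch of the round-trip repair that fixer libraries of this kind (ftfy is one well-known example) automate:

```python
def fix_mojibake(s: str) -> str:
    """Undo UTF-8-read-as-Latin-1 damage, if that's what happened."""
    try:
        # Re-encode the mis-decoded text back to its original bytes,
        # then decode those bytes as UTF-8.
        return s.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not that kind of damage -- leave the text unchanged.
        return s

print(fix_mojibake('Ã¼'))  # ü
```

This only helps when the damage really is that specific mis-decode; text that round-trips cleanly is passed through untouched, which is why the except clause returns the input as-is.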

Tim Schott