My file contains lines like these:

M  Aad                                  4                                             $
M  Aadam                                          1                                   $
F  Aadje                                1                                             $
M  Ådne                      +                 1                                      $

When I run the following code:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import csv, unicodedata, urllib
from unidecode import unidecode
from textblob import TextBlob

with open('names.csv', 'rb') as f:
    reader = csv.reader(f)
    my_list = list(reader)

# Decode the first CSV field of every row and print it.
for row in my_list:
    name = unicode(row[0], 'ISO-8859-15')
    print name

I get output like this on some lines:

F  <Z^>ydr<edeg>                                      1                                 $

There are many similar questions on Stack Overflow, but their solutions didn't fit my problem.

How can I fix this problem?

yusuf

1 Answer

It sounds like your input is not actually UTF-8; it seems to be ISO-8859-* (possibly ISO-8859-15 or ISO-8859-1). 0xC5 is the ISO-8859 encoding of Å (the UTF-8 encoding would be 0xC3 0x85).
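
The byte values above can be checked directly. The following is a minimal sketch, written for Python 3 (the question's code is Python 2) and using the name "Ådne" from the sample lines; the variable names are illustrative only. It shows that Å is the single byte 0xC5 in both ISO-8859-1 and ISO-8859-15, two bytes in UTF-8, and that ISO-8859 bytes do not decode as UTF-8.

```python
# Python 3 sketch; variable names are illustrative, not from the question.
A_RING = "\u00c5"  # Å, LATIN CAPITAL LETTER A WITH RING ABOVE

latin1_bytes = A_RING.encode("iso-8859-1")   # one byte: 0xC5
latin9_bytes = A_RING.encode("iso-8859-15")  # same byte in ISO-8859-15
utf8_bytes = A_RING.encode("utf-8")          # two bytes: 0xC3 0x85

print(latin1_bytes, latin9_bytes, utf8_bytes)

# "Ådne" as it would be stored in an ISO-8859-15 file.
raw = "\u00c5dne".encode("iso-8859-15")

# In UTF-8, 0xC5 announces a two-byte sequence, but the next byte ("d")
# is not a continuation byte, so decoding as UTF-8 raises an error.
try:
    raw.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False

# Decoding with the charset the bytes were written in works.
name = raw.decode("iso-8859-15")
print(decoded_as_utf8, name)
```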

Joachim Sauer
  • So, @Joachim, what should I do in this case? – yusuf Jan 14 '16 at 12:53
  • Do you know what [`unicode()`](https://docs.python.org/2/library/functions.html#unicode) does? It takes the first input and interprets it according to the second input (the charset). You pass in "utf8" as the second parameter and I tell you that your data is not actually in UTF-8 encoding. – Joachim Sauer Jan 14 '16 at 12:57
  • Thank you @Joachim. Could you also please tell me how I can convert all the characters in this file to their English equivalents? – yusuf Jan 14 '16 at 13:00
  • I have changed the title of the question now. Could you please check it and tell me what I should do? – yusuf Jan 14 '16 at 13:02
  • Sorry, but that fundamentally changes what your question is about (and I'm fairly sure it's a duplicate now). You should have posted that as a separate question. – Joachim Sauer Jan 14 '16 at 14:23
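
As a follow-up to the comment on `unicode()` above, one way to apply the advice is to declare the charset when opening the file, so every line arrives already decoded. This is only a sketch under assumptions: it assumes the file really is ISO-8859-15 (the answer says "possibly"), it writes a hypothetical sample file `names_sample.csv` so that it is self-contained, and it splits on whitespace because the sample lines look column-aligned rather than comma-separated. `io.open` accepts an `encoding` argument in both Python 2 and Python 3.

```python
import io
import os
import tempfile

# Hypothetical sample file, written in ISO-8859-15 so the sketch runs on its own.
sample = u"M  \u00c5dne                      +                 1\n"
path = os.path.join(tempfile.mkdtemp(), "names_sample.csv")
with io.open(path, "w", encoding="iso-8859-15") as f:
    f.write(sample)

# Declare the charset up front; each line is then a decoded text string.
with io.open(path, "r", encoding="iso-8859-15") as f:
    lines = [line.rstrip(u"\n") for line in f]

# Assumption: columns are whitespace-separated (gender, name, remaining fields).
for line in lines:
    fields = line.split()
    gender, name = fields[0], fields[1]
    print(name)
```

If the file turns out to use a different ISO-8859 variant, only the `encoding` argument needs to change.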