I've got a .txt file containing Polish city names that I want to read with Python. I use this code (the first line of my script is # -*- coding: utf-8 -*-):
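import csv
import io
from collections import defaultdict

string = 'PL.txt'

def read_places():  # wrapper name is illustrative; the return below needs a function
    country = io.open(string, mode='r', encoding='utf-8')
    lezer = csv.reader(country, dialect='excel-tab')  # tab-separated rows
    my_dict = defaultdict(list)
    for record in lezer:
        pc, gemeente = record[0], record[1]  # postcode, place name
        my_dict[pc].append(gemeente)
    return my_dict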
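For comparison, here is a minimal sketch of the same reading step, assuming Python 3 and a tab-separated UTF-8 file. The function name read_postcodes and the sample rows are hypothetical. Decoding UTF-8 on input never goes through a charmap codec, so if a version like this reads the file cleanly, the error most likely comes from a later step that encodes the text for output.

```python
import csv
import os
import tempfile
from collections import defaultdict


def read_postcodes(path):
    """Map each postcode (column 0) to a list of place names (column 1).

    Hypothetical helper; assumes a tab-separated UTF-8 file like PL.txt.
    """
    places = defaultdict(list)
    # newline='' is what the csv docs recommend for files handed to csv.reader
    with open(path, encoding="utf-8", newline="") as f:
        for record in csv.reader(f, dialect="excel-tab"):
            if len(record) < 2:
                continue  # skip blank or malformed lines
            postcode, place = record[0], record[1]
            places[postcode].append(place)
    return places


if __name__ == "__main__":
    # Made-up sample rows, written as UTF-8 so the demo is self-contained.
    fd, sample = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
        f.write("80-001\tGdańsk\n80-001\tOliwa\n00-001\tWarszawa\n")
    try:
        result = read_postcodes(sample)
        # ascii() escapes non-ASCII characters, so printing cannot hit a charmap error
        print(ascii(dict(result)))
    finally:
        os.remove(sample)
```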
When I run the code it starts, and then this error appears:
return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0144' in position 35: character maps to <undefined>
I've searched online and found several answers, but none that fits my case exactly. If I understand correctly, the problem is the character ń (U+0144): the charmap codec has no mapping for it, so it can't be encoded. I tried the utf16 codec instead, but then the text comes out as something strange. I also tried other codecs such as latin-1, cp437 and cp1252.
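A quick way to check which codecs can represent ń is to encode a sample name with each one. This is a sketch assuming Python 3; try_encode is a hypothetical helper. It also assumes the error is raised while the text is encoded for output (for example printing to a console or writing a file whose code page lacks ń), since that is where charmap_encode is called.

```python
def try_encode(text, codec):
    """Return the encoded bytes, or None if the codec has no mapping for some character."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


name = "Gdańsk"
# Western code pages (cp437, cp1252, latin-1) have no ń; cp1250 and UTF-8 do.
for codec in ("cp437", "cp1252", "latin-1", "cp1250", "utf-8"):
    status = "OK" if try_encode(name, codec) is not None else "cannot encode U+0144"
    print(codec, status)

# errors="replace" substitutes '?' for unmappable characters instead of raising
print(name.encode("cp1252", errors="replace"))  # b'Gda?sk'
```

Choosing an output encoding that contains ń (such as UTF-8) avoids the error; errors="replace" only hides it by losing the character.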
I also tried:
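import csv
import io
from collections import defaultdict

string = 'PL.txt'

def read_places():  # wrapper name is illustrative; the return below needs a function
    country = io.open(string, mode='r', encoding='utf-8')
    lezer = csv.reader(country, dialect='excel-tab')  # tab-separated rows
    my_dict = defaultdict(list)
    for record in lezer:
        # encode the place name to UTF-16 bytes before storing it
        pc, gemeente = record[0], record[1].encode('utf16')
        my_dict[pc].append(gemeente)
    return my_dict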
When I check type(record[1]), it gives str and not unicode. It's the same with other Polish characters.
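Assuming the script runs on Python 3, str is already the Unicode text type (what Python 2 called unicode), so type(record[1]) reporting str is expected. Calling .encode('utf16') turns that text into bytes, which print as escape sequences; that is one plausible reading of "maps to something strange". The sketch below only illustrates the type change and is not taken from the original script.

```python
name = "Gdańsk"                  # Python 3 str: a sequence of Unicode code points
encoded = name.encode("utf-16")  # bytes, no longer text

print(type(name).__name__)       # str
print(type(encoded).__name__)    # bytes
print(len(name), len(encoded))   # 6 14  (2-byte BOM + 2 bytes per character)

# Decoding with the same codec restores the original text
assert encoded.decode("utf-16") == name
```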