I am having trouble in Python when reading special national characters from a text file:
with open("../Data/DKsnak.txt") as f:
    content = f.readlines()
str1 = content[0]
print "string:", str1
lst1 = str1.split()
print "list:", lst1
The output is as follows:
string: Udtræk fra observatør på årstal
list: ['Udtr\xc3\xa6k', 'fra', 'observat\xc3\xb8r', 'p\xc3\xa5', '\xc3\xa5rstal']
The first line is as expected, including the special Danish characters. But they don't survive being split into a list. I have tried various tricks with codecs and unicode, but can't find the magic bullet.
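A minimal sketch of the same situation without the file, assuming the file is UTF-8 encoded (the \xc3\xa6 escape in the list output matches the UTF-8 bytes for æ). The byte string `raw` is typed in by hand to stand in for the first line of the file, and the names `raw`, `words` and `decoded` are only illustrative:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Assumption: the file is UTF-8 encoded. These bytes are typed by hand
# to match the first line shown in the output, not read from the file.
raw = b"Udtr\xc3\xa6k fra observat\xc3\xb8r p\xc3\xa5 \xc3\xa5rstal"

# Splitting a byte string gives a list of byte strings.
words = raw.split()

# Decoding each element gives a list of text (unicode) strings.
decoded = [w.decode("utf-8") for w in words]

# The first word is 7 bytes long but only 6 characters long,
# because the character ae-ligature takes two bytes in UTF-8.
print(len(words[0]), len(decoded[0]))   # 7 6

# The decoded word equals the expected text, so nothing was lost in split().
print(decoded[0] == u"Udtr\u00e6k")     # True
```

Printing a list shows the repr() of each element, which is where the escapes come from; the elements themselves still hold the same bytes as the original line.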
Can anyone suggest how I can get these words into a list, so I can work with them as such?
Best regards Martin
Running: Python 2.7.5 (default, Feb 19 2014, 13:47:28) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux2