8

I am trying to open a French text file with Python 2.7. I used the command

f=open('textfr','r')

but when I use

f.read()

I lose the accented characters: I get u"J'\xc3\xa9tais \xc3\xa0 Paris" instead of "J'étais à Paris", and so on.

When I run this in a Linux terminal,

file -i textfr 

I get

charset=utf-8

so I do not understand what is going wrong.

Mostafa

3 Answers

11

You need to specify the charset.

import io

f = io.open('textfr', 'r', encoding='utf-8')
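
To illustrate why the charset matters, here is a minimal sketch that reads the same file once as raw bytes and once decoded as UTF-8. The helper name read_both_ways and the temporary demo file are assumptions made for this example; they are not part of the original question.

```python
import io
import os
import tempfile


def read_both_ways(path):
    """Return (raw_bytes, decoded_text) for a UTF-8 encoded file.

    Hypothetical helper, written only for this illustration.
    """
    # Binary mode: no decoding, so accented letters appear as byte pairs.
    with io.open(path, 'rb') as f:
        raw = f.read()
    # Text mode with an explicit encoding: the bytes are decoded to Unicode.
    with io.open(path, 'r', encoding='utf-8') as f:
        text = f.read()
    return raw, text


# Build a small UTF-8 demo file so the sketch is self-contained.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
with io.open(demo_path, 'wb') as f:
    f.write(u"J'\u00e9tais \u00e0 Paris".encode('utf-8'))

raw, text = read_both_ways(demo_path)
os.remove(demo_path)

print(repr(raw))   # the undecoded bytes, with \xc3\xa9 standing in for é
print(repr(text))  # the decoded Unicode string
```

The raw bytes are two bytes longer than the decoded text because é and à each take two bytes in UTF-8.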
Ignacio Vazquez-Abrams
3

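By default, Python 3 reads and writes text files using the locale's preferred encoding, which you can check with locale.getpreferredencoding(False); sys.getdefaultencoding() reports the default used for str/bytes conversions, not the one used for files. On most Linux and macOS machines the locale encoding is UTF-8, but that is not guaranteed everywhere. In Python 2, the built-in open() does no decoding at all and returns raw bytes. In either case, decode the file explicitly with its actual encoding, which here is UTF-8: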

In Python 3:

with open('somefile.txt', 'rt', encoding='utf-8') as f:
    text = f.read()  # text is a decoded str

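In Python 2 you can use codecs.open():

import codecs

# codecs.open() always opens the underlying file in binary mode, so plain 'r' is enough.
with codecs.open('somefile.txt', 'r', encoding='utf-8') as f:
    text = f.read()  # text is a unicode object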
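
To see which defaults apply on a given machine, one could print the relevant values. This is only a sketch: the function name describe_encodings is an assumption made for this example, and the values it returns depend on the platform and locale settings.

```python
import locale
import sys


def describe_encodings():
    """Collect the encoding defaults that are easy to confuse.

    Hypothetical helper, written only for this illustration.
    """
    return {
        # Default used for implicit str/bytes conversions.
        'str_default': sys.getdefaultencoding(),
        # Encoding suggested by the current locale settings.
        'locale_preferred': locale.getpreferredencoding(False),
    }


info = describe_encodings()
for name, value in sorted(info.items()):
    print('%s: %s' % (name, value))
```

Passing encoding='utf-8' explicitly, as in the snippets above, avoids depending on either value.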
Mazdak
1

Use codecs.open() instead of the built-in open():

import codecs
f = codecs.open('textfr', 'r', 'utf-8')
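
If a string has already been decoded with the wrong codec, the damage can sometimes be undone. The sketch below assumes the UTF-8 bytes were mistakenly decoded as Latin-1, which is one reading of the u"..." output in the question; the function name repair_mojibake is hypothetical. Decoding the file correctly in the first place is still the better fix.

```python
def repair_mojibake(text):
    """Undo a UTF-8 file having been decoded as Latin-1 (assumed cause).

    Hypothetical helper: it re-encodes the text to the original bytes,
    then decodes those bytes as UTF-8.
    """
    return text.encode('latin-1').decode('utf-8')


garbled = u"J'\xc3\xa9tais \xc3\xa0 Paris"
fixed = repair_mojibake(garbled)
print(repr(fixed))
```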
sax