I'm working with yaml files that have to be human readable and editable but that will also be edited from Python code. I'm using Python 2.7.3
The file needs to handle accents ( mostly to handle text in French ).
Here is a sample of my issue:
import codecs
import yaml
file = r'toto.txt'
f = codecs.open(file,"w",encoding="utf-8")
text = u'héhéhé, hûhûhû'
textDict = {"data": text}
f.write( 'write unicode : ' + text + '\n' )
f.write( 'write dict : ' + unicode(textDict) + '\n' )
f.write( 'yaml dump unicode : ' + yaml.dump(text))
f.write( 'yaml dump dict : ' + yaml.dump(textDict))
f.write( 'yaml safe unicode : ' + yaml.safe_dump(text))
f.write( 'yaml safe dict : ' + yaml.safe_dump(textDict))
f.close()
The written file contains:
write unicode : héhéhé, hûhûhû
write dict : {'data': u'h\xe9h\xe9h\xe9, h\xfbh\xfbh\xfb\n'}
yaml dump unicode : "h\xE9h\xE9h\xE9, h\xFBh\xFBh\xFB"
yaml dump dict : {data: "h\xE9h\xE9h\xE9, h\xFBh\xFBh\xFB"}
yaml safe unicode : "h\xE9h\xE9h\xE9, h\xFBh\xFBh\xFB"
yaml safe dict : {data: "h\xE9h\xE9h\xE9, h\xFBh\xFBh\xFB"}
The yaml dump works perfectly for loading with yaml, but it is not human readable.
As you can see in the exemple code, the result is the same when I try to write a unicode representation of a dict ( I don't know if it is related or not ).
I'd like the dump to contains the text with accent, not the unicode code. Is that possible ?