I'm trying to read a file that is supposed to be cp1252 according to Sublime Text 3, and I'm getting a UnicodeEncodeError.
import codecs
with codecs.open(config_path, mode='rb', encoding='cp1252') as f:
    lines = f.readlines()
UnicodeEncodeError: 'charmap' codec can't encode character '\x92' in position 15: character maps to <undefined>
I can read the file if I change the encoding to latin-1, which is a bit weird. I'm fairly new to encoding and decoding, and if I open the file in Notepad++, ST3, or Excel, it is just an incomprehensible list of what looks like binary data to me.
with codecs.open(config_path, mode='r', encoding='latin-1') as f:
    lines = f.readlines()
for line in lines:
    utf_line = line.encode("utf-8")
    print(utf_line)
b"\x00\x03'\xc2\x9a\x00\x03'\xc2\x9a\x00\x03&\xc3\xba\x00\x03'\xc3\x9a\x00\x03'?\x00\x03'\xc2\xbd\x00\x03't\x00\x03'\xc2\xb2\x00\x03'\xc3\xac\x00\x03'\xc3\x9b\x00\x03'1\x00\x03'\xc2\x98\x00\x03'M\x00\x03'o\x00\x03'\xc3\x8b\x00\x03'\xc2\xbf\x00\x03'd\x00\x03'\xc2\xbf\x00\x03'\xc3\xb0\x00\x03'1\x00\x03'\xc2\x9f\x00\x03'\xc2\x9f\x00\x03'V\x00\x03'\xc2\xa0\x00\x03'G\x00\x03'\x15\x00\x03'u\x00\x03'\xc2\xae\x00\x03'`\x00\x03'|\x00\x03'\x17\x00\x03'Q\x00\x03'8\x00\x03'\xc2\x94\x00\x03':\x00\x03'4\x00\x03'P\x00\x03'\xc2\x9d\x00\x03'\xc2\x9f\x00\x03''\x00\x03'\xc3\x92\x00\x03't\x00\x03'\xc3\xb3\x00\x03'l\x00\x03'c\x00\x03'2\x00\x03'i\x00\x03'C\x00\x03'=\x00\x03'\x0f\x00\x03'\xc3\x89\x00\x03'\xc3\x8a\x00\x03'\xc2\xb7\x00\x03'`\x00\x03'T\x00\x03'\xc2\x90\x00\x03'\xc3\x9b\x00\x03'\xc2\x90\x00\x03'y\x00\x03'?\x00\x03'\xc2\x92\x00\x03'\xc3\xad\x00\x03'g\x00\x03'\xc2\x84\x00\x03'@\x00\x03'\xc2\xa9\x00\x03'q\x00\x03'L\x00\x03'\xc2\xae\x00\x03'
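To check my understanding of why latin-1 reads the file without complaint, here is a minimal sketch that is independent of my file: it tries to decode every possible byte value with each codec. The helper name `undecodable_bytes` is made up for illustration. As far as I understand, latin-1 maps all 256 byte values, while Python's cp1252 leaves a few undefined. The dump above contains `\xc2\x90` and `\xc2\x9d`, which I believe correspond to the raw bytes 0x90 and 0x9d, so this may be part of the difference.

```python
def undecodable_bytes(encoding):
    """Return the byte values (0-255) that `encoding` cannot decode.

    Hypothetical helper, only for illustration.
    """
    bad = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            bad.append(value)
    return bad


latin1_bad = undecodable_bytes("latin-1")
cp1252_bad = undecodable_bytes("cp1252")

print("latin-1 cannot decode:", [hex(b) for b in latin1_bad])
print("cp1252 cannot decode: ", [hex(b) for b in cp1252_bad])
```

If that is right, latin-1 "working" only means every byte maps to some character, not that the file is really latin-1 text.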
Here is the file
As suggested, I've tried using chardet as follows:
import chardet
with open(config_path, mode='rb') as f:
    lines = f.read()
encoding = chardet.detect(lines)
print(encoding)
{'encoding': None, 'confidence': 0.0, 'language': None}
If I test each line separately, I get a mix of encodings: cp1252, cp1253, ascii...
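Since chardet finds no encoding for the whole file, I also wondered whether the data is text at all. In the dump above, the bytes seem to repeat in groups of four that start with `\x00\x03`. Here is a minimal sketch of how I could test that idea, assuming (this is only my guess, nothing confirms it) that each group is a 4-byte big-endian unsigned integer. The `sample` bytes are the first 16 bytes reconstructed from the dump, where for example `\xc2\x9a` stands for the single original byte 0x9a.

```python
import struct

# First 16 bytes of the file, reconstructed from the latin-1 -> UTF-8 dump.
sample = b"\x00\x03'\x9a\x00\x03'\x9a\x00\x03&\xfa\x00\x03'\xda"

# ASSUMPTION: records are 4-byte big-endian unsigned integers (">I").
# This layout is a guess from the repeating pattern, not a known format.
values = [v for (v,) in struct.iter_unpack(">I", sample)]

print(values)
```

If the numbers come out in a plausible, narrow range, that would suggest the file is binary data rather than encoded text.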
Thank you