I need help with a pretty simple Python 3.6 script.
First, it downloads an HTML file from an old-fashioned server which uses cp1251 encoding.
Then I need to put the file contents into a UTF-8 encoded string.
Here is what I'm doing:
import requests
import codecs
#getting the file
ri = requests.get('http://old.moluch.ru/_python_test/0.html')
#checking that it's in cp1251
print(ri.encoding)
#encoding using cp1251
text = ri.text
text = codecs.encode(text,'cp1251')
#decoding using utf-8 - ERROR HERE!
text = codecs.decode(text,'utf-8')
print(text)
Here is the error:
Traceback (most recent call last):
  File "main.py", line 15, in <module>
    text = codecs.decode(text,'utf-8')
  File "/var/lang/lib/python3.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xca in position 43: invalid continuation byte
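For what it's worth, the failure does not seem to depend on the download itself. Below is a minimal offline sketch of the same steps. The sample string is an assumption (it is not the actual content of 0.html), and the variable names are made up for illustration.

```python
# Offline sketch of the same steps as in the script above.
# ASSUMPTION: the sample text stands in for the page; it is not the real 0.html.
sample = "Привет, мир"             # a str, playing the role of ri.text

raw = sample.encode("cp1251")      # cp1251 bytes, like the ones the server sends

try:
    raw.decode("utf-8")            # same step as the failing line in the script
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print("decode failed:", exc.reason)

# Decoding with the codec the bytes were written in gives the str back,
# and encoding that str as UTF-8 gives UTF-8 bytes.
restored = raw.decode("cp1251")
utf8_bytes = restored.encode("utf-8")
```

If this sketch is right, the str in ri.text is already decoded text, and UTF-8 only comes into play when that str is encoded to bytes again (for example, when writing it to a file).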
I'd really appreciate any help with it.