Python 3 changed its Unicode handling so that the UTF-8 codec rejects surrogates, whereas Python 2 still accepts them.
There's a question here
But it does not offer a solution for removing surrogates in Python 2, or for doing the equivalent of surrogate escaping there.
Python 3 example:
>>> a = b'\xed\xa0\xbd\xe4\xbd\xa0\xe5\xa5\xbd'
>>> a.decode('utf-8', 'surrogateescape')
'\udced\udca0\udcbd你好'
>>> a.decode('utf-8', 'ignore')
'你好'
The bytes '\xed\xa0\xbd' here are not valid UTF-8, and I want to either ignore them or escape them.
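For context, here is a quick check of why those bytes are rejected. It is only a sketch and runs on Python 3 only, since it relies on the 'surrogatepass' error handler; the variable names are just for illustration. As far as I can tell, the three bytes are the UTF-8-style encoding of the lone high surrogate U+D83D:

```python
# Python 3 only: 'surrogatepass' lets the UTF-8 codec handle lone surrogates.
raw = b'\xed\xa0\xbd'

# Decode the three bytes as if surrogates were allowed in UTF-8.
code_point = raw.decode('utf-8', 'surrogatepass')
is_high_surrogate = 0xD800 <= ord(code_point) <= 0xDBFF

print(hex(ord(code_point)))   # 0xd83d
print(is_high_surrogate)      # True

# A strict decode rejects the same bytes.
try:
    raw.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print(strict_ok)              # False
```

So the input looks like half of a surrogate pair that was encoded byte-for-byte, not a complete character.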
Is it possible to get the same 'ignore' or 'surrogateescape' behaviour in Python 2?