I have UTF-8 encoded text that has been mangled and contains some 'cp1252' ASCII literals. I am trying to isolate the literals and convert them one by one; however, the following code does not work and I can't understand why:
import re

text = "This text contains some ASCII literal codes like \x9a and \x9e."
# Find all ASCII literal codes in the text
codes = re.findall(r'\\x[0-9a-fA-F]{2}', text)
# Replace each ASCII literal code with its decoded character
for code in codes:
    char = bytes(code, 'ascii').decode('cp1252')
    text = text.replace(code, char)
print(text)
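Two things seem to go wrong in the snippet above. First, in a normal (non-raw) string literal Python itself turns `\x9a` into the single character U+009A, so a regex that looks for a literal backslash finds nothing. Second, even when the backslash is literal, `bytes(code, 'ascii')` encodes the four characters `\`, `x`, `9`, `a` as four ASCII bytes, and decoding those as cp1252 gives back the same four characters. A minimal sketch of one possible fix follows, assuming the mangled text really contains a literal backslash followed by `x` and two hex digits (modelled here with a raw string); the helper name `decode_cp1252_escape` is made up for illustration.

```python
import re

# Raw string: the backslashes stay literal, as they would in mangled
# text read from a file (this is an assumption about the real input).
text = r"This text contains some ASCII literal codes like \x9a and \x9e."

def decode_cp1252_escape(match):
    """Turn a literal '\\xNN' escape into the cp1252 character for byte 0xNN."""
    raw_byte = bytes.fromhex(match.group(1))  # 'NN' -> a single byte
    # A few byte values (such as 0x81) have no cp1252 mapping in Python,
    # so replace them instead of raising UnicodeDecodeError.
    return raw_byte.decode("cp1252", errors="replace")

# Capture the two hex digits and let re.sub call the helper for each match.
fixed = re.sub(r"\\x([0-9a-fA-F]{2})", decode_cp1252_escape, text)
print(fixed)
```

If the text instead holds the actual control characters U+009A and U+009E (as the non-raw literal in the question does), something like `text.encode('latin-1').decode('cp1252')` may be closer to what is needed, though that only works when every character in the text fits in Latin-1.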