UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 0: invalid continuation byte
I have a pandas dataframe that I read from a CSV file with latin-1 encoding. I take one column, which contains URLs, and use re.findall to extract a hex code from each URL. After stripping the 0x prefix I am left with a valid hex string. However, when I call bytes.fromhex(hex).decode('utf-8') on it, I get an "invalid continuation byte" error.
import re
import pandas as pd
import codecs
import binascii
df = pd.read_csv(file, encoding='latin-1', low_memory=False)
urls = df['g_maps_claimed']
def hex_to_string(hex):
    # Strip the ':0x' or '0x' prefix and lowercase the remaining hex digits
    if hex[:3] == ':0x':
        hex = hex[3:].lower()
        print("Corrected1:", hex)
    elif hex[:2] == '0x':
        hex = hex[2:].lower()
        print("Corrected2:", hex)
    print(len(hex))
    # hex = hex.encode('utf-8').decode('latin-1')
    # string_value = codecs.decode(hex, 'hex').decode('utf-8')
    ascii_data = binascii.unhexlify(hex).decode('utf-8')  # Converts the hex digits to bytes, then decodes them as UTF-8
    print(ascii_data)  # Prints the decoded text on screen
    # string_value = bytes.fromhex(hex).decode('utf-8')  # <--ERROR!
    # print("String value:", string_value)
    # return string_value
for url in urls:
    try:
        hexadecimal_id = re.findall(':0x[A-Z0-9]*', url)[0]
    except:
        try:
            hexadecimal_id = ''
        except TypeError as error:
            print(error, url)
    print("Hexadecimal_id:", hexadecimal_id)
    hex_to_string(hexadecimal_id)
    # ascii_data = binascii.unhexlify(hex) #Takes one line from line and converts it to ASCII
    # print ascii_data #Prints the ascii on screen
I've tried reading the file with both latin-1 and ISO-8859-1 encoding, and both produce the same error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 0: invalid continuation byte.
Example of what I get:
Hexadecimal_id: :0xE91DF6E4F947252C
Corrected1: e91df6e4f947252c
The corrected value is of type str.
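To take the dataframe out of the picture, here is a minimal sketch that reproduces the error from the example ID alone, assuming that ID is representative of the values in the column. The names hex_digits, raw, and try_decode are made up for this illustration and are not part of my script. For comparison, it also parses the same digits as a base-16 integer, which does not involve any text decoding.

```python
# Minimal reproduction using only the example ID (assumed to be representative).
hex_digits = "e91df6e4f947252c"  # example ID with the "0x" prefix removed

# Turn the 16 hex digits into 8 raw bytes.
raw = bytes.fromhex(hex_digits)


def try_decode(data):
    """Return the decoded text, or the UnicodeDecodeError if decoding fails."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as error:
        return error


result = try_decode(raw)
print(type(result).__name__, result)

# The same digits parse without complaint as a base-16 integer.
as_int = int(hex_digits, 16)
print(as_int)
```

With this input, result is the UnicodeDecodeError itself rather than a string, so the failure appears to come from the bytes produced by the hex string and not from how the CSV was read.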
I tried looking over other answers, but didn't find anything that would work for me. Any help would be appreciated!