I have some strings that I am pasting into my script as test data. The strings come from emails that contain encoded characters, and the script throws a SyntaxError. So far, I have not been able to find a solution to this issue. When I print repr(string), I get these strings:
'Total Value for 1st Load \xe2\x80\x93 approx. $75,200\n'
'Total Value for 2nd Load \xe2\x80\x93 approx. $74,300\n'
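For reference, the three escaped bytes in that output are what a single en dash looks like once it has been encoded as UTF-8. The snippet below is only a small sketch of that relationship, assuming the data really is UTF-8; the variable names are made up for illustration.

```python
# -*- coding: utf-8 -*-
# Assumption: the bytes are UTF-8, as the \xe2\x80\x93 sequence suggests.
raw = b'Total Value for 1st Load \xe2\x80\x93 approx. $75,200\n'

# Decoding turns the byte string into Unicode text.
text = raw.decode('utf-8')

# The three bytes \xe2\x80\x93 collapse into one character, U+2013 (en dash).
has_en_dash = u'\u2013' in text
byte_overhead = len(raw) - len(text)  # two fewer items after decoding

# Encoding the text again gives back the original bytes.
round_trip = text.encode('utf-8')
```

So the rest of the string is not "unencoded": plain ASCII characters simply encode to themselves in UTF-8, and only the dash needs more than one byte.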
And this error pops up when I run my script:
SyntaxError: Non-ASCII character '\xe2' in file <filename> on line <line number>, but no
encoding declared; see http://www.python.org/peps/pep-0263.html
When I just print the lines containing the encoded characters I get this:
'Total Value for 2nd Load – approx. $74,300'
The data looks like this when I copy it from the email:
'Total Value for 1st Load – approx. $75,200'
'Total Value for 2nd Load – approx. $74,300'
From my searches, I believe the text is encoded as UTF-8, but I have no idea how to work with this data, given that some characters appear to be encoded while most are not (maybe?). I have tried various "solutions" I have found so far, including adding # -*- coding: utf-8 -*- to the top of my script, but then the script just hangs... It doesn't do anything :(
If someone could provide some information on how to deal with this scenario, that would be amazing.
I have tried decoding and encoding using string.encode() and string.decode(), using different encodings based on what I could find on Google, but that hasn't solved the problem.
I would really prefer a Python solution because the project I'm working on requires people to paste data into a text field in a GUI, and then that data will be processed. Other solutions suggested pasting the data into something like Word or Notepad, saving it as plain text, and then copying and pasting it back from that file. This is a bit much. Does anybody know of a Pythonic way of dealing with this issue?
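To make the goal concrete, here is a rough sketch of the kind of in-program handling I mean, so that nobody has to round-trip the text through another application. It assumes the pasted data arrives as UTF-8 bytes (a real GUI toolkit may already hand over Unicode text). `to_text` and `asciify_dashes` are hypothetical helper names, not part of any library.

```python
# -*- coding: utf-8 -*-
EN_DASH = u'\u2013'

def to_text(value, encoding='utf-8'):
    """Return value as Unicode text, decoding it first if it is a byte string.

    Assumption: byte input is UTF-8 unless another encoding is passed in.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

def asciify_dashes(text):
    """Optionally replace en dashes with a plain ASCII hyphen."""
    return text.replace(EN_DASH, u'-')

# Example input taken from the repr() output above.
pasted = b'Total Value for 2nd Load \xe2\x80\x93 approx. $74,300\n'
clean = asciify_dashes(to_text(pasted)).strip()
```

The idea is to decode once at the point where the data enters the program and work with Unicode text from there on. Replacing the dash is optional and only matters if later steps need pure ASCII.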