
In my code I keep getting this error...

UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 390: character maps to <undefined>

I tried adding except clauses for UnicodeError and UnicodeEncodeError, but nothing works. The problem is that the data comes from the user's input, so I can't control what they enter. I need every encoding error to print a message saying there was an error instead of crashing the program.

import urllib.error
import urllib.request

# argslist holds the user's input and is built elsewhere in the program
try:
    argslistcheck = argslist[0]
    if argslistcheck[0:7] != "http://":
        argslist[0] = "http://" + argslist[0]
    with urllib.request.urlopen(argslist[0]) as url:
        source = url.read()              # bytes
        source = str(source, "utf8")     # decode bytes -> str
except urllib.error.URLError:
    print("Couldn't connect")
    source = ""
except UnicodeEncodeError:
    print("There was an error encoding...")
    source = ""

Traceback:

Traceback (most recent call last):
  ..... things leading up to error
  File "C:\path", line 99, in grab
    print(source)
  File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 390: character maps to <undefined>
TrevorPeyton
  • It is more likely that it is the *output* of your program that is causing the error; Unicode data is automatically encoded to match your terminal output encoding. Are you printing anything or writing to a file? Please include that code and the full traceback. – Martijn Pieters Feb 24 '13 at 21:27
  • Yes, it either prints the source or saves it to a .txt file. It only happens on certain sites: on my website it doesn't, but with http://test.com/ it does. I just don't want the program crashing. – TrevorPeyton Feb 24 '13 at 21:32
  • Look closely at the traceback (preferably share it with us). It tells you *what operation* is failing. – Martijn Pieters Feb 24 '13 at 21:34
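The traceback points at print(source) inside grab, which runs outside the try block shown above, so neither except clause ever sees the exception. A minimal sketch of catching the error where it actually happens; safe_print and its stream parameter are hypothetical names, not part of the original program:

```python
import sys


def safe_print(text, stream=None):
    """Print text, but report an error instead of crashing when the
    output stream's encoding (e.g. cp437 in a Windows console) can't
    represent some character. Returns True if the text was printed."""
    if stream is None:
        stream = sys.stdout
    try:
        print(text, file=stream)
        return True
    except UnicodeEncodeError:
        # The stream's codec rejected a character such as an en dash
        print("Error: could not display this text", file=stream)
        return False
```

Calling safe_print(source) in place of print(source) keeps the program running; the stream parameter only exists so the function can be tried against a stream with a specific encoding.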

4 Answers


Give this a try:

source = str(source, encoding='utf-8', errors='ignore')

or take a look at this post's question.
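A small sketch of what the errors argument changes during decoding, assuming page bytes that are mostly valid UTF-8 with one invalid byte (the sample bytes are made up for illustration):

```python
# Stand-in for url.read(): UTF-8 bytes containing an en dash (U+2013)
# plus one byte (0xFF) that is not valid UTF-8
raw = b"caf\xc3\xa9 \xe2\x80\x93 ok \xff"

try:
    str(raw, encoding="utf-8")  # default errors='strict' rejects 0xFF
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)

ignored = str(raw, encoding="utf-8", errors="ignore")    # invalid bytes dropped
replaced = str(raw, encoding="utf-8", errors="replace")  # invalid bytes -> U+FFFD
print(repr(ignored))
print(repr(replaced))
```

Note that the en dash decodes fine in every case, because errors= here only governs decoding; printing the result to a cp437 console can still raise UnicodeEncodeError.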

chirinosky

Your print() call is failing. Your Windows console uses codepage 437 (the cp437 in the traceback), which can't represent that character, so you need to switch the console to the UTF-8 codepage:

chcp 65001

This is a Windows command, not a Python command. You may need to switch fonts too; Lucida Console is a Unicode font that can display many more glyphs.
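If changing the codepage isn't an option, here is a sketch of an in-Python alternative: encode the text for the console with errors='replace', so unencodable characters become '?' instead of raising. console_safe is a hypothetical helper name. On Python 3.7 and later, sys.stdout.reconfigure(errors="replace") achieves the same for every print, but that method does not exist in the Python 3.3 shown in the traceback.

```python
import sys


def console_safe(text, encoding=None):
    """Return a copy of text containing only characters the target
    encoding can represent; anything else becomes '?'. Defaults to the
    encoding of sys.stdout (cp437 in the asker's console)."""
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(encoding, errors="replace").decode(encoding)


print(console_safe("a \u2013 b", "cp437"))
```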

Martijn Pieters

Try this in place of str():

source = source.encode('UTF-8')

wendong
  • That didn't work. The reason I convert it like that is that I'm changing it from bytes to a string; if I do that, it gives me a different error about it being bytes. – TrevorPeyton Feb 24 '13 at 21:46
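The two directions are easy to mix up: url.read() returns bytes, decoding turns bytes into str, and .encode() goes the other way, from str back to bytes. A short sketch with made-up sample data standing in for the downloaded page:

```python
# Stand-in for url.read(), which returns bytes (hypothetical sample data)
raw = "Spring \u2013 Summer".encode("utf-8")

text = raw.decode("utf-8")        # bytes -> str, same as str(raw, "utf8")
assert text == str(raw, "utf8")

again = text.encode("utf-8")      # str -> bytes, the opposite direction
assert again == raw

try:
    raw.encode("utf-8")           # bytes objects have no .encode() in Python 3
except AttributeError as exc:
    print("AttributeError:", exc)
```

So replacing str() with .encode() cannot fix a console encoding problem; the decoded str is the right thing to print, and the failure happens when print() encodes it for the console.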
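import requests
from bs4 import BeautifulSoup

start_url="https://www.indeed.co.in/jobs?q=teacher&l=India"
page_data=requests.get(start_url)
soup=BeautifulSoup(page_data.text,"lxml")
fname='1download'
# No encoding given, so open() falls back to the locale default (a 'charmap' codec such as cp1252 on Windows)
with open(fname,'w') as f:
    f.write(soup.prettify())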

return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u20b9' in position 235677: character maps to <undefined>

Both errors can be solved by passing encoding="utf-8" to open(). Using a with open(...) block is also good practice because it closes the file for you; note that a plain f = open(fname, 'w') without the encoding argument raises the same error.

Here is the corrected code:

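import requests
from bs4 import BeautifulSoup

start_url = "https://www.indeed.co.in/jobs?q=teacher&l=India"
page_data = requests.get(start_url)
soup = BeautifulSoup(page_data.text, "lxml")
fname = '1download'
# An explicit UTF-8 encoding can store any character, e.g. the rupee sign (U+20B9);
# the with block closes the file, so no f.close() is needed
with open(fname, 'w', encoding="utf-8") as f:
    f.write(soup.prettify())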
joel.t.mathew
  • I was missing ' encoding="utf-8" ' in the 'open' function which caused UnicodeEncodeError. After I added it, the error has gone. – Vladimir Jan 16 '21 at 00:53
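To see why the encoding argument matters, here is a sketch that writes the same text with a legacy Windows codepage and with UTF-8. write_text is a hypothetical helper; cp1252 is assumed as a typical Windows default, and the actual default on a given machine is whatever locale.getpreferredencoding(False) reports.

```python
import locale
import os
import tempfile


def write_text(path, text, encoding):
    """Try to write text to path with the given encoding; return False
    instead of raising if the encoding can't represent some character."""
    try:
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        return True
    except UnicodeEncodeError:
        return False


# Normally what open() uses when no encoding= is passed
print("default encoding:", locale.getpreferredencoding(False))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    sample = "Salary: \u20b9 25,000"  # rupee sign, as in the traceback
    print("cp1252 ok:", write_text(path, sample, "cp1252"))
    print("utf-8 ok:", write_text(path, sample, "utf-8"))
    with open(path, encoding="utf-8") as f:
        print("round trip ok:", f.read() == sample)
```

Remember to read the file back with the same encoding it was written with; otherwise the rupee sign comes back garbled.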