Python 3 UnicodeEncodeError (Apache)

Question

With this code:

#!/usr/bin/env python3
open("We’re-introducing-a-DNS-man.jpg", "wb")

I get the error:

UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 2: ordinal not in range(128)

The error only occurs when the script is run through Apache as a CGI script. The script runs successfully from the command line.

I know I've had many issues with Apache setting the locale incorrectly. So far I've fixed all the previous issues with the three lines of code below.

import codecs
import locale
import sys
locale.setlocale(locale.LC_ALL, "en_GB.UTF-8")
sys.stdout = codecs.getwriter('utf-8')(sys.stdout.detach())
sys.stdin = codecs.getreader('utf-8')(sys.stdin.detach())  # stdin needs a reader, not a writer

But I don't know how to fix this new issue, which again seems to be related to the encoding/locale. The only slightly suspicious thing I can find is the result of the following call (made with the previous lines already added):

locale.getpreferredencoding(True)
ANSI_X3.4-1968

But if I change the argument to False, I get UTF-8.
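To see where the two values come from, it may help to print what the interpreter actually sees when started by Apache. The sketch below is only a diagnostic and assumes a POSIX system; the helper name `describe_encodings` is hypothetical. Note that `ANSI_X3.4-1968` is glibc's name for ASCII.

```python
import locale
import os
import sys

def describe_encodings():
    """Collect the encoding-related settings visible to this process."""
    return {
        # Encoding used to turn str filenames into bytes for the OS.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding reported for the locale currently set in the process.
        "preferred": locale.getpreferredencoding(False),
        # Locale variables inherited from the parent process (None if unset).
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
    }

if __name__ == "__main__":
    for key, value in describe_encodings().items():
        print(key, "=", value)
```

When run as a CGI script, the output would need a `Content-Type` header first, or could be written to a log file instead.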

How do I fix this encoding issue? Note that I've looked into Apache, and as far as I can tell it should be reporting UTF-8. The fact that it is not is a separate issue, and one that I was unable to make any progress on.

Edit:

This is not an issue with the contents/encoding of the source file: source files are UTF-8 by default in Python 3, and the program runs without a SyntaxError. All the obvious solutions have been attempted and failed.

The problem is that the open() function appears to be trying to convert the Unicode string to ASCII. The question is: why is it trying to convert to ASCII, and how do I stop it?
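A possible explanation, offered as an assumption about this setup rather than a confirmed diagnosis: on POSIX, `open()` converts a `str` filename to bytes using the filesystem encoding (`sys.getfilesystemencoding()`), which the interpreter chooses at startup from the environment. Calling `locale.setlocale()` afterwards does not change it. A minimal sketch of a workaround is to do the encoding explicitly and pass `bytes` to `open()`. The helper name `open_utf8_path` is hypothetical, and the sketch assumes filenames should be stored as UTF-8.

```python
import os
import tempfile

def open_utf8_path(name, mode="wb"):
    """Open a file whose name is encoded as UTF-8, regardless of the locale."""
    # Passing bytes skips the implicit str -> bytes conversion in open(),
    # so the process's filesystem encoding is never consulted.
    return open(name.encode("utf-8"), mode)

if __name__ == "__main__":
    # Write into a temporary directory so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "We\u2019re-introducing-a-DNS-man.jpg")
        with open_utf8_path(path) as fh:
            fh.write(b"example")
```

Another option may be to make sure a UTF-8 `LANG` or `LC_ALL` value reaches the CGI process (for example with mod_env's `SetEnv` directive), so that the interpreter starts with a UTF-8 filesystem encoding in the first place.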

'\u2019' is equal to `’` so it has something to do with filename. — Rahul, Jul 18 '17 at 10:33
and the fact that it's trying to convert to ascii. The question is why is it trying to convert to ascii, and how do I stop it? — Sam Bull, Jul 18 '17 at 10:35
I don't know about cgi but python 3 is `utf-8` by default. You don't need to do anything to make it `utf-8`. — Rahul, Jul 18 '17 at 10:37
It uses utf-8 strings by default, but there are still encoding issues. As mentioned in the question, I've had to add 3 lines of code to fix encoding issues elsewhere. For example, without the stdout/stdin lines, then reading input from a form submission gets garbled as it doesn't interpret stdin as utf-8, and outputting to the browser breaks as it doesn't print in utf-8. It adopts the encoding from the environment locale, and for some reason Apache lies about the locale and tells Python to use the wrong encoding. — Sam Bull, Jul 18 '17 at 10:43

Rahul · Answer 1 · 2017-07-18T10:24:52.117

-1

open("We’re-introducing-a-DNS-man.jpg", "wb")

Change it to

open("We're-introducing-a-DNS-man.jpg", "wb")

You don't need ’ (RIGHT SINGLE QUOTATION MARK). Use ' (APOSTROPHE) instead.

If your file names are dynamically generated, you need to make the same replacement in each name before opening the file.
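For dynamically generated names, the replacement could look like the sketch below. The helper name `asciify_quotes` and the set of characters it maps are assumptions for illustration. Note that this changes the filename rather than fixing the underlying encoding.

```python
# Map typographic quotation marks to their ASCII counterparts.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
})

def asciify_quotes(name):
    """Return name with curly quotes replaced by plain ASCII quotes."""
    return name.translate(QUOTE_MAP)
```

It would then be used as `open(asciify_quotes(filename), "wb")`.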

edited Jul 18 '17 at 10:24

answered Jul 18 '17 at 10:01

Rahul

Thanks, but that is just me simplifying the problem into one line. My actual code is fetching files from a remote SFTP server, so the filenames are dynamic, not hard-coded. I need the encoding issue fixed, not the dumb filenames. – Sam Bull Jul 18 '17 at 10:04
@SamBull I don't think `open("filename.jpg", "wb")` will give a Unicode error. You need to add more detailed code so that someone can help. – Rahul Jul 18 '17 at 10:12
I have reproduced the problem with exactly that line of code and only that line of code. With a Python script containing only that line, I get the encode error when it is run from Apache as a CGI script. – Sam Bull Jul 18 '17 at 10:18
Then the problem is with cgi script because `"wb"` mode will write bytes and not string. – Rahul Jul 18 '17 at 10:19
It's not writing anything. The encode error is happening on the filename, before it has managed to open the file. – Sam Bull Jul 18 '17 at 10:20
For your information, '\u2019' is for `right single quotation mark` which your filename is all about. – Rahul Jul 18 '17 at 10:22

ewwink · Answer 2 · 2017-07-18T10:59:07.523

-1

Make sure your .py file is encoded as UTF-8. In Notepad++ you can convert it using Encoding -> Convert to UTF-8. Then add the following at the top of your .py file:

# -*- coding: utf-8 -*-
fl = open(u"We’re-introducing-a-DNS-man.txt", "r")
print(fl.read())

Also try adding the u prefix to the filename, as in u"filename" above.

That's how I fixed the error with UTF-8 character encoding.

[demo image]

edited Jul 18 '17 at 10:59

answered Jul 18 '17 at 10:26

ewwink

Wasn't me. But, I've already tried that, and no luck (I think that's more a Python 2 thing). It's not the encoding of the source file that is the issue (which should be UTF-8 in Python 3 by default), but something about the encoding in the running environment. I know all the obvious mistakes with unicode, this is something more difficult. – Sam Bull Jul 18 '17 at 10:33
Well, it's still the wrong answer, and not something that should affect Python 3. Your image even shows a SyntaxError due to the encoding, not a UnicodeEncodeError. So, not related to the question. – Sam Bull Jul 18 '17 at 10:49
SyntaxError occurs because there is something wrong with your file. UnicodeEncodeError is something that occurs at runtime while trying to convert a string, and thus is unrelated to the file itself. – Sam Bull Jul 18 '17 at 10:59
@SamBull have you tried adding `u`, like `open(u"We’re-introducing-a-DNS-man.jpg", "wb")`? – ewwink Jul 18 '17 at 11:00
Same problem. Strings are Unicode by default (Python 3). The problem is not the string itself, but the fact that it is trying to convert the Unicode string to ASCII in the open() function. – Sam Bull Jul 18 '17 at 11:03
