With this code:
#!/usr/bin/env python3
open("We’re-introducing-a-DNS-man.jpg", "wb")
I get the error:
UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 2: ordinal not in range(128)
The error only occurs when the script is run through Apache as a CGI script. It runs successfully from the command line.
I've had many issues with Apache setting up the locale incorrectly. So far I've fixed all of the previous ones with the three lines of code below:
import codecs
import locale
import sys
locale.setlocale(locale.LC_ALL, "en_GB.UTF-8")
sys.stdout = codecs.getwriter('utf-8')(sys.stdout.detach())
sys.stdin = codecs.getreader('utf-8')(sys.stdin.detach())  # stdin needs a reader, not a writer
But I don't know how to fix this new issue, which again seems to be related to the encoding/locale. The only slightly suspicious thing I can find is the result of the following call (made after the lines above have run):
locale.getpreferredencoding(True)
ANSI_X3.4-1968
But if I change the argument to False, I get UTF-8.
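To compare the two environments, a small diagnostic along these lines could be run both from the shell and through Apache. This is only a sketch: `encoding_report` is a name made up for illustration, and which values differ under Apache is something to check, not something assumed here. As far as I understand, `getpreferredencoding(True)` consults the locale from the process environment, while `False` reports the locale currently set by `setlocale`, which would explain the two different answers above.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the encodings and locale variables Python sees in this process.

    encoding_report is a hypothetical helper, not part of any library.
    """
    return {
        # Encoding used to convert str filenames to bytes for the OS.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding derived from the currently active locale.
        "preferred": locale.getpreferredencoding(False),
        # Locale-related environment variables (None when unset).
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
    }


if __name__ == "__main__":
    for name, value in encoding_report().items():
        print(f"{name}: {value}")
```

If the `filesystem` value differs between the shell and the CGI run, that would point at the environment Apache starts the interpreter with.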
How do I fix this encoding issue? Note that I've looked into the Apache configuration, and as far as I can tell it should be reporting UTF-8. The fact that it does not is a separate issue, and one I was unable to make any progress on.
Edit:
This is not an issue with the contents or encoding of the source file: strings are Unicode in Python 3, the source is read as UTF-8 by default, and the program runs without a SyntaxError. All the obvious solutions have been attempted and have failed.
The problem is that the open() function appears to be trying to convert the Unicode string to ASCII. Why is it trying to convert to ASCII, and how do I stop it?
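For context, my understanding is that open() encodes a str filename using sys.getfilesystemencoding(), and that on POSIX this value is derived from the environment's locale when the interpreter starts, so a later call to locale.setlocale() would not change it. One possible workaround, sketched below and not verified under Apache, is to pass the filename as bytes so that no implicit encoding step takes place. `open_utf8_name` is a hypothetical helper, and encoding the name as UTF-8 is an assumption about what the filesystem should store.

```python
import os
import tempfile


def open_utf8_name(directory, name, mode="wb"):
    """Open `name` inside `directory`, passing the path to the OS as bytes.

    Hypothetical helper: the file name is encoded as UTF-8 explicitly, so
    open() does not need to apply the filesystem encoding to it.
    """
    raw_dir = os.fsencode(directory)       # directory path as bytes
    raw_name = name.encode("utf-8")        # assumed on-disk name encoding
    return open(os.path.join(raw_dir, raw_name), mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open_utf8_name(tmp, "We’re-introducing-a-DNS-man.jpg") as f:
            f.write(b"test")
        # List the directory as bytes to see the raw stored name.
        print(os.listdir(os.fsencode(tmp)))
```

This sidesteps the encoding step for this one call only. Any other code that handles the name as str would still go through the filesystem encoding.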