>>> import locale
>>> preferred_encoding = locale.getpreferredencoding()
>>> preferred_encoding
'ANSI_X3.4-1968'
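The check above can be extended into a small diagnostic sketch. It is most useful when run inside the failing process (for example, from the app's init under Apache) rather than from a login shell, because the two may have different locales. `codecs.lookup` maps alias names to Python's canonical codec name, which confirms that `ANSI_X3.4-1968` is plain ASCII. The helper name `encoding_report` is hypothetical.

```python
import codecs
import locale
import sys


def encoding_report():
    """Collect the encodings this interpreter uses by default (hypothetical helper)."""
    # The default used by open() when no encoding= argument is passed.
    preferred = locale.getpreferredencoding(False)
    return {
        "preferred": preferred,
        # Canonical codec name, e.g. 'ANSI_X3.4-1968' -> 'ascii'.
        "preferred_canonical": codecs.lookup(preferred).name,
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print("{}: {}".format(key, value))
```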

I'm using a framework called inginious, which uses web.py to render its templates:

web.template.render(os.path.join(root_path, dir_path),
                    globals=self._template_globals,
                    base=layout_path)

The rendering works on my localhost but not on my staging server.

Both run Python 3. I see that web.py enforces UTF-8 only under Python 2 (that code is out of my hands):

def __str__(self):
    self._prepare_body()
    if PY2:
        return self["__body__"].encode('utf-8')
    else:
        return self["__body__"]

Here is the stack trace:

  t = self._template(name)
File "/lib/python3.5/site-packages/web/template.py", line 1028, in _template
  self._cache[name] = self._load_template(name)
File "/lib/python3.5/site-packages/web/template.py", line 1016, in _load_template
  return Template(open(path).read(), filename=path, **self._keywords)
File "/lib64/python3.5/encodings/ascii.py", line 26, in decode
  return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 83: ordinal not in range(128)

My HTML does include Hebrew characters. A small example:

<div class="modal-content">
                    <div class="modal-header">
                        <button type="button" class="close" data-dismiss="modal">&times;</button>
                        <h4 class="modal-title feedback-modal-title">
                            חישוב האיברים הראשונים בסדרה של איבר ראשון חיובי ויחס שלילי:
                            <span class="red-text">אי הצלחה</span>

And I open it like so:

open('/path/to/feedback.html').read()

The decoding fails on the line that contains the Hebrew characters.
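The failure can be reproduced without web.py. The sketch below writes a UTF-8 file containing Hebrew (the snippet and the helper name `reproduce` are illustrative). It then reads the file back with `encoding='ascii'`, which is effectively what a bare `open(path)` does when the preferred encoding is `ANSI_X3.4-1968`. Any non-ASCII byte triggers the error. Hebrew letters start with byte 0xd7 in UTF-8, while 0xe2 is the lead byte of a three-byte sequence (U+2000 to U+2FFF, e.g. typographic punctuation or direction marks). The byte at position 83 may therefore be such a character near the Hebrew text.

```python
import os
import tempfile

# Illustrative template fragment containing Hebrew text.
SNIPPET = '<span class="red-text">אי הצלחה</span>\n'


def reproduce():
    """Write SNIPPET as UTF-8, then read it back with two different codecs."""
    fd, path = tempfile.mkstemp(suffix=".html")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(SNIPPET)

        # Simulates open(path).read() under an ASCII locale.
        try:
            with open(path, encoding="ascii") as f:
                f.read()
            ascii_error = None
        except UnicodeDecodeError as exc:
            ascii_error = exc

        # An explicit UTF-8 decode of the same bytes succeeds.
        with open(path, encoding="utf-8") as f:
            text = f.read()
    finally:
        os.remove(path)
    return ascii_error, text


if __name__ == "__main__":
    err, text = reproduce()
    print("ascii:", err)
    print("utf-8 round-trip ok:", text == SNIPPET)
```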

I tried setting some environment variables in ~/.bashrc:

export PYTHONIOENCODING=utf8
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8

These were set for the user centos.

The inginious framework is installed with pip into the Python 3.5 site-packages, and it is served by an Apache server running as the user apache.

I also tried setting the environment variables in code (during the app's init) so that Apache's WSGI process would see them:

import os 
os.environ['LC_ALL'] = 'en_US.UTF-8'
os.environ['LANG'] = 'en_US.UTF-8'
os.environ['LANGUAGE'] = 'en_US.UTF-8'
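One likely reason this has no effect is that Python 3 picks its locale encoding when the interpreter starts. Assigning to `os.environ` afterwards does not re-run `setlocale()`. The variables therefore generally need to be in Apache's own environment before the interpreter starts; the answer linked in the comments below discusses mod_wsgi settings for this. The following check is a minimal sketch. It assumes an already-running interpreter, uses example values only, and the helper name is hypothetical.

```python
import locale
import os


def preferred_after_env_change():
    """Return the preferred encoding before and after editing os.environ."""
    saved = {key: os.environ.get(key) for key in ("LC_ALL", "LANG")}
    before = locale.getpreferredencoding(False)
    try:
        # Example values only: this changes the running process's environment...
        os.environ["LC_ALL"] = "en_US.UTF-8"
        os.environ["LANG"] = "en_US.UTF-8"
        # ...but does not call setlocale(), so the locale set at start-up is unchanged.
        after = locale.getpreferredencoding(False)
    finally:
        # Restore the original environment so the demo has no side effects.
        for key, value in saved.items():
            if value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = value
    return before, after


if __name__ == "__main__":
    before, after = preferred_after_env_change()
    print("before:", before, "after:", after)
```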

I also edited /etc/httpd/conf/httpd.conf using SetEnv:

SetEnv LC_ALL en_US.UTF-8
SetEnv LANG en_US.UTF-8
SetEnv LANGUAGE en_US.UTF-8
SetEnv PYTHONIOENCODING utf8

and restarted with sudo service httpd restart, still with no luck.
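A likely explanation for why SetEnv does not help either: under mod_wsgi, `SetEnv` values are passed to the application in the per-request WSGI `environ` dictionary rather than into the process environment (`os.environ`). They also arrive long after the interpreter has chosen its locale. The hypothetical diagnostic app below reports all three values so the difference can be inspected on the staging server. It is a sketch and not part of inginious.

```python
import locale
import os


def application(environ, start_response):
    """Hypothetical WSGI app reporting where locale settings are visible."""
    lines = [
        # Per-request dict: SetEnv values are expected to show up here.
        "WSGI environ LANG: {!r}".format(environ.get("LANG")),
        # Process environment: what the interpreter itself inherited.
        "os.environ LANG: {!r}".format(os.environ.get("LANG")),
        # The default encoding that a bare open() will use.
        "preferred encoding: {!r}".format(locale.getpreferredencoding(False)),
    ]
    body = "\n".join(lines).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

For a quick local check, it can be called directly with a plain dict standing in for the request environ.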

My question is: what is the best practice for solving this? I understand there are hacks for this, but I want to understand the underlying cause as well as how to solve it.

Thanks!

WebQube
  • `ANSI_X3.4-1968` == `ASCII`. – Martijn Pieters Oct 12 '17 at 06:55
  • You'll need to show us exactly what the traceback is and how to reproduce it. – Martijn Pieters Oct 12 '17 at 06:56
  • I've added the stack trace and some more code, but to reproduce it you'd have to install the inginious framework, which isn't a reasonable request, so my best option is to describe the question well – WebQube Oct 12 '17 at 07:09
  • Bah, web.py doesn't handle reading a template file all that well, it should really be explicit about their encoding. That's rather dumb, to be frank. You could work around this by using HTML entities for your non-ASCII text I suppose, but my personal recommendation is to move away from `web.py` and move to Flask or Django instead (with template handling that is far more battle-hardened on real Python 3 deployments). – Martijn Pieters Oct 12 '17 at 07:19
  • @MartijnPieters, that is not relevant to my scope. I have explored solutions in the scope of environment variables without any luck. The fact is that it works on my localhost. Please remove the hold from the question; I think it's a valid question. – WebQube Oct 12 '17 at 07:51
  • Thanks for adding the extra info. I suspect that Apache (mod_wsgi I presume?) will have to be taught about your env vars. Or create a wrapper script for your WSGI setup: https://gist.github.com/GrahamDumpleton/b380652b768e81a7f60c – Martijn Pieters Oct 12 '17 at 07:55
  • Tried it now; it didn't work. I edited the question (see "Tried setting the environment variables"). Is that the right location, or should I have placed it somewhere in the Apache settings file? If so, I don't know where that file is by default. – WebQube Oct 12 '17 at 08:15
  • @WebQube Did you happen to solve this issue??? – Ritesh Chitrakar Feb 27 '21 at 09:08
  • The solution is listed in answers – WebQube Feb 27 '21 at 10:32
  • This is what worked for me: https://stackoverflow.com/questions/30634777/unicodeencodeerror-when-running-in-mod-wsgi-even-after-setting-lang-and-lc-all – Luke Dupin Jun 17 '22 at 04:47

2 Answers


I finally found the answer: I changed the way the file is read from

open('/path/to/feedback.html').read()

to

import codecs
with codecs.open(file_path, 'r', encoding='utf8') as f:
    text = f.read()
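In Python 3 the built-in `open()` accepts the same `encoding` argument, so the `codecs` module is not required. The following is a minimal sketch of a reader that does not depend on the process locale; `read_template` is a hypothetical helper name.

```python
import os
import tempfile


def read_template(file_path):
    """Read a template as UTF-8 text regardless of the process locale."""
    # An explicit encoding bypasses locale.getpreferredencoding(False).
    with open(file_path, "r", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    # Round-trip check with a throwaway file.
    fd, path = tempfile.mkstemp(suffix=".html")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write("<h4>אי הצלחה</h4>")
    print(read_template(path) == "<h4>אי הצלחה</h4>")
    os.remove(path)
```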

If anyone has a more general approach that works, I'll accept their answer.

WebQube
  • Well, not exactly. In python3, "In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding." In other words, what you did above will work, but the real problem is that `locale.getpreferredencoding(False)` isn't returning utf-8 like you want it to. This is [from the python3 docs](https://docs.python.org/3/library/functions.html#open) – mlissner Oct 28 '20 at 00:06
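Building on this comment, a more general safeguard is to fail fast at application start-up when the default text encoding is not UTF-8. A misconfigured server is then detected immediately rather than at the first non-ASCII template. This is a sketch: the function name and the choice to raise (rather than, say, log a warning) are assumptions.

```python
import codecs
import locale


def require_utf8_default(encoding=None):
    """Raise RuntimeError unless the default text encoding is UTF-8.

    `encoding` defaults to what open() uses when no encoding= is passed.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    # Normalize aliases, e.g. 'ANSI_X3.4-1968' -> 'ascii', 'utf8' -> 'utf-8'.
    canonical = codecs.lookup(encoding).name
    if canonical != "utf-8":
        raise RuntimeError(
            "Default text encoding is {!r} ({}), not UTF-8; fix the server "
            "locale or pass encoding='utf-8' explicitly.".format(encoding, canonical))
    return canonical
```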

A Python 2+3 solution would be:

import io

with io.open(file_path, mode='r', encoding='utf8') as f:
    text = f.read()

See the documentation of io.open.
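On Python 3, `io.open` is an alias of the built-in `open`, so the snippet above behaves identically there. On Python 2 it returns `unicode` text instead of byte strings. A quick check, runnable on Python 3 only (the `builtins` module does not exist under that name in Python 2):

```python
import builtins
import io

# In Python 3 these are the same object, so io.open(..., encoding=...)
# and open(..., encoding=...) are interchangeable.
print(io.open is builtins.open)
```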

Laurent LAPORTE