A user enters a string on my website that contains a non-ASCII character.

The JavaScript code saves their input, serializes it with JSON.stringify(), and sends it to the server.

The server, running Python 3, parses the JSON with json.loads, stores the string in a Node object, and then runs the line

print('looks like {}'.format(node_obj))

I receive the error

'ascii' codec can't encode character '\u2212' in position 941: ordinal not in range(128)

It seems to me that Python 3's print function is trying to convert the Unicode string to ASCII, that is, to encode it to bytes using the ASCII codec.
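A minimal sketch that isolates this step, under the assumption that the problem is the output encoding and not the string itself. print() writes to sys.stdout, a text stream that encodes each str with sys.stdout.encoding, so encoding the same message explicitly with the ASCII codec should reproduce the error. The Node class below is a hypothetical stand-in for the question's class, and try_encode is a hypothetical helper.

```python
class Node:
    """Hypothetical stand-in for the question's Node class."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


def try_encode(s, encoding):
    """Encode s as a text stream with `encoding` would; return the error instead of raising."""
    try:
        return s.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


node_obj = Node("\u2212")                   # U+2212 MINUS SIGN
message = "looks like {}".format(node_obj)  # str formatting alone never encodes

print(try_encode(message, "ascii"))  # 'ascii' codec can't encode character '\u2212' in position 11: ordinal not in range(128)
print(try_encode(message, "utf-8"))  # b'looks like \xe2\x88\x92'
```

The format() call succeeds; only the encoding step fails, which points at the stream's encoding rather than at the data.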

Is it possible that my FreeBSD server does not support UTF-8, which causes Python's print function to make this conversion? Or was the string never properly sanitized in the first place, so that I should be doing that in the JavaScript when I first receive it from the user?
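On the sanitization question, a small sketch suggesting that the decoded value is already an ordinary Python str. JSON.stringify leaves non-ASCII characters unescaped by default, but JSON also permits the \u escape form, and json.loads accepts both. The field name "text" and the sample payloads are hypothetical.

```python
import json

# Two hypothetical payloads carrying the same U+2212 MINUS SIGN.
raw_payload = '{"text": "5 \u2212 3"}'      # literal character, as JSON.stringify sends it
escaped_payload = '{"text": "5 \\u2212 3"}'  # \u2212 escape sequence inside the JSON

a = json.loads(raw_payload)["text"]
b = json.loads(escaped_payload)["text"]

print(a == b)          # True
print(len(a))          # 5
print(hex(ord(a[2])))  # 0x2212
```

Either way the server ends up with the same five-character str, so nothing needs stripping on the client for print() to work; what matters is the encoding used when that str is written out.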

Let me know what further information would be useful.

mareoraft

1 Answer

What does the `locale` command say?
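The same information can be checked from inside Python. A minimal sketch (the helper name encoding_report is hypothetical) that reports the encoding print() will use and what Python derived from the locale; sys.flags.utf8_mode assumes Python 3.7 or later.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings that decide how print() turns str into bytes."""
    return {
        # Encoding of the sys.stdout text stream; print() encodes with this.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding derived from the locale (LC_ALL / LC_CTYPE / LANG);
        # reports UTF-8 when UTF-8 mode is on.
        "locale": locale.getpreferredencoding(False),
        # 1 when UTF-8 mode (PEP 540, Python 3.7+) is active, else 0.
        "utf8_mode": sys.flags.utf8_mode,
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key}: {value}")
```

An ASCII-like value in the "stdout" entry would match the error in the question.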

You can make Python use UTF-8 by setting either LANG=en_US.UTF-8 or PYTHONIOENCODING=utf-8 in its environment.
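A sketch of the PYTHONIOENCODING route, run in child interpreters so the parent's own output is unaffected. It varies only PYTHONIOENCODING because on Python 3.7 and later a plain C locale usually triggers locale coercion or UTF-8 mode (PEP 538, PEP 540), so LANG=C alone may no longer reproduce the question's failure on a current interpreter; PYTHONIOENCODING takes priority over both for the standard streams. The names CHILD and run_child are hypothetical.

```python
import os
import subprocess
import sys

# Child script that prints U+2212 MINUS SIGN; the child parses the \u2212 escape.
CHILD = "print('looks like \\u2212')"


def run_child(io_encoding):
    """Run CHILD in a fresh interpreter with PYTHONIOENCODING set to io_encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,  # requires Python 3.7+
    )


if __name__ == "__main__":
    bad = run_child("ascii")   # forces the question's situation
    print(bad.returncode != 0, b"UnicodeEncodeError" in bad.stderr)
    good = run_child("utf-8")  # the fix from the answer
    print(good.stdout.decode("utf-8").strip())
```

Setting LANG to a UTF-8 locale fixes the same problem more broadly, since other programs in that environment also pick up the encoding.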

Setting LANG in the default environment is platform-dependent: https://unix.stackexchange.com/questions/342817/how-do-i-add-a-language-in-freebsd

Josh Lee
  • `% locale`
        LANG=
        LC_CTYPE="C"
        LC_COLLATE="C"
        LC_TIME="C"
        LC_NUMERIC="C"
        LC_MONETARY="C"
        LC_MESSAGES="C"
        LC_ALL=
    – mareoraft Feb 07 '17 at 22:09
  • I think running `export LANG="en_US.UTF-8"` worked for me. I'm going to wait a little while before accepting an answer because I want to make sure it's correct. – mareoraft Feb 07 '17 at 22:41
  • Yes, setting the `LANG` environment variable to use UTF-8, in whichever way is appropriate for your OS, is the correct solution. – mareoraft Mar 05 '17 at 14:21