I have a project in Python 2.6 and I'd like to write a UTF-8 message to stdout using the system encoding. However, it appears that a function for this does not exist until Python 3.2:

PySys_FormatStdout

http://docs.python.org/dev/c-api/sys.html

Is there a way to do this from Python 2.6?

To clarify, I have a banner that needs to print after Py_Initialize() and before the main interpreter is run. The string is a C literal containing: "\n and Copyright \xC2\xA9"

where \xC2\xA9 is the UTF-8 encoding of the copyright symbol. I verified in gdb that the copyright symbol is encoded correctly.
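
The relationship between the code point and the two bytes in the literal can be checked by hand. The following is a minimal sketch in plain C with no Python API involved; `encode_utf8` is a hypothetical helper name, not part of any library. It shows why the copyright symbol, U+00A9, is a single value \xa9 as a code point but the two bytes \xC2\xA9 once encoded as UTF-8.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into out.
 * Returns the number of bytes written (1 to 4), or 0 if cp is not
 * a valid code point (a surrogate or a value above U+10FFFF).
 * encode_utf8 is a hypothetical helper name used for illustration. */
size_t encode_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110xxx followed by three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling `encode_utf8(0xA9, buf)` returns 2 and leaves the bytes 0xC2, 0xA9 in `buf`, which matches the literal above.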

Update: I've decided all this grief isn't necessary, and I'm going to remove the offending character from the startup banner. There are just too many issues with this, and the documentation is lacking. My expectation was that this would be like Tcl, where:

  1. The embedded interpreter's C API would make it easy to write Unicode to stdout in the system's encoding, rather than in some default ASCII encoding.
  2. No exception would be thrown if a character does not exist in the current encoding; a default replacement character would be displayed instead.
  3. Importing additional modules (e.g. sys) would not be necessary just to find out what the system encoding is.
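
For illustration, the replacement behavior described in point 2 can be sketched in plain C for the narrow case of ASCII-only output. This is a sketch under stated assumptions, not what Python or Tcl does internally: `utf8_to_ascii_replace` is a hypothetical helper name, the replacement character is fixed to '?', and the input is assumed to be reasonably well-formed UTF-8 (it is not validated).

```c
#include <stddef.h>

/* Copy NUL-terminated UTF-8 text into out as ASCII, writing a single '?'
 * for each non-ASCII character. The output is always NUL-terminated when
 * out_size > 0. Returns the number of bytes written, not counting the
 * terminator. utf8_to_ascii_replace is a hypothetical helper name. */
size_t utf8_to_ascii_replace(const char *in, char *out, size_t out_size)
{
    size_t n = 0;

    if (out_size == 0)
        return 0;

    while (*in != '\0' && n + 1 < out_size) {
        unsigned char c = (unsigned char)*in++;
        if (c < 0x80) {
            out[n++] = (char)c;   /* plain ASCII passes through unchanged */
        } else {
            out[n++] = '?';       /* one replacement per non-ASCII character */
            /* Skip the continuation bytes (10xxxxxx) of this sequence. */
            while (((unsigned char)*in & 0xC0) == 0x80)
                in++;
        }
    }
    out[n] = '\0';
    return n;
}
```

With the banner text, `"Copyright \xC2\xA9"` would come out as `Copyright ?` instead of raising an error.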
Johan Råde
Juan
  • 1. http://bugs.python.org/issue4947 (encode by hand in Python < 2.7) 2. Use `errors="replace"` instead of `errors="strict"` if you must 3. `PyUnicode_GetDefaultEncoding()` – jfs Dec 23 '10 at 07:06
  • Thanks J.F., As of now I am just going to avoid using the character in my application's banner. – Juan Dec 23 '10 at 17:34

2 Answers

Decode the string with PyUnicode_DecodeUTF8(), then print the resulting object with PyObject_Print().

Ignacio Vazquez-Abrams
  • Thanks. I just need to know how I would get the FILE * associated with any redirection of stdout by the person running the Python interpreter. – Juan Dec 21 '10 at 19:22
  • You either want `stdout` itself, or [`PySys_GetFile("stdout", stdout)`](http://docs.python.org/c-api/sys.html#PySys_GetFile), depending on what you mean by that. – zwol Dec 21 '10 at 19:25
  • I'm not too familiar with working with file handles directly, but I just need to make sure that whatever gets written out goes wherever stdout has been redirected to. – Juan Dec 21 '10 at 19:32
  • Unfortunately, the printed string has all of my newlines escaped (u'\n--------------------) and looks like a literal that would go into a Python script. In addition, the symbol of interest, ©, is written as \xa9, whereas printed to the screen in my UTF-8 environment it should be \xc2\xa9. – Juan Dec 21 '10 at 19:45
  • `sys.stdout` could refer to an arbitrary Python object (`PyObject*`) with the `.write()` method, but `PyObject_Print()` requires `FILE*`. – jfs Dec 21 '10 at 21:16

You could use PyFile_WriteObject():

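PyObject *f_stdout = PySys_GetObject("stdout");  /* borrowed reference */
PyObject *text = PyUnicode_DecodeUTF8((char*)str, strlen(str), "strict");  /* new reference, NULL on error */
if (text != NULL && f_stdout != NULL)
    PyFile_WriteObject(text, f_stdout, Py_PRINT_RAW);
Py_XDECREF(text);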

If you know the final encoding, you could use PyUnicode_AsEncodedString() to encode the text yourself.
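
One way to learn an encoding name without going through the interpreter, on POSIX systems only, is to ask the C library for the locale's codeset. The following is a minimal sketch under that assumption; `locale_encoding` is a hypothetical helper name. Note that this reports the encoding of the user's locale, which may differ from what `sys.stdout.encoding` reports, particularly when stdout is redirected to a file or pipe.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Return the character encoding name of the user's locale (for example
 * "UTF-8" in a UTF-8 environment). POSIX only. The returned pointer
 * refers to static storage owned by the C library and may be overwritten
 * by later locale calls. locale_encoding is a hypothetical helper name. */
const char *locale_encoding(void)
{
    /* Adopt the locale from the environment (LC_ALL, LC_CTYPE, LANG).
     * Without this call the process stays in the default "C" locale. */
    setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}
```

The resulting name could then be passed as the encoding argument to PyUnicode_AsEncodedString(), with "replace" as the error handler if unencodable characters should not raise an exception.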

jfs
  • Thanks for your suggestion. The problem I'm getting now is that it is using ASCII instead of the UTF-8 encoding of the system: UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 80: ordinal not in range(128) – Juan Dec 21 '10 at 20:42
  • @Juan: what does `sys.getdefaultencoding()` return? – jfs Dec 21 '10 at 20:49
  • 'ascii', but it needs to use sys.stdout.encoding, which is 'utf-8'. – Juan Dec 21 '10 at 21:49
  • Thanks, J.F. But I still need to figure out how to get the system stdout encoding from the C API without importing the sys module and calling into the interpreter to do it. I guess it is probably safe to assume that the sys module is available for import. – Juan Dec 22 '10 at 03:39
  • Giving this to J.F. as he correctly identified this as a bug. – Juan Dec 24 '10 at 18:22