When I run this Python 2.7 code (edit: updated the code):

import io
x = io.StringIO(u'\ud801')

CPython runs fine, but IronPython throws the following error:

UnicodeEncodeError:
Unable to translate Unicode character \uD801 at index 0 to specified code page.

I presume this is because U+D801 is an unpaired surrogate and thus an invalid character, but which implementation is displaying correct behavior here? Should this code throw or not throw?
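To make "unpaired surrogate" concrete, here is a minimal sketch of how one might check a string for them. The helper name `find_unpaired_surrogates` is hypothetical (it is not part of either implementation), and it assumes the string should be treated as a sequence of UTF-16 code units:

```python
def find_unpaired_surrogates(text):
    """Return the indices of surrogate code units that are not part of a
    high-surrogate + low-surrogate pair (hypothetical helper)."""
    bad = []
    i = 0
    n = len(text)
    while i < n:
        cp = ord(text[i])
        if 0xD800 <= cp <= 0xDBFF:
            # High surrogate: only valid if a low surrogate follows directly.
            if i + 1 < n and 0xDC00 <= ord(text[i + 1]) <= 0xDFFF:
                i += 2  # skip the well-formed pair
                continue
            bad.append(i)
        elif 0xDC00 <= cp <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            bad.append(i)
        i += 1
    return bad


lone = find_unpaired_surrogates(u'\ud801')
paired = find_unpaired_surrogates(u'\ud801\udc00')
```

With this check, `u'\ud801'` is flagged at index 0, while a high surrogate followed directly by a low surrogate is not flagged.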

user541686

1 Answer

They are both correct, but aren't doing the same thing. IronPython appears to be trying to print the Unicode character, and fails to convert it to the current code page. You get the same behavior with Python 2.7 if you print the character:

>>> import io
>>> io.StringIO(u'\ud801').getvalue()
u'\ud801'
>>> print(io.StringIO(u'\ud801').getvalue())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\ud801' in position 0: character maps to <undefined>
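
To see the difference without involving the terminal at all, the following is a rough sketch on CPython that separates "hold the string" from "encode the string". The variable names are only illustrative, and `cp437` is used simply because it is the code page in the traceback above; yours may differ:

```python
import io

# Building the StringIO and reading it back involves no encoding step.
value = io.StringIO(u'\ud801').getvalue()

# Escaping to ASCII is safe: only the backslash escape is produced.
escaped = value.encode('unicode-escape')

# Encoding to a legacy code page is where the failure comes from.
try:
    value.encode('cp437')
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True
```

Here `escaped` is the six ASCII bytes `\ud801`, and `encode_failed` ends up `True`. If an implementation raises `UnicodeEncodeError` before any explicit `encode` call, it is presumably performing a conversion like the last one internally.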
Mark Tolonen
  • Unfortunately this is incorrect. Even `io.StringIO(u'\ud801') and None` gives an error in IronPython. It doesn't seem to have anything to do with printing. – user541686 Apr 04 '19 at 05:28
  • `io.StringIO(u'\ud801').getvalue().encode('unicode-escape')` would be another way to test safely. But it seems like the error is on string literal creation. – o11c Apr 04 '19 at 05:29
  • @mehrdad It is correct. IronPython is trying to print the actual Unicode character and failing. That is not what Python 2.7 is doing. It is just displaying the ASCII representation of the escape code, so no invalid characters are being encoded to the terminal. The fact that you get a UnicodeEncodeError indicates IronPython is doing something different. What IDE are you using, and is it IronPython 2.7 or something else? – Mark Tolonen Apr 04 '19 at 05:31
  • @MarkTolonen: No, it is most definitely not related to printing. https://i.stack.imgur.com/C730F.png – user541686 Apr 04 '19 at 05:38
  • `print` was a bad choice of words, but the PNG makes it clear. *encode* would be a better word choice. Python 2.7 is correct. It looks like IronPython is trying to convert the Unicode string to a byte string internally, and there is no reason for that. Printing does the same conversion, which is why it results in the same error. Edit your question and paste the text of your PNG example. It better explains the behavior you are seeing. – Mark Tolonen Apr 04 '19 at 05:50
  • @mehrdad What does IronPython do with a simple `u'\ud801'`? – Mark Tolonen Apr 04 '19 at 05:52