Read sys.maxunicode:
An integer giving the value of the largest Unicode code point, i.e. 1114111 (0x10FFFF in hexadecimal).
Changed in version 3.3: Before PEP 393, sys.maxunicode used to be either 0xFFFF or 0x10FFFF, depending on the configuration option that specified whether Unicode characters were stored as UCS-2 or UCS-4.
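As a minimal sketch of how that documented value could be used, the check below reports whether the running interpreter is a "narrow" (UCS-2) build. The helper name is_narrow_build and its optional maxunicode parameter are hypothetical, introduced only for illustration; the parameter exists so the logic can be exercised without a narrow interpreter at hand.

```python
import sys

def is_narrow_build(maxunicode=None):
    """Return True when the given (or current) maxunicode is 0xFFFF.

    Hypothetical helper: 0xFFFF is the value a narrow (UCS-2) build
    reports; Python 3.3+ always reports 0x10FFFF.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    return maxunicode == 0xFFFF

print(is_narrow_build())
```

On Python 3.3 or later this should always report False; on Python 2 the result depends on how the interpreter was built.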
The following script should work in both Python 2 and Python 3:
# coding=utf-8
from __future__ import print_function
import sys, platform, unicodedata
print( platform.python_version(), 'maxunicode', hex(sys.maxunicode))
tab = '\t'
unistr = u'\u264a \U0001f601'    ### unistr = u'♊ 😁'
print ( len(unistr), tab, unistr, tab, repr( unistr))
for char in unistr:
    print (len(char), tab, char, tab, repr(char), tab,
        unicodedata.category(char), tab, unicodedata.name(char,'private use'))
The output shows the consequence of different sys.maxunicode values. For instance, the GRINNING FACE WITH SMILING EYES character (Unicode code point 0x1f601, above the Basic Multilingual Plane) is represented as the corresponding surrogate pair (code points u'\ud83d' and u'\ude01') if sys.maxunicode is 0xFFFF:
PS D:\PShell> [System.Console]::OutputEncoding = [System.Text.Encoding]::UTF8
PS D:\PShell> . py -3 D:\test\Python\Py\42783173.py
3.5.1 maxunicode 0x10ffff
3 ♊ '♊ '
1 ♊ '♊' So GEMINI
1 ' ' Zs SPACE
1 '' So GRINNING FACE WITH SMILING EYES
PS D:\PShell> . py -2 D:\test\Python\Py\42783173.py
2.7.12 maxunicode 0xffff
4 ♊ u'\u264a \U0001f601'
1 ♊ u'\u264a' So GEMINI
1 u' ' Zs SPACE
1 �� u'\ud83d' Cs private use
1 �� u'\ude01' Cs private use
Note: the output examples above were taken from the Unicode-aware PowerShell ISE console pane.
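To make the surrogate pair in the Python 2 output less mysterious, here is a rough sketch of the standard UTF-16 arithmetic that maps a code point above 0xFFFF to its two surrogates and back. The function names to_surrogate_pair and from_surrogate_pair are hypothetical and written only for this illustration; they are not part of the standard library.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError('code point is not above the Basic Multilingual Plane')
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low

def from_surrogate_pair(high, low):
    """Recombine a (high, low) surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogate_pair(0x1F601)
print(hex(high), hex(low))                  # 0xd83d 0xde01
print(hex(from_surrogate_pair(high, low)))  # 0x1f601
```

This matches the u'\ud83d' and u'\ude01' values in the 2.7.12 run, and explains why len(unistr) is 4 there but 3 under Python 3.5.1.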