
Any idea whether it is possible, with `regex` on Python 2.7, to get whole Unicode graphemes (grapheme clusters) as single units, without them being split into surrogate pairs?

According to this example, it is possible with Python 3.x:

>>> import regex
>>> s = '‍‍‍'
>>> for c in regex.findall('\X',s):
...     print(c)
...     
‍‍‍
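In Python 3.3+ a `str` stores whole code points, so an emoji ZWJ sequence is a plain run of code points that `\X` can group. The emoji in the session above did not survive copying (only the zero-width joiners did), so this minimal sketch uses a hypothetical family emoji written with escapes and inspects it with the standard library only; `\X` itself still needs the third-party `regex` module.

```python
import unicodedata

# Hypothetical ZWJ sequence (not the original emoji, which was lost):
# MAN, WOMAN, GIRL and BOY joined by ZERO WIDTH JOINER (U+200D).
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

# On Python 3.3+ each emoji is one element of the str,
# not a pair of surrogate halves.
print(len(family))  # 7 code points
for ch in family:
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))
```

With `regex` installed, `regex.findall(r'\X', family)` is the Python 3 counterpart of the session above; whether the whole sequence comes back as one cluster depends on the Unicode version the installed `regex` implements.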

But on Python 2.7 it does not seem to work. Example:

>>> import regex
>>> s = '‍‍‍'
>>> for c in regex.findall('\X',s):
...     print(c)
�
�
�
�
�
�
�
�
...

Any ideas how to make this work on Python 2.7?

Thanks in advance!

Egor Savin
  • I suspect that on a narrow build of Python (which it seems you are using), that only the Basic Multilingual Plane (BMP) is supported. The latest `regex` says it supports Unicode 11.0, but it still splits surrogate characters. It works fine for graphemes in the BMP, e.g. `u'a\u0302'` (decomposed `â`). With Python 2 end-of-life in 2020, best to upgrade. – Mark Tolonen Aug 17 '18 at 16:37
  • Thanks! Yes, I am also thinking about taking that step. But Python 3.x is slower than Python 2.7, and there are not as many third-party packages as for Python 2.7. These are quite important factors that stop me every time I think about upgrading my Python distribution. – Egor Savin Aug 18 '18 at 09:38
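The narrow-build point in the first comment can be checked directly. On a narrow build `sys.maxunicode` is `0xFFFF`, and a character outside the BMP is stored as two UTF-16 code units (a high and a low surrogate), so the regex engine sees two "characters". This sketch computes that pair with the standard UTF-16 formula; the helper names are hypothetical, and it runs on both Python 2 and Python 3.

```python
import sys

def is_narrow_build():
    # Narrow builds (Python 2, or Python 3.2 and older without wide Unicode)
    # report 0xFFFF; Python 3.3+ always reports 0x10FFFF.
    return sys.maxunicode == 0xFFFF

def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point: %X" % code_point)
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

high, low = to_surrogate_pair(0x1F468)     # U+1F468 MAN
print("narrow build: %s" % is_narrow_build())
print("U+1F468 -> U+%04X U+%04X" % (high, low))
```

On a narrow build, U+1F468 is held as exactly these two code units, which is the splitting described in the comment.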

0 Answers