How to get the length of a unicode string?

Asked Jun 07 '16 at 15:23

Active Jun 07 '16 at 15:28

Viewed 47 times

If I do the following:

ustr = unicode()
ustr = '青皮'
print len(ustr)

I get an output of 6. But that's the number of bytes.

How do I get an output of 2 (i.e. the actual number of Unicode code points)?

edited Jun 07 '16 at 15:28

asked Jun 07 '16 at 15:23

patchwork

@Bhargav Rao - I don't think either of the 2 answers given to that question will actually answer mine. – patchwork Jun 07 '16 at 15:27
The dupe perfectly answers your post. Try `len(__import__('unicodedata').normalize('NFC',u'青皮'))`. – Bhargav Rao Jun 07 '16 at 15:42
The answer seems to be to create the unicode object as `ustr = unicode('青皮', 'utf-8')`. Then `len(ustr)` gives me the number of code points. – patchwork Jun 07 '16 at 15:53
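For anyone landing here later, a minimal sketch of the two ideas from the comments: decoding the UTF-8 bytes before calling `len`, and normalizing to NFC so that a base letter plus a combining mark is counted once. The variable names `raw`, `text` and `decomposed` are just illustrative, and the `é` example is my own addition rather than something from the question. It should run on Python 3, and probably on Python 2 as well given the encoding declaration on the first line.

```python
# -*- coding: utf-8 -*-
import unicodedata

# Illustrative names; the question used a single variable called ustr.
raw = u"青皮".encode("utf-8")   # the UTF-8 encoded bytes, as in the question
text = raw.decode("utf-8")      # decode the bytes into a Unicode string

print(len(raw))    # length in bytes
print(len(text))   # length in code points

# Assumed extra example: a decomposed "é" is two code points (e + combining accent).
decomposed = unicodedata.normalize("NFD", u"\u00e9")
print(len(decomposed))

# NFC composes the pair back into a single code point.
print(len(unicodedata.normalize("NFC", decomposed)))
```

Note that this still counts code points, not user-perceived characters, so some sequences that display as one glyph may report a length greater than 1 even after NFC normalization.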

0 Answers