'\u00BD' # ½
'\u00B2' # ²

I am trying to understand isdecimal() and isdigit() better; for this, it's necessary to understand the Unicode numeric value properties. How would I see the numeric value property of, for example, the two characters above?

Phoenix

2 Answers

To get the 'numeric value' of a character, you can use the unicodedata.numeric() function:

>>> import unicodedata
>>> unicodedata.numeric('\u00BD')
0.5
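As a small sketch of how the three lookup functions differ for the two characters in the question: unicodedata.decimal(), unicodedata.digit() and unicodedata.numeric() each accept a default that is returned when the character lacks that property. The helper name numeric_values is made up for this illustration.

```python
import unicodedata

def numeric_values(ch):
    """Return (numeric, digit, decimal) for ch; None where a property is absent."""
    return (
        unicodedata.numeric(ch, None),  # any numeric value, as a float
        unicodedata.digit(ch, None),    # digit value, as an int
        unicodedata.decimal(ch, None),  # decimal digit value, as an int
    )

for ch in ('\u00BD', '\u00B2'):
    print("U+{:04X}".format(ord(ch)), numeric_values(ch))
# U+00BD (0.5, None, None)
# U+00B2 (2.0, 2, None)
```

So ½ only has a numeric value, while ² also has a digit value but no decimal value.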

Use the ord() function to get the integer codepoint, optionally in combination with format() to produce a hexadecimal value:

>>> ord('\u00BD')
189
>>> format(ord('\u00BD'), '04x')
'00bd'

You can get the character's general category with unicodedata.category(), which you'd then need to check against the documented categories:

>>> unicodedata.category('\u00BD')
'No'

where 'No' stands for Number, Other.

However, a number of characters in the category Lo (Letter, Other) also return True for .isnumeric(). The Python unicodedata module only gives you access to the general category; for the more specific numeric properties you have to rely on str.isdigit(), str.isnumeric(), unicodedata.digit(), unicodedata.numeric(), etc.
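To illustrate the point about Lo, here is a minimal sketch using two CJK ideographs, 一 (U+4E00, one) and 五 (U+4E94, five). The helper name describe is hypothetical.

```python
import unicodedata

def describe(ch):
    """Return (general category, isnumeric() result, numeric value or None)."""
    return unicodedata.category(ch), ch.isnumeric(), unicodedata.numeric(ch, None)

for ch in '\u4e00\u4e94':  # 一 and 五
    print(ch, describe(ch))
# 一 ('Lo', True, 1.0)
# 五 ('Lo', True, 5.0)
```

The general category alone therefore cannot tell you whether a character carries a numeric value.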

If you want a precise list of all numeric Unicode characters, the canonical source is the Unicode Character Database: a series of text files that define the whole of the standard. The DerivedNumericType.txt file (v. 6.3.0) gives you a 'view' on that database specific to the numeric properties; it tells you at the top how the file is derived from other data files in the standard. Ditto for the DerivedNumericValues.txt file, which lists the exact numeric value per codepoint.
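If you download the derived numeric-type file, a rough sketch of reading it could look like the following. It assumes the usual Unicode data file layout (a code point or `first..last` range, a semicolon, the property value, then a `#` comment). The function names are hypothetical, and the sample lines only illustrate that layout; they are not copied from a specific release.

```python
def parse_ucd_line(line):
    """Return (first, last, value) for a data line, or None for comments/blanks."""
    data = line.split('#', 1)[0].strip()  # drop the trailing comment
    if not data:
        return None
    code_range, value = (field.strip() for field in data.split(';')[:2])
    first, _, last = code_range.partition('..')
    return int(first, 16), int(last or first, 16), value

def numeric_type_from_lines(lines, ch):
    """Look up ch in parsed lines; None if no range contains it."""
    cp = ord(ch)
    for line in lines:
        parsed = parse_ucd_line(line)
        if parsed and parsed[0] <= cp <= parsed[1]:
            return parsed[2]
    return None

# Illustrative sample in the assumed layout (not real file contents).
sample = """\
# comment line
0030..0039    ; Decimal # Nd  [10] DIGIT ZERO..DIGIT NINE
00B2..00B3    ; Digit # No   [2] SUPERSCRIPT TWO..SUPERSCRIPT THREE
00BD          ; Numeric # No       VULGAR FRACTION ONE HALF
"""

print(numeric_type_from_lines(sample.splitlines(), '\u00BD'))
```

With a real file you would pass the lines of the opened file instead of the sample string.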

Martijn Pieters
  • 1
    I think OP wants 0.5 and 2 for those code points, not their code point. –  Apr 01 '14 at 15:19
  • @delnan: check, added that too. – Martijn Pieters Apr 01 '14 at 15:20
  • 1
    my question may be wrong then - I read about the property values Numeric_Type=Digit, Numeric_Type=Decimal, and Numeric_Type=Numeric I was wondering whether I could produce this property from a unicode point somehow? – Phoenix Apr 01 '14 at 15:55
  • 1
    `unicodedata.category('\u00DB') == 'Lu'`, not `No` (it would be true for '\u00BD'). `format(ord('\u00BD'), '04x')` seems unrelated to the question – jfs Apr 01 '14 at 16:33

The docs explicitly specify the relationship between these methods and the Numeric_Type property.

def is_decimal(c):
    """Whether input character is Numeric_Type=decimal."""
    return c.isdecimal() # it means General Category=Decimal Number in Python

def is_digit(c):
    """Whether input character is Numeric_Type=digit."""
    return c.isdigit() and not c.isdecimal()


def is_numeric(c):
    """Whether input character is Numeric_Type=numeric."""
    return c.isnumeric() and not c.isdigit() and not c.isdecimal()

Example:

>>> for c in '\u00BD\u00B2':
...     print("{}: Numeric: {}, Digit: {}, Decimal: {}".format(
...         c, is_numeric(c), is_digit(c), is_decimal(c)))
... 
½: Numeric: True, Digit: False, Decimal: False
²: Numeric: False, Digit: True, Decimal: False

I'm not sure Decimal Number and Numeric_Type=Decimal will always be identical.

Note: '\u00B2' is not decimal because superscripts are explicitly excluded by the standard, see 4.6 Numerical Value (Unicode 6.2).
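As an alternative sketch, the same three-way split can be expressed as a single function that returns a label, using the unicodedata lookups instead of the string methods. This is an approximation based on what Python's bundled database exposes, not a guaranteed match for the Numeric_Type property of every code point. The name numeric_type is made up here.

```python
import unicodedata

def numeric_type(ch):
    """Best-effort Numeric_Type label for ch: Decimal, Digit, Numeric or None."""
    # Check the most specific property first: decimal implies digit and numeric.
    if unicodedata.decimal(ch, None) is not None:
        return "Decimal"
    if unicodedata.digit(ch, None) is not None:
        return "Digit"
    if unicodedata.numeric(ch, None) is not None:
        return "Numeric"
    return "None"

for ch in '7\u00B2\u00BDa':
    print("U+{:04X}: {}".format(ord(ch), numeric_type(ch)))
# U+0037: Decimal
# U+00B2: Digit
# U+00BD: Numeric
# U+0061: None
```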

jfs
  • Neither of the two characters you give is `Decimal`. Can you come up with a third example? – Eric Apr 05 '16 at 16:08
  • 1
    @Eric here are [*all* decimal numbers (in the Unicode standard used by python executable)](http://ideone.com/JIaSpQ) – jfs Apr 05 '16 at 16:18
  • I think I'm confused by how your `is_digit('0')` is `False` – Eric Apr 05 '16 at 16:24
  • @Eric `'0'` has a property `Numeric_Type=decimal` (decimal digit). `is_digit(c)` returns whether `Numeric_Type=digit` (decimal, but in typographic context e.g., `①`) —they are mutually exclusive. What characters have which `Numeric_Type` is defined in the Unicode standard. – jfs Apr 05 '16 at 16:47