In the Java standard library, Character.getName(0x2000A) returns "CJK UNIFIED IDEOGRAPHS EXTENSION B 2000A" (in Java 11, 16, and 17, which cover Unicode 10 and Unicode 13), while I expected "CJK UNIFIED IDEOGRAPH-2000A".
The result surprised me, because the code point is part of a character group with the name "CJK UNIFIED IDEOGRAPH-#", and characters in such groups usually derive their name from the group name, with the # replaced by the code point number. This is the case, for example, with code point U+FA21, which returns the name "CJK COMPATIBILITY IDEOGRAPH-FA21".
This rule is explained in Unicode® Standard Annex #42, section 4.4.2:
"If a code point has the attribute na (either directly or by inheritance from an enclosing group), then occurrences of the character # in the name are to be interpreted as the value of the code point."
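To make the expectation concrete, here is a minimal sketch of the # substitution as I understand it. The names NameTemplate and expand are made up for illustration; they are not part of the JDK or of UAX #42, and padding the code point to at least four hex digits is my assumption about how the value is written.

```java
import java.util.Locale;

public class NameTemplate {

    // Hypothetical helper: replaces every '#' in a group-level name template
    // with the code point written as upper-case hex (at least 4 digits).
    static String expand(String template, int codePoint) {
        String hex = String.format(Locale.ROOT, "%04X", codePoint);
        return template.replace("#", hex);
    }

    public static void main(String[] args) {
        // prints "CJK COMPATIBILITY IDEOGRAPH-FA21"
        System.out.println(expand("CJK COMPATIBILITY IDEOGRAPH-#", 0xFA21));
    }
}
```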
It seems the character is instead named through the "category rule" of the JDK, where the name is formed by taking the name of the code point's Unicode block in upper case and appending the code point in hexadecimal.
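As far as I can tell from the Javadoc of Character.getName(int), this fallback applies to code points that have no name in the UnicodeData file: the block name with underscores replaced by spaces, followed by the upper-case hex code point. The sketch below reproduces that rule by hand. BlockNameFallback and blockBasedName are hypothetical names; only Character.UnicodeBlock.of and Integer.toHexString are real JDK calls.

```java
import java.util.Locale;

public class BlockNameFallback {

    // Hypothetical helper mirroring the block-based fallback described in the
    // Character.getName(int) Javadoc. Returns null if no block is defined.
    static String blockBasedName(int codePoint) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        if (block == null) {
            return null;
        }
        return block.toString().replace('_', ' ')
                + " "
                + Integer.toHexString(codePoint).toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        int[] samples = {0x2000A, 0xFA21};
        for (int cp : samples) {
            // Print both results side by side; getName output depends on the JDK in use.
            System.out.println("fallback: " + blockBasedName(cp));
            System.out.println("getName:  " + Character.getName(cp));
        }
    }
}
```

For 0x2000A the helper yields "CJK UNIFIED IDEOGRAPHS EXTENSION B 2000A", the same string I got from Character.getName on the JDK versions listed above.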
Why does the JDK return the block-based name for code point U+2000A, apparently treating it differently from U+FA21?