
I notice that when normalizing a Unicode string to NFKC form, superscript characters like ¹ (U+00B9), ² (U+00B2), ³ (U+00B3), etc. are converted to the corresponding ASCII digits (e.g. 1, 2, 3).
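For reference, a minimal Python sketch (using the standard `unicodedata` module) reproduces the behavior I'm describing:

```python
import unicodedata

# NFKC applies compatibility mappings, so superscript digits fold to ASCII digits.
print(unicodedata.normalize("NFKC", "x\u00b2"))   # -> 'x2'
print(unicodedata.normalize("NFKC", "10\u00b3"))  # -> '103'

# NFC applies only canonical mappings and leaves the superscripts untouched.
print(unicodedata.normalize("NFC", "x\u00b2"))    # -> 'x²'
```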

Does anyone know the rationale for this behavior? It seems to lose information in the process, since a superscript number usually carries some contextual meaning.

The "K" apparently stands for "compatibility" (well... I guess the "C" was already used for "canonical"). [Wikipedia](https://en.wikipedia.org/wiki/Unicode_equivalence) says: "Compatible sequences may be treated the same way in some applications (such as sorting and indexing), but not in others; and may be substituted for each other in some situations, but not in others." – lenz Apr 27 '18 at 09:16
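To illustrate the distinction the quote draws, here is a small Python sketch (again assuming the standard `unicodedata` module): canonically equivalent sequences are merged by both NFC and NFKC, while compatibility-equivalent ones are merged only by NFKC.

```python
import unicodedata

# Canonically equivalent: precomposed é (U+00E9) vs. e + combining acute (U+0301).
# Both NFC and NFKC map them to the same string.
assert unicodedata.normalize("NFC", "\u00e9") == unicodedata.normalize("NFC", "e\u0301")

# Only compatibility-equivalent: ² (U+00B2) vs. 2.
# NFC keeps them distinct; NFKC folds them together.
assert unicodedata.normalize("NFC", "\u00b2") != "2"
assert unicodedata.normalize("NFKC", "\u00b2") == "2"
```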

0 Answers