The problem

I enjoy using Unicode characters in my variable names when writing Python 3 code. Today I hit a weird bug, which turned out to be caused by Python not distinguishing between the variables ρ and ϱ, as this short snippet shows:

ρ = 'hello'
ϱ = 'goodbye'
print(ρ)  # Prints 'goodbye'

Is this a bug or a feature? If it is a feature, where can I find the set of all characters that are treated as equivalent in this way?

Further exploration

This lack of distinction is not present when ρ and ϱ are used inside strings:

a = 'ρ'
b = 'ϱ'
print(a == b)  # Prints False

which makes me confident that this is not some encoding problem with my editor/terminal.

We can also confirm that Python is fully aware of precisely which characters we are dealing with, using the unicodedata module:

import unicodedata
print(unicodedata.name('ρ'))  # Prints 'GREEK SMALL LETTER RHO'
print(unicodedata.name('ϱ'))  # Prints 'GREEK RHO SYMBOL'

I have found the same behavior for the pair φ (GREEK SMALL LETTER PHI) and ϕ (GREEK PHI SYMBOL).

jmd_dk

1 Answer

From the Python language reference, 2.3. Identifiers and keywords:

All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC.
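
One way to observe this rule directly is to run the two assignments from the question and inspect the resulting namespace. This is only an illustrative sketch; the names `namespace` and `names` are made up for the example.

```python
namespace = {}
# The source text assigns to two visually distinct identifiers.
exec("ρ = 'hello'\nϱ = 'goodbye'", namespace)

# exec() also adds '__builtins__' to the dict, so filter out dunder entries.
names = [name for name in namespace if not name.startswith("__")]
print(names)             # Only one variable name was created
print(namespace["ρ"])    # The second assignment overwrote the first
print("ϱ" in namespace)  # Plain string keys are not normalized
```

Only the parser normalizes identifiers. Looking a name up by string, as in the last line, compares the raw characters, which is why the string comparison in the question returns False.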

>>> import unicodedata
>>> unicodedata.normalize('NFKC', 'ρϱ')
'ρρ'
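
As for finding the set of characters that collapse together, one possible approach is to normalize every code point and group the ones that share an NFKC form. The sketch below is a rough illustration rather than an official tool: the function name `nfkc_groups` is hypothetical, it only looks at single characters that are valid one-character identifiers (so digits and other continuation-only characters are skipped), and the result depends on the Unicode database bundled with your Python version.

```python
import sys
import unicodedata
from collections import defaultdict


def nfkc_groups():
    """Map each NFKC form to the single characters that normalize to it."""
    groups = defaultdict(list)
    for codepoint in range(sys.maxunicode + 1):
        char = chr(codepoint)
        # Only consider characters usable as a one-character identifier.
        if not char.isidentifier():
            continue
        groups[unicodedata.normalize("NFKC", char)].append(char)
    # Keep only the forms shared by more than one character.
    return {form: chars for form, chars in groups.items() if len(chars) > 1}


groups = nfkc_groups()
print(groups["ρ"])  # Characters that become ρ when used as an identifier
print(groups["φ"])  # Characters that become φ when used as an identifier
```

The list for ρ includes ϱ, and the list for φ includes ϕ, which matches the two pairs from the question.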
Josh Lee