The problem
I like to use Unicode characters in my variable names when writing Python 3 code. Today I hit a weird bug, which turned out to be caused by Python not distinguishing between the variables ρ and ϱ, as this short example shows:
ρ = 'hello'
ϱ = 'goodbye'
print(ρ) # Prints 'goodbye'
Is this a bug or a feature? In case of the latter, how/where can I find the set of all such characters which belong together in this manner?
Further exploration
This lack of distinction is not present when ρ and ϱ are used inside strings:
a = 'ρ'
b = 'ϱ'
print(a == b) # Prints False
which makes me confident that this is not some encoding problem with my editor/terminal.
We can also confirm that Python is fully aware of precisely which characters we are dealing with, using the unicodedata module:
import unicodedata
print(unicodedata.name('ρ')) # Prints 'GREEK SMALL LETTER RHO'
print(unicodedata.name('ϱ')) # Prints 'GREEK RHO SYMBOL'
I have found the same behavior with the pair φ (GREEK SMALL LETTER PHI) and ϕ (GREEK PHI SYMBOL).
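One further check, under the assumption that Python normalizes identifiers to NFKC form while parsing (as PEP 3131 describes): both pairs above collapse to a single character under that normalization, while string contents are left untouched. The helpers nfkc and folds_to below are names made up for this sketch, not part of any library, and the scan over all code points is only one rough way to list characters that "belong together".

```python
import sys
import unicodedata


def nfkc(s):
    """Return the NFKC-normalized form of s (hypothetical helper for this sketch)."""
    return unicodedata.normalize('NFKC', s)


# Both "symbol" variants normalize to the ordinary small letters
print(nfkc('ϱ') == 'ρ')  # Prints True
print(nfkc('ϕ') == 'φ')  # Prints True


def folds_to(target):
    """List every single character, other than target, whose NFKC form is target."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if chr(cp) != target and nfkc(chr(cp)) == target
    ]


# Candidates that would be indistinguishable from ρ in an identifier,
# assuming NFKC normalization is what the parser applies
rho_variants = folds_to('ρ')
print('ϱ' in rho_variants)  # Prints True
```

If that assumption holds, it would explain why the two names refer to the same variable while the two strings still compare unequal.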