Surprising behavior with unicode dict keys

Asked May 02 '18 at 02:17

Consider the following piece of code:

In [1]: a = {'ϵ': 1}

In [2]: b = dict(ϵ=1)

In [3]: a == b
Out[3]: False

In [4]: print(a, b)
{'ϵ': 1} {'ε': 1}

I was surprised to find that `a` is not equal to `b`. It appears that the two dicts use distinct Unicode characters for epsilon, even though I entered both the same way (I type \epsilon + Tab in my IPython environment).

I wonder why this happens and if there is a preferred way to handle Unicode keys in this situation.
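One possible approach, offered as a sketch rather than an established recommendation: normalize every string key with NFKC, the same normalization Python applies to identifiers, so both spellings collapse into one key. The helpers `nfkc` and `normalized_dict` are hypothetical names chosen for illustration.

```python
import unicodedata

def nfkc(key):
    """Normalize str keys the way Python normalizes identifiers (NFKC)."""
    return unicodedata.normalize('NFKC', key) if isinstance(key, str) else key

def normalized_dict(*args, **kwargs):
    """Build a dict like dict(), but with NFKC-normalized str keys."""
    return {nfkc(k): v for k, v in dict(*args, **kwargs).items()}

a = normalized_dict({'\u03f5': 1})            # key from a string literal
b = normalized_dict(eval("dict(\u03f5=1)"))   # key already normalized by the parser
print(a == b)  # True
```

Lookups would also need to go through `nfkc`, and NFKC can merge keys that were meant to stay distinct (it also folds the ligature U+FB01 into "fi"), so this trades fidelity for consistency.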

hilberts_drinking_problem

`dict(ϵ=1)` gives me `SyntaxError`. – Sohaib Farooqi May 02 '18 at 02:30
@bro-grammer I guess I should mention that I use Python 3. I do not get a syntax error. – hilberts_drinking_problem May 02 '18 at 02:35
I am also using Python 3. `dict([('ϵ',1)])` works for me. – Sohaib Farooqi May 02 '18 at 02:35
Strange. I guess the identifier issue covers my question, though I am not sure why there is a difference in SyntaxError behavior. – hilberts_drinking_problem May 02 '18 at 02:38
In the form `dict(ϵ=1)`, `ϵ` is an identifier, so it is normalized to NFKC. In other words, it is equivalent to `dict(ε=1)`. On the other hand, in `dict([('ϵ',1)])` the `'ϵ'` is a string literal, and string contents are not normalized. – roeland May 02 '18 at 03:20
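roeland's explanation can be checked directly: the keyword form yields the NFKC form of the character, while the list-of-pairs form keeps the string exactly as written. As before, `eval` is only a device for spelling U+03F5 with an escape inside source code.

```python
import unicodedata

src_char = '\u03f5'  # GREEK LUNAR EPSILON SYMBOL

# Keyword form: the name is an identifier, so the parser applies NFKC.
kw_form = eval(f"dict({src_char}=1)")
# List-of-pairs form: the key is a string literal and is left untouched.
pairs_form = dict([(src_char, 1)])

print(list(kw_form) == [unicodedata.normalize('NFKC', src_char)])  # True
print(list(pairs_form) == [src_char])                              # True
print(kw_form == pairs_form)                                       # False
```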
Regarding the syntax error, that is probably an artefact of characters getting mangled. If I type this in on a Windows console, the limitations of that console turn it into `dict(?=1)`. – roeland May 02 '18 at 03:21
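The mangling roeland describes can be simulated by round-tripping the line through a code page that has no slot for U+03F5. cp1252 is assumed here purely for illustration; a real console's code page may differ. The helper `parses` is hypothetical.

```python
def parses(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<console>", "eval")
    except SyntaxError:
        return False
    return True

typed = "dict(\u03f5=1)"
# Characters the code page cannot represent are replaced with '?'.
mangled = typed.encode("cp1252", errors="replace").decode("cp1252")

print(mangled)          # dict(?=1)
print(parses(typed))    # True
print(parses(mangled))  # False
```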
@bro-grammer What version of 3.x are you using? If it's pre-3.6, is your locale non-UTF-8? (Or are you on Windows?) And did you test in a script, or a REPL? And, if a REPL, the standard one or IPython or otherwise, and in the terminal/console or something like IDLE? Because it should do exactly what the OP is seeing, but on Windows, or pre-3.6 on many Linux systems, etc., it may not. – abarnert May 02 '18 at 03:21
(Well, "should" from a certain point of view. The Unicode standard, like some Jedi Masters, can be a bit inscrutable.) – abarnert May 02 '18 at 03:25
@abarnert Yes, it turns out my locale is non-UTF-8. Thanks for pointing that out, it was very helpful! – Sohaib Farooqi May 02 '18 at 05:41
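To check abarnert's point, the encodings Python is using can be printed from the same environment that shows the error. This is a diagnostic sketch; `encoding_report` is a hypothetical helper.

```python
import locale
import sys

def encoding_report():
    """Collect the encodings that affect how typed text reaches Python."""
    return {
        # Default encoding for text I/O, derived from the locale.
        "preferred": locale.getpreferredencoding(False),
        # Encoding of the interactive input stream (None if unavailable).
        "stdin": getattr(sys.stdin, "encoding", None),
        # Encoding used for file names.
        "filesystem": sys.getfilesystemencoding(),
    }

for name, enc in encoding_report().items():
    print(f"{name:>10}: {enc}")
```

If these do not report a UTF-8 codec (an ASCII locale may show up as 'ANSI_X3.4-1968', for instance), characters such as ϵ can be mangled before they ever reach the parser.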