9

I have a list of values and I want to put them in a dictionary that would map each value to it's index.

I can do it this way:

>>> t = (5,6,7)
>>> d = dict(zip(t, range(len(t))))
>>> d
{5: 0, 6: 1, 7: 2}

this is not bad, but I'm looking for something more elegant.

I've come across the following, but it does the opposite of what I need:

>>> d = dict(enumerate(t))
>>> d
{0: 5, 1: 6, 2: 7}

Please share your solutions,
Thank you

EDIT: Python 2.6.4

For lists containing 1000 elements the dict(zip) version is the fastest, the generator and the list comprehension versions are virtually identical and they are ~1.5 times slower and the functional map(reversed) is considerably slower.

$ python -mtimeit -s"t = range(int(1e3))" "d = dict(zip(t, range(len(t))))"
1000 loops, best of 3: 277 usec per loop

$ python -mtimeit -s"t = range(int(1e3))" "d = dict([(y,x) for x,y in enumerate(t)])"
1000 loops, best of 3: 426 usec per loop

$ python -mtimeit -s"t = range(int(1e3))" "d = dict((y,x) for x,y in enumerate(t))"
1000 loops, best of 3: 437 usec per loop

$ python -mtimeit -s"t = range(int(1e3))" "d = dict(map(reversed, enumerate(t)))"
100 loops, best of 3: 3.66 msec per loop

I tried running the same tests for longer and for shorter lists (1e2, 1e4, 1e5) and the time per loop scales linearly with the length of the list.

Could somebody time py 2.7+ version?

AnalyticsBuilder
  • 4,111
  • 4
  • 24
  • 36

6 Answers6

14

You can use a list comprehension (or a generator, depending on your python version) to perform a simple in-place swap for your second example.


Using a list comprehension:

d = dict([(y,x) for x,y in enumerate(t)])

Using a generator expression (Python 2.4 and up):

d = dict((y,x) for x,y in enumerate(t))
kibibu
  • 6,115
  • 1
  • 35
  • 41
  • 3
    You don't need the `[]` in there. `dict` works fine with a generator expression (saves making an intermediate list) – John La Rooy May 14 '10 at 04:51
  • 1
    Yeah, that's why I wrote "depending on your python version". Generators have been around a long time though (since 2.4), so I will include both – kibibu May 14 '10 at 05:32
  • 2
    @J.F. Sebastian In deployed systems developed prior to 2004? Python has been around for quite a while now. It's not hard to imagine having to work on some Python 2.0 application, I mean some people *still* have to work in VB6. – kibibu May 14 '10 at 07:27
14

In Python2.7+ you can write it like this

>>> t = (5,6,7)
>>> d = {x:i for i,x in enumerate(t)}
>>> print d
{5: 0, 6: 1, 7: 2}
John La Rooy
  • 295,403
  • 53
  • 369
  • 502
4
>>> dict((x,i) for i,x in enumerate(t))
{5: 0, 6: 1, 7: 2}
>>>
YOU
  • 120,166
  • 34
  • 186
  • 219
2

Are all your elements unique (i.e. your list would never be 5,6,7,7)? The dict solution will only work if all your elements are unique.

By storing the index, you're essentially duplicating information, since you could simply query the current index of the item in the list. Duplicating information is usually not the best idea, because it allows the possibility for one set of data to get out of sync with the other.

If the list is being modified, there's also nothing preventing you from accidentally assigning the same index to more than one item.

Why are you trying to store the index value, when you could simply get the index from the list?

  • all list elements are unique. I'm storing the index for fast lookup in a different data structure. – AnalyticsBuilder May 14 '10 at 03:49
  • That sounds like a useless level of indirection if all elements are unique, test for membership with `in` and index with `index()`. I'm guessing that you imagine that a hash-map backed dictionary will give you faster lookup than `index()` will. In Python premature optimization truly is evil because your intuitions about "faster" are often wrong until actually timed. Make it work, then find out where you are slow, added complexity isn't worth it. – msw May 14 '10 at 06:25
  • @Dragan, are you modifying your list or does it remain static? – Hamish Grubijan May 14 '10 at 15:03
  • @msw valid concern, in general i agree w/ you, in this case I think it's worth it python -mtimeit -s"t = range(int(1e2))" "truthVal = (55 in t)" # 3.11 usec per loop python -mtimeit -s"t = range(int(1e2)); d = dict(zip(t, range(len(t))))" "truthVal = (55 in d)" #0.138 usec per loop python -mtimeit -s"t = range(int(1e2))" "idxVal = t.index(55)" #3.46 usec per loop python -mtimeit -s"t = range(int(1e2)); d = dict(zip(t, range(len(t))))" "indexVal = d[55]" #0.136 usec per loop – AnalyticsBuilder May 14 '10 at 17:09
  • @dragan: assertion confirmed, even when I biased your test wildly against dictionaries by the use of "5" as a target dicts still blasted the doors off lists by a factor between 3 and 7 times faster. I thought cache hits/misses may have been a confound, it doesn't appear to be for my system even with list sized of 1e7 that forced it out to swap. (Python 2.6.5 Linux 2.6) – msw May 14 '10 at 22:36
2

As everybody has already written, in Python 2.6 I would consider the following as the most pythonic:

>>> dict((x, i) for i, x in enumerate(t))
{5: 0, 6: 1, 7: 2}

Still, in a moment of functional frenzy I would write:

>>> dict(map(reversed, enumerate(t)))
{5: 0, 6: 1, 7: 2}
krawyoti
  • 19,695
  • 1
  • 23
  • 17
0

I like the dict(zip(t, range(len(t)))) best.

Arafangion
  • 11,517
  • 1
  • 40
  • 72