Python: Enumerate a list of string 'keys' into ints

Question

I searched for a while but didn't find anything that explained exactly what I'm trying to do.

Basically I have a list of string "labels", e.g. ["brown", "black", "blue", "brown", "brown", "black"] etc. What I want to do is convert this into a list of integers where each label corresponds to an integer, so

["brown", "black", "blue", "brown", "brown", "black"]

becomes

[1, 2, 3, 1, 1, 2]

I looked into the enumerate function but when I gave it my list of strings (which is quite long), it assigned an int to each individual label, instead of giving the same label the same int:

[(1,"brown"),(2,"black"),(3,"blue"),(4,"brown"),(5,"brown"),(6,"black")]

I know how I could do this with a long and cumbersome for loop and if-else checks, but really I'm curious if there's a more elegant way to do this in only one or two lines.

Martijn Pieters · Accepted Answer · 2013-06-17T16:47:44.490

You have non-unique labels; you can use a defaultdict to generate numbers on first access, combined with a counter:

from collections import defaultdict
from itertools import count
from functools import partial

label_to_number = defaultdict(partial(next, count(1)))
[(label_to_number[label], label) for label in labels]

This numbers the labels in order of each label's first occurrence in labels.

Demo:

>>> labels = ["brown", "black", "blue", "brown", "brown", "black"]
>>> label_to_number = defaultdict(partial(next, count(1)))
>>> [(label_to_number[label], label) for label in labels]
[(1, 'brown'), (2, 'black'), (3, 'blue'), (1, 'brown'), (1, 'brown'), (2, 'black')]

Because we are using a dictionary, the label-to-number lookups are constant cost, so the whole operation will take linear time based on the length of the labels list.
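If only the integers are wanted, as in the question, the same mapping can be used without pairing each number with its label. A minimal sketch, assuming the labels list from the demo above; the name numbers is introduced here purely for illustration:

```python
from collections import defaultdict
from functools import partial
from itertools import count

labels = ["brown", "black", "blue", "brown", "brown", "black"]

# Each missing key triggers next() on the shared counter, so a label
# receives the next unused integer the first time it is looked up.
label_to_number = defaultdict(partial(next, count(1)))

# Keep only the integers, in the same order as the input labels.
# "numbers" is a hypothetical name, not part of the original answer.
numbers = [label_to_number[label] for label in labels]
print(numbers)  # [1, 2, 3, 1, 1, 2]
```

After the comprehension runs, label_to_number still holds the full label-to-integer mapping, so it can be reused to encode further lists consistently.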

Alternatively, use a set() to get unique values, then map these to an enumerate() count:

label_to_number = {label: i for i, label in enumerate(set(labels), 1)}
[(label_to_number[label], label) for label in labels]

This assigns numbers more arbitrarily, as set() objects are not ordered:

>>> label_to_number = {label: i for i, label in enumerate(set(labels), 1)}
>>> [(label_to_number[label], label) for label in labels]
[(2, 'brown'), (3, 'black'), (1, 'blue'), (2, 'brown'), (2, 'brown'), (3, 'black')]

This requires looping through labels twice though.

Neither approach requires you to first define a dictionary of labels; the mapping is created automatically.
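If the two-pass approach is preferred but first-occurrence numbering is still wanted, one possible variation is to deduplicate with dict.fromkeys() instead of set(). This is a sketch under the assumption of Python 3.7 or later, where dictionaries preserve insertion order; the name unique_labels is hypothetical:

```python
labels = ["brown", "black", "blue", "brown", "brown", "black"]

# dict.fromkeys keeps one key per distinct label, in first-seen order
# (assumption: Python 3.7+, where dicts preserve insertion order).
unique_labels = dict.fromkeys(labels)

# Number the distinct labels starting at 1, then translate the full list.
label_to_number = {label: i for i, label in enumerate(unique_labels, 1)}
numbers = [label_to_number[label] for label in labels]
print(numbers)  # [1, 2, 3, 1, 1, 2]
```

This still loops over labels twice, like the set() version, but the numbering no longer depends on set ordering.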

score 3 · Answer 2 · answered Jun 17 '13 at 16:36

You could first create a dictionary like:

label_to_number = {"brown": 1, "black": 2, "blue": 3}

And then:

li = ["brown", "black", "blue", "brown", "brown", "black"]
[label_to_number[i] for i in li]

Ankur Ankan


score 1 · Answer 3 · answered Jun 17 '13 at 16:37

Try this:

lst = ["brown", "black", "blue", "brown", "brown", "black"]
d = {"brown":1, "black":2, "blue":3}

[d[k] for k in lst]
=> [1, 2, 3, 1, 1, 2]

Of course, for this to work you have to define the equivalences somewhere - above, I used a dictionary for it. Otherwise, there's no way to know that the color brown corresponds to the number 1, etc.
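One consequence of a hand-written mapping is that a label missing from it raises KeyError. A small sketch of one way to tolerate that, using dict.get() with a fallback; the extra label "green" and the sentinel value 0 are assumptions chosen for illustration, not part of the answer:

```python
# "green" is a hypothetical label that the mapping does not cover.
lst = ["brown", "black", "blue", "green"]
d = {"brown": 1, "black": 2, "blue": 3}

# d[k] would raise KeyError for "green"; d.get(k, 0) returns 0 instead.
# The sentinel 0 is an illustrative choice for "unknown label".
numbers = [d.get(k, 0) for k in lst]
print(numbers)  # [1, 2, 3, 0]
```

Whether silently substituting a sentinel is appropriate depends on the data; letting the KeyError surface may be the safer choice when every label is expected to be known.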

score 0 · Answer 4 · answered Jun 17 '13 at 16:42

The simplest piece of code that reproduces your requested answer is:

l = ["brown", "black", "blue", "brown", "brown", "black"]
i = [l.index(x)+1 for x in l]
print(i)

Output: [1, 2, 3, 1, 1, 2]

For a long list, this could get quite slow, but it generates exactly what you asked for, with no preparation of any sort.

Simon Callan


If the list of labels is large, this will perform terribly badly, as `.index()` has to scan through the list for each loop iteration. – Martijn Pieters Jun 17 '13 at 16:44
This is what I was talking about when I said it could get slow for a long list, but it depends on how big the list is. – Simon Callan Jun 17 '13 at 16:45
This also assumes something not explicit in the question regarding the integer assignment. A second example such as `l = ["brown", "black", "brown", "blue", "brown", "black"]` would assign 4 for "blue" whereas the dictionary approach would assign 3 for "blue" in both cases. – dansalmo Jun 17 '13 at 17:22
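The two points raised in these comments can be illustrated together. The sketch below uses the second example list from the last comment; the names by_index, by_index_fast, by_order, first_pos and order are hypothetical. It shows the position-based result of `.index()`, a way to get that same result without rescanning the list on every iteration, and the occurrence-order numbering for comparison:

```python
l = ["brown", "black", "brown", "blue", "brown", "black"]

# Position-based numbering, as in the answer: first index of each label, plus 1.
# .index() rescans the list each time, so this is quadratic in the worst case.
by_index = [l.index(x) + 1 for x in l]

# Same result in linear time: record each label's first position once.
first_pos = {}
for pos, x in enumerate(l, 1):
    first_pos.setdefault(x, pos)
by_index_fast = [first_pos[x] for x in l]

# Occurrence-order numbering: labels get 1, 2, 3, ... as they first appear.
order = {}
for x in l:
    order.setdefault(x, len(order) + 1)
by_order = [order[x] for x in l]

print(by_index)  # [1, 2, 1, 4, 1, 2]
print(by_order)  # [1, 2, 1, 3, 1, 2]
```

With this input "blue" becomes 4 under position-based numbering and 3 under occurrence-order numbering, which is the difference the comment describes.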
